Course 1: Neural Networks and Deep Learning
Hello everyone, I have just begun this course, which consists of 4 weeks and needs roughly 20 hours of dedicated time.
I plan to complete all 4 weeks by Sep 7, 2020. I will update my blog with everything that I learn in this course. So let's begin :-
Week 1: Introduction
Lec 1: What is a Neural Network?
Here we have housing price prediction data, with 6 data points plotted on a Price vs. Size of House graph;
we might be tempted to fit a straight line as in linear regression, but that line would eventually go negative, and price can't be negative, so we just make it zero once it touches the abscissa.
So, Size (x) ---> O ---> Price (y)
Here, O is the neuron.
And this function, which stays at zero and then rises linearly, is the ReLU function (Rectified Linear Unit), which is what the neuron computes.
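Just to make this concrete for myself, here is a tiny NumPy sketch of ReLU (my own illustration, not code from the course):

```python
import numpy as np

def relu(z):
    """Rectified Linear Unit: returns z where z > 0, and 0 otherwise."""
    return np.maximum(0, z)

z = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(z))  # [0.  0.  0.  1.5 3. ]
```

So just like the housing line, the output is clamped to zero for negative inputs.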
A bigger network is created by joining many of these smaller neurons together.
E.g., we have more features such as #bedrooms, family size, zip code, and wealth,
where X is all four inputs and y is the price.
Each of the circles is called a hidden unit. The input layer and the middle (hidden) layer are densely connected: every hidden unit takes every input feature.
Given enough training data, neural networks are remarkably good at figuring out the function that maps x to y. They are very useful in supervised learning.
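To see what "densely connected" means in code, here is a rough sketch of a forward pass with made-up weights (my own toy example, not from the lecture; the feature values and layer sizes are just placeholders):

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

# Toy input vector: size, #bedrooms, zip code, wealth (made-up values)
x = np.array([2104.0, 3.0, 9.4, 7.5])

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 4)) * 0.01, np.zeros(3)  # hidden layer: 3 units, each connected to all 4 inputs
W2, b2 = rng.normal(size=(1, 3)) * 0.01, np.zeros(1)  # output layer: 1 unit for the price y

hidden = relu(W1 @ x + b1)  # densely connected: every hidden unit uses every input feature
price = W2 @ hidden + b2
print(price)
```

In a real network the weights W1, W2 would of course be learned from the (x, y) training data rather than drawn at random.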
Lec 2 : Supervised Learning with Neural Networks
One of the most lucrative applications is online advertising.
Neural networks have become very good at this.
Computer vision, speech recognition, and machine translation are other major applications.
CNN - Convolutional Neural Network, RNN - Recurrent Neural Network.
Note on RNNs:
A recurrent neural network is a class of artificial neural networks where connections between nodes form a directed graph along a temporal sequence.
- It can be trained as a supervised learning problem.
- It is applicable when the input/output is a sequence (e.g., a sequence of words).
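Here is a minimal sketch of a single vanilla RNN step, just to illustrate the "directed graph along a temporal sequence" idea (my own toy code; the dimensions and names are assumptions, not from the lecture):

```python
import numpy as np

def rnn_step(x_t, h_prev, Wxh, Whh, bh):
    """One time step: the new hidden state depends on the current input and on
    the previous hidden state, which is what links the nodes along the sequence."""
    return np.tanh(Wxh @ x_t + Whh @ h_prev + bh)

input_dim, hidden_dim = 4, 3  # toy sizes
rng = np.random.default_rng(1)
Wxh = rng.normal(size=(hidden_dim, input_dim)) * 0.01
Whh = rng.normal(size=(hidden_dim, hidden_dim)) * 0.01
bh = np.zeros(hidden_dim)

h = np.zeros(hidden_dim)
sequence = [rng.normal(size=input_dim) for _ in range(5)]  # e.g. 5 word vectors
for x_t in sequence:
    h = rnn_step(x_t, h, Wxh, Whh, bh)
print(h)
```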
Structured data = data such as database tables, where each feature has a well-defined meaning.
Unstructured data = refers to things like raw audio, text, image pixels, etc.
So computers are now able to understand unstructured data. :)
Lec 3: Why is Deep Learning taking off?
The performance of traditional learning algorithms flattens out after a certain amount of data....
Over the last 20 years, we have been collecting more data.
For such great performance we need two things...
- To be able to train a big enough NN in order to take advantage of the huge amount of data
- Be far out on the x-axis, i.e., have a lot of data
Here the x-axis is the amount of labelled data; m = the number of training examples, each an (x, y) pair.
In the small-data regime, the relative ordering of algorithms is not so well defined and depends more on skill at hand-engineering features; it is only with large amounts of data that big neural networks reliably pull ahead, so that is where we really need deep learning.
In the early days, progress came from the scale of data and computation... More recently, algorithmic innovation has also helped, e.g. switching from sigmoid to ReLU.
Because with the sigmoid function, in the regions where the slope is nearly zero, gradient descent takes tiny steps and learning becomes very slow.
Whereas by changing to ReLU, the gradient is 1 for all positive inputs, so it is much less likely to shrink towards 0... this made gradient descent faster.
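A quick numeric comparison of the two gradients (my own sketch, not from the lecture):

```python
import numpy as np

def sigmoid_grad(z):
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)          # slope of sigmoid

def relu_grad(z):
    return (z > 0).astype(float)  # slope of ReLU (0 for z <= 0, 1 for z > 0)

z = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
print(sigmoid_grad(z))  # ~4.5e-05 in the tails, at most 0.25 (at z = 0)
print(relu_grad(z))     # exactly 1 for every positive input
```

So with sigmoid the gradient can be vanishingly small, while with ReLU it stays at 1 over the whole positive range, which is why gradient descent moves faster.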
The other reason is that implementing a NN is an iterative process (idea → code → experiment) and each cycle takes a long time, so faster computation has helped.
Lec 4: About this Course
From Discussion Forums:
----------------------------------
ReLU is not a linear function. It is a piecewise linear function, which, if we are being mathematically correct, is not the same thing. Activation functions must be non-linear or you lose the whole point of multiple layers in a network.
The maximum value for the derivative of sigmoid occurs at z = 0. At that point, the value of sigmoid is 0.5. Applying the formula that we have derived for the derivative of sigmoid, we can calculate:
g′(z) = g(z)(1 − g(z))
g′(0) = (1/2)(1 − 1/2) = 1/4
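(A quick numerical sanity check of this, my own addition: a finite-difference estimate of g′(0) should match the closed form.)

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

eps = 1e-6
numeric = (sigmoid(eps) - sigmoid(-eps)) / (2 * eps)  # finite-difference slope at z = 0
analytic = sigmoid(0.0) * (1.0 - sigmoid(0.0))        # g(0)(1 - g(0)) = 1/4
print(numeric, analytic)  # both ≈ 0.25
```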
Prof Ng was referring to the portions of the domain of sigmoid that are far from z=0. Those are the "tails" of the function and you'll notice that they are asymptotic to horizontal lines as you move towards −∞ or +∞. So as long as the linear activation at the output layer stays relatively close to 0, then the gradients are useful and learning can take place at a reasonable rate. It's only when the values become large in absolute value that you have the problem of getting stuck and it taking a lot of iterations to make much progress.
In Deep Learning, weight adjustment is performed in order to reduce the difference between network output and actual output. If the activation function is nonlinear (sigmoid, for example), it takes a long time for the optimization of the weights to be done, especially at points where the function does not change significantly despite a relatively large change in its input. On the other hand, when a piecewise linear function is used (e.g. ReLU), the activation changes uniformly as the input is changed, and there is no point (over the positive range) where a change in the input leads to only a small change in the output. Note that the above analysis is under the assumption that the learning rate is the same for both activation functions.
Follow this link for a Practical View : https://nbviewer.jupyter.org/github/hermesribeiro/deeplearning-dot-ai/blob/master/Logistic%20vs%20ReLU%20alg%20performance.ipynb
From the MCQs, the new points I have learnt:
- Increasing the training set size generally does not hurt an algorithm’s performance, and it may help significantly.
- Increasing the size of a neural network generally does not hurt an algorithm’s performance, and it may help significantly.
------------------------------------------------------------
Thank You :)