Today’s topic is activation functions in artificial neural networks (ANNs).
Let me list the most common activation functions.
- Sigmoid function.
- ReLU (rectified linear unit).
- Leaky ReLU.
Let’s start the discussion with a simple example:
If we want to predict a house price, which activation function can we use?
If we use the sigmoid function, then however much z (wx + b) increases or decreases, the function returns a value between 0 and 1. That is why sigmoid is normally used at the output of a binary classifier, where the result can be read as a probability, and not for value prediction (regression) problems.
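To make the 0-to-1 range concrete, here is a minimal sketch of the sigmoid function in plain Python. The sample values of z are made up for illustration and are not taken from any real house price model.

```python
import math

def sigmoid(z):
    """Logistic sigmoid: 1 / (1 + e^(-z)). The output lies between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical values of z = w*x + b, chosen only for illustration.
for z in (-5.0, 0.0, 5.0):
    print(f"z = {z:5.1f} -> sigmoid(z) = {sigmoid(z):.4f}")
```

However large or small z gets, the result never leaves the 0-to-1 band, so it cannot represent a price directly.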
Now, in this case, we should use a different function. The best option is ReLU.
The ReLU function is max(0, z). That means the function returns z itself whenever z is positive, so the output keeps growing as z increases. But what if we decrease z? What if z = 0, or z goes negative? Then the output is 0. Can a price be 0? Not always, right?
So we just need to modify the ReLU function slightly. The modified version is called leaky ReLU.
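The max(0, z) rule can be sketched in a couple of lines of Python. The input values below are arbitrary examples.

```python
def relu(z):
    """ReLU: returns z for positive z, and 0 otherwise."""
    return max(0.0, z)

# Arbitrary sample inputs, chosen only for illustration.
for z in (-2.0, 0.0, 3.5):
    print(f"z = {z:5.1f} -> relu(z) = {relu(z)}")
```

Positive inputs pass through unchanged, while zero and every negative input collapse to 0.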
Now, the predicted house price depends on multiple parameters. So it is better to let each parameter contribute some value (positive or negative) even when z goes negative. Leaky ReLU does this by returning a small multiple of z, instead of 0, for negative z. The parameters of the less important variables will then shrink automatically over multiple iterations of training. We can also use another function named tanh, but it is not as popular as ReLU.
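Here is a rough sketch of leaky ReLU. The slope `alpha = 0.01` for negative inputs is an assumption on my side (it is a commonly used default, but the value is a tunable choice), and the sample inputs are arbitrary.

```python
def leaky_relu(z, alpha=0.01):
    """Leaky ReLU: z for positive z, alpha * z otherwise.

    alpha is an assumed small slope for the negative side.
    """
    return z if z > 0 else alpha * z

# Arbitrary sample inputs, chosen only for illustration.
for z in (-2.0, 0.0, 3.5):
    print(f"z = {z:5.1f} -> leaky_relu(z) = {leaky_relu(z)}")
```

Unlike plain ReLU, a negative z still produces a small negative output rather than a flat 0.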
The sigmoid function tends to stall as training goes on and z becomes very large or very small. If you look at the sigmoid function, it is an S-shaped curve that flattens at both ends. Once z reaches those flat regions, the slope is close to zero, so the updates to the parameters of z (w and b) become very, very slow and the model almost stops learning.
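The slowdown can be seen from the slope of the sigmoid, which is sigmoid(z) * (1 - sigmoid(z)). The sketch below prints that slope at a few arbitrary values of z. The helper name `sigmoid_slope` is mine, used only for this illustration.

```python
import math

def sigmoid(z):
    """Logistic sigmoid: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_slope(z):
    """Derivative of the sigmoid with respect to z: s * (1 - s)."""
    s = sigmoid(z)
    return s * (1.0 - s)

# Arbitrary sample inputs: the slope peaks at z = 0 and shrinks at both ends.
for z in (-10.0, -2.0, 0.0, 2.0, 10.0):
    print(f"z = {z:6.1f} -> slope = {sigmoid_slope(z):.6f}")
```

The slope is largest at z = 0 (exactly 0.25) and drops toward zero on both sides, which is why the updates to w and b become tiny once z drifts far from 0.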
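But we do not face this issue with ReLU for positive z, where the slope stays at 1 no matter how large z gets. For negative z the slope of plain ReLU is 0, which is one more reason to prefer leaky ReLU.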
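For comparison, here is a small sketch of the slopes of ReLU and leaky ReLU. Treating the slope at exactly z = 0 as the negative-side value is a convention I am assuming here, since the true derivative is undefined at that point. `alpha = 0.01` is again an assumed value.

```python
def relu_slope(z):
    """Slope of ReLU: 1 for positive z, 0 otherwise (convention at z = 0)."""
    return 1.0 if z > 0 else 0.0

def leaky_relu_slope(z, alpha=0.01):
    """Slope of leaky ReLU: 1 for positive z, alpha otherwise."""
    return 1.0 if z > 0 else alpha

# Arbitrary sample inputs, chosen only for illustration.
for z in (-10.0, -1.0, 1.0, 10.0, 1000.0):
    print(f"z = {z:7.1f} -> relu: {relu_slope(z)}, leaky relu: {leaky_relu_slope(z)}")
```

For positive z the slope stays at 1 however large z becomes, so the flattening seen in the sigmoid does not happen on that side.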
You can look at the formula of each activation function for a better understanding. But did you notice that I mentioned training the model by updating the parameters?
In my next post, I shall discuss how a neural network updates the parameters w and b across its layers. This is called forward and backward propagation.