Today I want to talk about one of the most common issues in machine learning and deep learning: overfitting and underfitting.
Underfitting was a common problem in the early days of machine learning, when we had too few predictors to predict something. In the era of big data, we can often avoid it before model building even starts. How? We have domain experts who can suggest many possible predictors for a business problem in retail or insurance. We have plenty of images from the web or mobile devices for image classification or detection problems.
Now you may ask: how do we choose the best predictors? Good question. In the era of neural networks, this is much less of a problem, because the model itself learns which predictors matter over many training iterations. The most important predictors end up with large values in their parameters (weights and biases), whereas less important predictors end up with small values. But what if the model gives very accurate results on the training data (98-99%) and then fails drastically in testing?
The reason behind this is overfitting.
Let me show the intuition in the picture below:
The middle one is the just-right model. But how do we handle overfitting? Should we drop the predictors whose parameters have small values? Well, that is not the right solution, because you do not know whether those predictors will turn out to be stronger once more examples are added in the future. Who knows?
This is where the solution comes in, under the name of regularization.
I covered the cost function in my previous post. Here I am adding one more term to it, circled in red in the picture above, which is called L2 regularization. In that term, lambda is a hyperparameter that controls how strong the penalty is.
Theta stands for the parameters W and b, and the penalty is computed from their squared values, that is, in transpose(W) times W form. Well, how does it actually help?
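Before answering that, here is a minimal sketch of the penalty term itself in plain Python. The function names and the lambda / (2 * m) scaling are my assumptions for illustration (a common convention, where m is the number of training examples); the formula in the picture may scale it differently.

```python
def l2_penalty(weights, lam, m):
    """L2 penalty: (lam / (2 * m)) * sum of squared weights.

    The 1 / (2 * m) scaling is an assumed convention, not taken from the post.
    """
    return (lam / (2 * m)) * sum(w * w for w in weights)


def regularized_cost(base_cost, weights, lam, m):
    """Add the L2 penalty to an already computed (unregularized) cost."""
    return base_cost + l2_penalty(weights, lam, m)


# Hypothetical numbers: two weights, 10 training examples.
weights = [3.0, -4.0]
cost = regularized_cost(base_cost=0.5, weights=weights, lam=0.1, m=10)
print(cost)
```

Note that larger weights raise the cost even when the fit to the training data is unchanged, so the optimizer is pushed toward smaller weights.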
When we take the derivative of the cost, this penalty contributes an extra term proportional to W, so we also need to include that term when updating the parameter W.
W = W - alpha * dW
Here dW includes an extra term proportional to W itself, which comes from the penalty, so every update shrinks the final value of W a little.
By doing so, we are mainly keeping the parameter values in check, so that the model does not become too biased toward the particular set of training examples.
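The update rule above can be sketched roughly as follows. This is only an illustration under my own assumptions: the helper name l2_update is made up, the weights are a flat list, and the penalty gradient is taken as (lambda / m) * W, which matches a lambda / (2 * m) scaling of the penalty.

```python
def l2_update(weights, grads, alpha, lam, m):
    """One gradient descent step with an L2 penalty.

    dW = grad + (lam / m) * W   (assumed scaling)
    W  = W - alpha * dW
    """
    return [w - alpha * (g + (lam / m) * w) for w, g in zip(weights, grads)]


# With a zero data gradient, only the penalty acts on the weights,
# so each step multiplies them by (1 - alpha * lam / m).
weights = [2.0, -1.0]
zero_grads = [0.0, 0.0]
for _ in range(3):
    weights = l2_update(weights, zero_grads, alpha=0.5, lam=1.0, m=1)
print(weights)
```

In this toy setting the factor is 0.5 per step, so the weights halve three times. In real training the data gradient pulls in the other direction, and the weights settle at a compromise between the two.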
We also have L1 regularization, which is calculated from the absolute value (modulus) of W or b instead of the squares.
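For comparison, here is a similar sketch for the L1 case. Again the names and the lambda / m scaling are assumptions for illustration. Since the absolute value has no derivative at zero, the sketch uses the sign of each weight as a subgradient, with 0 at exactly zero.

```python
def l1_penalty(weights, lam, m):
    """L1 penalty: (lam / m) * sum of absolute weights (assumed scaling)."""
    return (lam / m) * sum(abs(w) for w in weights)


def sign(w):
    """Return 1 for positive, -1 for negative, 0 for zero."""
    return (w > 0) - (w < 0)


def l1_update(weights, grads, alpha, lam, m):
    """One gradient descent step with an L1 penalty (subgradient form)."""
    return [w - alpha * (g + (lam / m) * sign(w)) for w, g in zip(weights, grads)]


# Hypothetical weights with a zero data gradient.
penalty = l1_penalty([3.0, -4.0], lam=2.0, m=1)
updated = l1_update([1.0, -1.0, 0.0], [0.0, 0.0, 0.0], alpha=0.1, lam=1.0, m=1)
print(penalty, updated)
```

Unlike the L2 step, the pull toward zero here has the same size for every nonzero weight, regardless of how large the weight is.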
I hope this post helps you understand how models become biased toward their training data in machine learning and deep learning.