Hi Friends,
Let’s get back to technical topics. In my last post, I explained the concept of linear regression.
Today, I am going to explain what logistic regression is. Let’s start with an example: suppose you want to predict the result of a diagnostic test for cancer.
You have some features in your data, such as age, family history, smoking habits, blood pressure, blood parameters and so on. In the end, you want to decide whether the patient has cancer or not.
Doesn’t this sound like a probability?
For example, the patient has a 30% chance or a 70% chance.
So what is the final result you would prefer?
Yes or No.
Based on what? A threshold value. Most often it is 50%: above it the answer is Yes, below it the answer is No. Sounds good?
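To make the threshold idea concrete, here is a minimal sketch in Python. The function name classify and the choice to count exactly 50% as Yes are my own assumptions for illustration, not a fixed rule.

```python
def classify(probability, threshold=0.5):
    """Turn a probability into a Yes/No answer using a threshold.

    Treating a probability exactly equal to the threshold as "Yes"
    is an assumption made for this sketch.
    """
    return "Yes" if probability >= threshold else "No"


# The two example probabilities from the text.
print(classify(0.30))
print(classify(0.70))
```

With the default threshold, the 30% case falls below it and the 70% case falls above it. You can pass a different threshold if 50% does not suit your problem.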
OK, before jumping into the concept, I want to focus on why we are not using linear regression.
Suppose you have a feature on the X-axis and you want to predict the chance of cancer on the Y-axis.
If the data points were spread along a line, we could fit that line and read off a prediction for any X.
But in this case, Y takes only two values: 1 or 0.
Check the above graph. Ignore the curve for now. Keep your eyes on the red dots.
These red dots are the values: either zero or one.
Now think: how would you fit a straight line in this case?
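If you want to see the problem in numbers, here is a small sketch that fits an ordinary least-squares line to 0/1 data. The data points are made up for illustration, and fit_line and predict are hypothetical helper names, not part of any library.

```python
# Made-up data for illustration: one feature and a 0/1 outcome.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 0, 1, 1, 1, 1]


def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns slope m and intercept c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c


m, c = fit_line(xs, ys)


def predict(x):
    """Value of the fitted straight line Y = mX + C at x."""
    return m * x + c


# The line drops below 0 at the low end and rises above 1 at the high end.
for x in (1, 8):
    print(x, round(predict(x), 3))
```

On this data the straight line gives a negative value at X = 1 and a value greater than 1 at X = 8, so it cannot be read as a chance of cancer.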
Here we go.
Let me introduce a new formula: the sigmoid curve, or S-curve.
Let me build the formula from the linear regression equation, Y = mX + C.
Our requirement is to keep the value of Y between 0 and 1.
How do we do that?
OK, let’s first focus on how to make Y greater than 0.
e^(mX+C)
The exponential of any real number is always positive.
Now, let’s focus on how to make the value of Y less than 1.
e^(mX+C) / (1+ e^(mX+C))
This will always result in a value between 0 and 1, because the numerator is positive and always smaller than the denominator.
Looks good?
The final formula is:
Y = e^(mX+C) / (1+ e^(mX+C))
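Here is a minimal Python sketch of this formula, in case you want to try it yourself. The function name sigmoid and the default values m = 1 and C = 0 are assumptions for illustration; in practice m and C are learned from the data.

```python
import math


def sigmoid(x, m=1.0, c=0.0):
    """Y = e^(mX+C) / (1 + e^(mX+C)) for a single feature value x."""
    z = m * x + c
    if z >= 0:
        # Same formula with numerator and denominator divided by e^z.
        # This avoids an overflow error in math.exp for large z.
        return 1.0 / (1.0 + math.exp(-z))
    return math.exp(z) / (1.0 + math.exp(z))


# Try a few values of X with the default m and C.
for x in (-5, 0, 5):
    print(x, sigmoid(x))
```

At X = 0 the result is exactly 0.5, very negative inputs move towards 0, and very positive inputs move towards 1. For extremely large inputs the floating-point result can round to exactly 0.0 or 1.0, even though the mathematical value stays strictly in between.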
I hope you liked my post. Stay tuned for more.