Hello! I hope you are well and exploring lots of topics in Machine Learning (ML).
To continue my series of ML posts, today I want to share a common unsupervised algorithm for grouping (clustering) data in ML. My favorite is K-Means clustering.
K-Means: Let’s explore the meaning of this name.
“K” is a variable that holds the number of groups (clusters) we want to split the data into, such as 2, 3 or 4. “Means” stands for average, a measure of central tendency: each cluster is represented by the average of the data points that belong to it. Please note that the data we receive has no labels. That is why it is called an unsupervised problem.
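To make the “Means” part concrete, here is a minimal sketch of how the average of a group of 2D points is computed. The function name `mean_point` is my own, chosen for illustration only.

```python
def mean_point(points):
    """Return the coordinate-wise average (the "mean") of a list of points."""
    n = len(points)
    # zip(*points) collects all x-values together, then all y-values, and so on.
    return tuple(sum(coords) / n for coords in zip(*points))

group = [(1, 2), (3, 4), (5, 0)]
centroid = mean_point(group)  # (3.0, 2.0)
print(centroid)
```

This average point is what K-Means treats as the center of a group.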
Now, let me describe the concept:
Suppose you want to group the dataset into 2 clusters:
1. Choose any two data points in the graph (for example, at random). These are the starting “Centroids”.
2. Compute the distance (Euclidean or Manhattan distance) from each data point to each of the two Centroids.
3. Assign each data point to its nearest Centroid. This splits the data into two groups separated by a boundary.
4. Compute the mean of each group and move that group’s Centroid to this new point.
5. Repeat steps 2, 3 and 4.
6. After some iterations, you will see that the Centroids stop moving, because the group assignments no longer change.
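So, finally, you get your clusters. Note that the result can depend on which starting Centroids you picked, so it is common to run the algorithm a few times and keep the best grouping.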
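The steps above can be sketched in plain Python as follows. This is a minimal illustration under my own assumptions, not a production implementation: the function names (`euclidean`, `manhattan`, `mean_point`, `k_means`), the `max_iters` safety limit and the fixed `seed` are all hypothetical choices made for this example, and the sample points are made up.

```python
import math
import random


def euclidean(a, b):
    # Straight-line distance between two points.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    # Sum of absolute coordinate differences ("city block" distance).
    return sum(abs(x - y) for x, y in zip(a, b))


def mean_point(points):
    # Coordinate-wise average of a group of points.
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(data, k=2, distance=euclidean, max_iters=100, seed=0):
    """Cluster `data` into k groups; max_iters must be at least 1."""
    rng = random.Random(seed)
    # Step 1: pick k distinct data points as the starting centroids.
    centroids = rng.sample(data, k)
    for _ in range(max_iters):
        # Steps 2 and 3: measure distances and assign each point
        # to its nearest centroid.
        groups = [[] for _ in range(k)]
        for p in data:
            nearest = min(range(k), key=lambda i: distance(p, centroids[i]))
            groups[nearest].append(p)
        # Step 4: move each centroid to the mean of its group
        # (keep the old centroid if its group ended up empty).
        new_centroids = [mean_point(g) if g else c for g, c in zip(groups, centroids)]
        # Step 6: stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        # Step 5: otherwise repeat with the moved centroids.
        centroids = new_centroids
    # `groups` holds the assignments from the last iteration that ran.
    return centroids, groups


points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]
centroids, groups = k_means(points, k=2)
print(centroids)
print(groups)
```

With this made-up data, the three points near (1, 1) should end up in one group and the three points near (8.5, 8.5) in the other. Strictly speaking, moving a centroid to the mean is the natural update for Euclidean distance; passing `distance=manhattan` still runs, but with Manhattan distance a median-based update (K-medians) is the more usual pairing.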
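Application: K-Means clustering is commonly applied to tasks such as customer segment analysis in retail businesses, and it can serve as a building block in recommendation systems, for example grouping similar users or items for movie suggestions (think Netflix) or product suggestions (think Amazon).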