Difference Between K-Means and K-Medoids Clustering
Clustering is one of the most basic forms of data grouping in data analysis and machine learning: a given set of objects is partitioned into groups so that objects within the same group are more similar to one another than to objects in other groups. Two popular algorithms for this task are K-Means and K-Medoids. Although related in purpose and usage, they are not the same.
This article outlines the major points of difference between the K-Means and K-Medoids clustering algorithms.
What is K-Means Clustering?
K-Means is an iterative algorithm that partitions a dataset into K clusters, where each cluster is represented by the average of all the points assigned to it, usually referred to as the centroid. The steps involved in K-Means are listed below, followed by a minimal code sketch:
- Initialization: Choose K points at random as the initial centroids.
- Assignment: Assign every data point to the cluster of its closest centroid, typically using squared Euclidean distance.
- Update: Recompute each centroid as the average of all the points assigned to that cluster.
- Repeat: Alternate the assignment and update steps until the centroids no longer change (this is called convergence).
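A minimal NumPy sketch of these four steps (the function and variable names here are our own, not from any particular library; for production use, a library implementation such as scikit-learn's `KMeans` is the usual choice):

```python
import numpy as np

def k_means(X, k, n_iters=100, seed=0):
    # X: (n_samples, n_features) array; no empty-cluster guard, for brevity.
    rng = np.random.default_rng(seed)
    # Initialization: choose K points at random as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment: label each point with its nearest centroid
        # (squared Euclidean distance).
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update: recompute each centroid as the mean of its points.
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Repeat: stop once the centroids no longer move (convergence).
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

X = np.array([[1.0, 1.0], [1.2, 0.8], [8.0, 8.0], [8.3, 7.9]])
labels, centroids = k_means(X, k=2)
print(labels)     # e.g. [0 0 1 1] (cluster numbering may differ)
print(centroids)
```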
Advantages of K-Means
- Simplicity: K-Means is easy to use and its algorithm is straightforward.
- Efficiency: Its per-iteration time complexity is low, so it scales well to large datasets.
- Speed: It generally converges quickly.
Disadvantages of K-Means
- Sensitivity to Outliers: K-Means is susceptible to noise and outliers, chiefly because it relies on means when forming cluster assignments.
- Shape Assumption: It implicitly treats clusters as spherical and of similar size, which is not always true.
- Initial Centroids: Different choices of the initial K centroids can lead to different, and sometimes poor, clusterings.
What is K-Medoids Clustering?
K-Medoids, or Partitioning Around Medoids (PAM), is similar to K-Means but represents each cluster by a medoid rather than a mean. A medoid is an actual data point in the cluster whose total distance to the other points in that cluster is minimal, making it the cluster's most representative member. The steps in K-Medoids are listed below, followed by a minimal sketch:
- Initialization: Choose K data points at random as the initial medoids.
- Assignment: Assign each data point to its nearest medoid according to a chosen distance metric.
- Update: For each cluster, choose as the new medoid the member that minimizes the sum of distances to the other points in the cluster.
- Repeat: Alternate the assignment and update steps until the medoids no longer change (convergence).
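A minimal sketch of this alternating scheme, assuming NumPy and SciPy are available (full PAM additionally evaluates swaps between medoids and non-medoids across clusters, and third-party packages such as scikit-learn-extra ship ready-made implementations):

```python
import numpy as np
from scipy.spatial.distance import cdist

def k_medoids(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Initialization: pick K actual data points as the starting medoids.
    medoid_idx = rng.choice(len(X), size=k, replace=False)
    D = cdist(X, X)  # full pairwise (Euclidean) distance matrix
    for _ in range(n_iters):
        # Assignment: each point goes to its nearest medoid.
        labels = D[:, medoid_idx].argmin(axis=1)
        # Update: within each cluster, the new medoid is the member with
        # the smallest total distance to the other members.
        # (Assumes no duplicate medoid points, for brevity.)
        new_idx = medoid_idx.copy()
        for j in range(k):
            members = np.where(labels == j)[0]
            costs = D[np.ix_(members, members)].sum(axis=1)
            new_idx[j] = members[costs.argmin()]
        # Repeat: stop once the set of medoids no longer changes.
        if set(new_idx) == set(medoid_idx):
            break
        medoid_idx = new_idx
    # Final assignment against the final medoids.
    labels = D[:, medoid_idx].argmin(axis=1)
    return labels, X[medoid_idx]

X = np.array([[1.0, 1.0], [1.2, 0.8], [8.0, 8.0], [8.3, 7.9]])
labels, medoids = k_medoids(X, k=2)
print(labels, medoids, sep="\n")  # medoids are rows of X itself
```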
Advantages of K-Medoids
- Robustness to Outliers: Medoids are less affected by outliers and noise than centroids.
- Flexibility in Distance Metrics: K-Medoids can use any distance metric, not just Euclidean distance, which makes it applicable to a wider range of data types, as the snippet below shows.
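One way to see this flexibility, using SciPy's `cdist`: the sketch above touches the data only through the pairwise distance matrix `D`, so switching metrics is a one-line change (the metric names below are SciPy's):

```python
import numpy as np
from scipy.spatial.distance import cdist

X = np.array([[1.0, 1.0], [1.2, 0.8], [8.0, 8.0]])

D_euclidean = cdist(X, X)                      # default: Euclidean
D_manhattan = cdist(X, X, metric="cityblock")  # Manhattan / L1
D_cosine    = cdist(X, X, metric="cosine")     # cosine distance
```

Any of these matrices can be substituted for the `D` computed inside the `k_medoids` sketch, whereas K-Means is tied to means, which only make sense with (squared) Euclidean distance.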
Disadvantages of K-Medoids
- Computationally Intensive: K-Medoids is noticeably slower than K-Means because of its higher per-iteration complexity, especially on larger datasets.
- Complexity: It is also somewhat harder to implement and understand than K-Means, despite often producing more robust clusterings.
Key Differences Between K-Means and K-Medoids
Centroid vs. Medoid
- K-Means: Uses the mean of the points in a cluster as the centroid, which may not be an actual data point.
- K-Medoids: Uses actual data points as medoids, making it more interpretable.
Distance Measures
- K-Means: Typically uses Euclidean distance, which may not be suitable for all data types.
- K-Medoids: Can use any distance measure, providing more flexibility.
Sensitivity to Outliers
- K-Means: Sensitive to outliers, as they can significantly affect the mean.
- K-Medoids: More robust to outliers, as medoids are actual data points and are less influenced by extreme values; the small example below illustrates this.
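A tiny one-dimensional example makes both of the points above concrete (the numbers are illustrative only):

```python
import numpy as np

# One compact cluster (1..4) plus a single outlier (100).
x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])

mean = x.mean()  # 22.0: dragged far from the cluster, and not a data point

# Medoid: the point with the smallest total distance to all the others.
total_dists = np.abs(x[:, None] - x[None, :]).sum(axis=1)
medoid = x[total_dists.argmin()]  # 3.0: an actual point, barely affected
print(mean, medoid)
```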
Computational Complexity
- K-Means: Generally faster and more efficient, making it suitable for very large datasets.
- K-Medoids: Slower, because classic PAM evaluates many candidate medoid swaps; better suited to smaller datasets or cases where robustness is crucial. Rough per-iteration cost estimates are sketched below.
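As a rough, textbook-style account of why this gap arises (per iteration, for n points, k clusters, and d dimensions):

```latex
\underbrace{O(nkd)}_{\text{K-Means: assign + average}}
\qquad\text{vs.}\qquad
\underbrace{O\!\left(k\,(n-k)^{2}\right)}_{\text{PAM: evaluate all medoid swaps}}
```

The quadratic term comes from trying each of the k(n-k) possible medoid/non-medoid swaps and re-scoring the clustering for each.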
Convergence
- K-Means: Converges faster but may end up in local minima.
- K-Medoids: Tends to reach better local optima, since each medoid swap is scored against the full clustering cost, but at a higher computational cost.
Difference Between K-Means and K-Medoids Clustering
| Aspect | K-Means Clustering | K-Medoids Clustering |
|---|---|---|
| Representation of Clusters | Uses the mean of the points (centroid) to represent a cluster; the centroid may not be an actual data point. | Uses the most centrally located data point (medoid) to represent a cluster. |
| Sensitivity to Outliers | Highly sensitive to outliers. | More robust to outliers. |
| Distance Metrics | Primarily uses Euclidean distance. | Can use any distance metric. |
| Computational Efficiency | Generally faster and more efficient. | Slower, due to the need to calculate pairwise distances within clusters. |
| Cluster Shape Assumption | Assumes roughly spherical clusters of similar size. | Makes no strong assumptions about cluster shape. |
Practical Considerations
When to Use K-Means
- When dealing with large datasets.
- When computational efficiency is a priority.
- When the data is well-behaved and not heavily influenced by outliers.
When to Use K-Medoids
- When robustness to outliers is important.
- When the dataset is smaller and the flexibility of using different distance measures is beneficial.
- When interpretability of cluster centers as actual data points is needed.
Conclusion
K-Means and K-Medoids are two important clustering algorithms, each with its own strengths and weaknesses. Which one to choose depends on the characteristics of the data, the available computational budget, and how strongly outliers would affect the result. Understanding these trade-offs lets a data scientist pick the method best suited to the clustering job at hand.