[TIL] 13-Feb-2020
K-Means, Dimensionality Reduction
K-Means is an unsupervised clustering algorithm. First we decide how many clusters we want (K), then we randomly initialize one point per cluster, called its centroid. The algorithm then repeats two steps until the assignments stop changing: 1. Assign each point in the dataset to the cluster whose centroid is closest to it. 2. Move each centroid to the mean of all points assigned to its cluster. The cost function is the mean squared distance from each point to the centroid of its assigned cluster, where \(c_i\) is the index of the cluster assigned to point \(i\) and \(\mu_k\) is the location of centroid \(k\). The cluster assignment step minimizes the cost with respect to the \(c_i\) while the centroids are held fixed, and the centroid update step minimizes it with respect to the \(\mu_k\) while the assignments are held fixed.
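To make the two steps concrete, here is a minimal pure-Python sketch. The function names (`k_means`, `cost`, and the helpers) are made up for this note. Two choices are assumptions, not something the algorithm description above fixes: the centroids are initialized by sampling K distinct data points, and a centroid whose cluster ends up empty is left where it is.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, iterations=100, seed=0):
    """Return (centroids, assignments) after running the two steps."""
    rng = random.Random(seed)
    # Assumption: initialize centroids as k distinct points from the data.
    centroids = rng.sample(points, k)
    assignments = None
    for _ in range(iterations):
        # Step 1: assign each point to the cluster with the closest centroid.
        new_assignments = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # nothing changed, so the centroids would not move either
        assignments = new_assignments
        # Step 2: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, c in zip(points, assignments) if c == j]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[j] = mean_point(members)
    return centroids, assignments


def cost(points, centroids, assignments):
    """Mean squared distance from each point to its assigned centroid."""
    total = sum(
        squared_distance(p, centroids[c]) for p, c in zip(points, assignments)
    )
    return total / len(points)


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centroids, assignments = k_means(data, k=2)
    print(centroids, assignments, cost(data, centroids, assignments))
```

Neither step can increase the cost, which is why the loop can stop as soon as the assignments repeat. The result can still depend on the random initialization, so in practice it is common to run it with several seeds and keep the lowest-cost result.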
Dimensionality reduction is useful for data compression; it also lets our machine learning algorithms run faster and helps with visualization. It works by projecting a higher-dimensional dataset onto a lower-dimensional space, taking advantage of correlations present… The exact method is PCA, which we haven't studied yet.
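As a rough illustration of the projection idea only, here is a small sketch that compresses correlated 2D points down to 1D. It is not a general PCA implementation: it assumes exactly two dimensions and uses the closed-form angle of the direction of largest variance, which is the direction PCA would choose in that case. The function names (`project_to_1d`, `reconstruct`) are made up for this note.

```python
import math


def project_to_1d(points):
    """Project 2D points onto their direction of largest variance.

    Returns (coords, direction, mean): one number per point, the unit
    direction vector, and the mean that was subtracted before projecting.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Entries of the 2x2 covariance matrix of the centered data.
    sxx = sum(x * x for x, _ in centered) / n
    syy = sum(y * y for _, y in centered) / n
    sxy = sum(x * y for x, y in centered) / n

    # Angle of the major axis of a 2x2 symmetric matrix (2D-only shortcut).
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    direction = (math.cos(theta), math.sin(theta))

    # Each point becomes its coordinate along that direction.
    coords = [x * direction[0] + y * direction[1] for x, y in centered]
    return coords, direction, (mean_x, mean_y)


def reconstruct(coords, direction, mean):
    """Map the 1D coordinates back to approximate 2D points."""
    return [
        (mean[0] + t * direction[0], mean[1] + t * direction[1])
        for t in coords
    ]


if __name__ == "__main__":
    data = [(0, 0), (1, 2), (2, 4), (3, 6)]  # perfectly correlated: y = 2x
    coords, direction, mean = project_to_1d(data)
    print(coords)
    print(reconstruct(coords, direction, mean))
```

When the two features are perfectly correlated, as in the example data, one number per point is enough to recover the original exactly. With noisier data the reconstruction is only approximate, and that lost detail is the price of the compression.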