Friday, January 27, 2017

K-Means Algorithm

Input: K, set of points x1...xn
Place centroids c1...cK at random locations
Repeat until convergence:
    for each point xi:
        assign xi to the cluster j = arg min_j D(xi, cj), i.e. the nearest centroid
    for each cluster j, compute its new centroid:
        cj = (1/|Cj|) * sum of the points xi assigned to cluster j
Stop when none of the cluster assignments change
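
A minimal NumPy sketch of this loop (the function name and defaults are my own, not from any library):

import numpy as np

def kmeans(X, k, max_iters=100, seed=0):
    """Plain K-means on an (n, d) array of points X."""
    rng = np.random.default_rng(seed)
    # Place centroids at random locations: here, k distinct points drawn from X.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    assignments = np.full(len(X), -1)
    for _ in range(max_iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_assignments = dists.argmin(axis=1)
        # Stop when none of the cluster assignments change.
        if np.array_equal(new_assignments, assignments):
            break
        assignments = new_assignments
        # Update step: each centroid moves to the mean of its assigned points.
        for j in range(k):
            members = X[assignments == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return centroids, assignments

Usage is simply centroids, labels = kmeans(np.random.rand(200, 2), k=3).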

K-Means: very fast
Complexity: O(#iterations * #clusters * #instances * #dimensions) per run, using Euclidean distance
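As an illustrative example, 100 iterations with k = 5 on 10,000 points in 50 dimensions costs on the order of 100 * 5 * 10,000 * 50 = 2.5 * 10^8 coordinate operations per run.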

Hierarchical clustering and k-means clustering complement each other. In hierarchical clustering, the researcher does not need to know the number of clusters in advance, whereas in k-means clustering the number of clusters must be specified beforehand.

Advice: if unsure about the number of clusters to form, use hierarchical clustering to determine the number, then use k-means clustering to build more stable clusters; hierarchical clustering is a single-pass exercise, whereas k-means is an iterative process.
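
A rough sketch of that two-step workflow, assuming SciPy and scikit-learn are available; the distance threshold t=1.5 is an arbitrary placeholder you would tune for your data:

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.cluster import KMeans

X = np.random.rand(200, 2)  # toy data; substitute your own

# Step 1: single-pass hierarchical clustering; cutting the dendrogram
# at a distance threshold suggests a candidate number of clusters.
Z = linkage(X, method="ward")
labels = fcluster(Z, t=1.5, criterion="distance")  # threshold is a placeholder
k = len(np.unique(labels))

# Step 2: iterative k-means with that k for a more stable final partition.
km = KMeans(n_clusters=k, n_init=10).fit(X)
print(k, km.inertia_)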


Explain what a local optimum is and why it is important in a specific context, such as K-means clustering. What are specific ways of determining if you have a local optimum problem? What can be done to avoid local optima?

  • A solution that is optimal within a neighboring set of candidate solutions
  • In contrast with a global optimum: the best solution among all candidates
  • K-means clustering context:
    The objective (cost) function provably decreases at every iteration until a local optimum is reached.
    Results depend on the initial random cluster assignment
  • Determining if you have a local optimum problem:
    A tendency toward premature convergence
    Different initializations lead to different optima
  • Avoiding local optima in a K-means context: repeat K-means with different random initializations and keep the solution with the lowest cost (see the sketch below)
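
A sketch of that restart strategy, reusing the kmeans() function from the sketch above; scikit-learn's KMeans does the same thing automatically via its n_init parameter:

def kmeans_cost(X, centroids, assignments):
    # Sum of squared distances from each point to its assigned centroid.
    return ((X - centroids[assignments]) ** 2).sum()

def kmeans_restarts(X, k, n_restarts=10):
    # Run K-means from several random initializations; keep the lowest-cost result.
    best_cost, best_result = float("inf"), None
    for seed in range(n_restarts):
        centroids, assignments = kmeans(X, k, seed=seed)
        cost = kmeans_cost(X, centroids, assignments)
        if cost < best_cost:
            best_cost, best_result = cost, (centroids, assignments)
    return best_result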
