Input: K, set of points x1, ..., xn
Place centroids c1, ..., ck at random locations
Repeat until convergence:
for each point xi:
assign xi to its nearest centroid: cluster(xi) = arg min_j D(xi, cj)
for each cluster j, compute the new centroid:
cj = (1/|Cj|) * Σ xi, summing over the points xi currently assigned to cluster j
Stop when none of the cluster assignments change
K-Means: very fast
Complexity: O(#iterations * #clusters * #instances * #dimensions)
Distance metric D: typically Euclidean distance
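A minimal Python sketch of the routine above, assuming NumPy; the function name kmeans and its parameters are illustrative choices, not part of the original notes. Each iteration does work proportional to #clusters * #instances * #dimensions, which matches the complexity note.

import numpy as np

def kmeans(X, k, max_iters=100, seed=0):
    # X: (n, d) array of points; k: number of clusters
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # place centroids c1, ..., ck at random locations (here: k distinct data points)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    prev = None
    for _ in range(max_iters):
        # for each point xi: arg min_j D(xi, cj), with D = Euclidean distance
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        # stop when none of the cluster assignments change
        if prev is not None and np.array_equal(assign, prev):
            break
        prev = assign
        # for each cluster j: cj = (1/|Cj|) * sum of its assigned points
        for j in range(k):
            members = X[assign == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return centroids, assign

# e.g. centroids, labels = kmeans(np.random.rand(200, 2), k=3)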
Hierarchical clustering and k-means clustering complement each other. In hierarchical clustering, the researcher does not need to know the number of clusters in advance, whereas in k-means clustering the number of clusters must be specified beforehand.
Advice: if you are unsure how many clusters to form, use hierarchical clustering to determine the number, then use k-means clustering to form more stable clusters, since hierarchical clustering is a single-pass exercise whereas k-means is an iterative process.
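A sketch of this two-step workflow, assuming SciPy and scikit-learn are available; the toy data and the distance threshold t=10.0 are illustrative assumptions, not part of the original notes.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.cluster import KMeans

# toy data: two well-separated blobs (illustrative only)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(6, 1, (50, 2))])

# single-pass hierarchical clustering (Ward linkage) to estimate k
Z = linkage(X, method="ward")
labels = fcluster(Z, t=10.0, criterion="distance")  # cut the dendrogram at a chosen height
k = len(np.unique(labels))

# iterative k-means with the estimated k for more stable clusters
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
print(k, km.inertia_)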
Explain what a local optimum is and why it is important in a specific context, such as K-means clustering. What are specific ways of determining if you have a local optimum problem? What can be done to avoid local optima?
- A local optimum is a solution that is optimal within a neighboring set of candidate solutions.
- It contrasts with the global optimum: the optimal solution among all candidate solutions.
- K-means clustering context:
It is proven that the objective cost function always decreases until a local optimum is reached.
Results therefore depend on the initial random cluster assignment.
- Determining if you have a local optimum problem:
A tendency toward premature convergence.
Different initializations induce different optima.
- Avoiding local optima in a K-means context: repeat K-means with several random initializations and take the solution that has the lowest cost, as in the sketch below.
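A sketch of that restart strategy, assuming scikit-learn; the toy data, k=2, and the 10 restarts are illustrative assumptions.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(6, 1, (50, 2))])

# repeat k-means from different random initializations; keep the lowest-cost run
best = None
for seed in range(10):
    km = KMeans(n_clusters=2, init="random", n_init=1, random_state=seed).fit(X)
    if best is None or km.inertia_ < best.inertia_:
        best = km
print(best.inertia_)

# note: KMeans(n_clusters=2, n_init=10) performs the same restart loop internally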