Input:
- K (number of clusters)
- Training set {x^(1), x^(2), ..., x^(m)}
Notation
x^(i) ∈ R^n (drop the x_0 = 1 convention)
K-means algorithm
Randomly initialize K cluster centroids μ_1, μ_2, ..., μ_K ∈ R^n
Repeat {
/* Cluster assignment step */ (see MATLAB's pdist2)
for i = 1 to m
  c^(i) := index (from 1 to K) of the cluster centroid closest to x^(i)
/* Move centroid step */ (see MATLAB's grpstats)
for k = 1 to K
  μ_k := average (mean) of the points assigned to cluster k
}
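The two steps of the loop above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the names `closest_centroid`, `mean_point` and `kmeans` are hypothetical, cluster indices run from 0 to K-1 instead of 1 to K, the loop stops once the assignments no longer change, and an empty cluster simply keeps its previous centroid (one possible choice among several).

```python
import math
import random


def closest_centroid(x, centroids):
    """Index of the centroid nearest to x (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda k: math.dist(x, centroids[k]))


def mean_point(points):
    """Component-wise mean of a non-empty list of equal-length points."""
    return [sum(coords) / len(points) for coords in zip(*points)]


def kmeans(X, K, max_iters=100, seed=0):
    """Return (assignments, centroids) for a list of points X."""
    rng = random.Random(seed)
    # Assumption: start from K distinct training examples.
    centroids = [list(x) for x in rng.sample(X, K)]
    assignments = None
    for _ in range(max_iters):
        # Cluster assignment step: c^(i) := index of the closest centroid.
        new_assignments = [closest_centroid(x, centroids) for x in X]
        if new_assignments == assignments:
            break  # nothing changed, so the centroids will not move either
        assignments = new_assignments
        # Move centroid step: mu_k := mean of the points assigned to cluster k.
        for k in range(K):
            members = [x for x, c in zip(X, assignments) if c == k]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[k] = mean_point(members)
    return assignments, centroids
```

Calling `kmeans(X, 2)` on a list of 2-D tuples returns one cluster index per example plus the final centroids.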
K-means optimization objective
- c^(i) = index of the cluster (1, 2, …, K) to which example x^(i) is currently assigned
- μ_k = cluster centroid k (μ_k ∈ R^n)
- μ_{c^(i)} = cluster centroid of the cluster to which example x^(i) has been assigned
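With this notation, the usual K-means cost (the distortion) is the mean squared distance between each example and the centroid of its assigned cluster: J = (1/m) Σ_i ||x^(i) − μ_{c^(i)}||^2. A small sketch of that computation follows; the function name `distortion` and the 0-based cluster indices are assumptions made for the example.

```python
import math


def distortion(X, assignments, centroids):
    """Mean squared distance from each point to its assigned centroid."""
    m = len(X)
    return sum(
        math.dist(x, centroids[c]) ** 2 for x, c in zip(X, assignments)
    ) / m


# Two points, both assigned to a single centroid halfway between them.
print(distortion([(0, 0), (2, 0)], [0, 0], [(1, 0)]))  # prints 1.0
```

Both steps of K-means can only lower J or leave it unchanged, which is why the distortion is a convenient quantity to monitor.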
Random initialization
Randomly initialize K cluster centroids
- Should have K < m
- Randomly pick K training examples
- Set μ_1, ..., μ_K equal to these K examples
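The initialization described above could look like the following sketch. The name `init_centroids` is hypothetical, and sampling is done without replacement, so the K chosen examples sit at distinct positions in the training set (they can still coincide if the data contains duplicate points).

```python
import random


def init_centroids(X, K, rng=random):
    """Pick K distinct training examples and use them as initial centroids."""
    if not K < len(X):
        raise ValueError("K should be smaller than the number of examples m")
    # random.sample draws without replacement.
    return [list(x) for x in rng.sample(X, K)]
```

Passing a seeded `random.Random` instance as `rng` makes the choice reproducible.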
For i = 1 to 100 {
- Randomly initialize K-means
- Run K-means. Get c^(1), ..., c^(m), μ_1, ..., μ_K
- Compute cost function (distortion) J(c^(1), ..., c^(m), μ_1, ..., μ_K)
}
Pick the clustering that gave the lowest cost J
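A rough sketch of the restart loop above, kept self-contained with a compact K-means helper. The names `run_kmeans`, `distortion` and `best_of_restarts` are hypothetical, cluster indices are 0-based, and the default of 100 restarts mirrors the loop count in the notes.

```python
import math
import random


def run_kmeans(X, K, rng, max_iters=100):
    """One K-means run from a random initialization."""
    centroids = [list(x) for x in rng.sample(X, K)]
    assignments = None
    for _ in range(max_iters):
        new = [min(range(K), key=lambda k: math.dist(x, centroids[k])) for x in X]
        if new == assignments:
            break
        assignments = new
        for k in range(K):
            members = [x for x, c in zip(X, assignments) if c == k]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids


def distortion(X, assignments, centroids):
    """Mean squared distance from each point to its assigned centroid."""
    return sum(
        math.dist(x, centroids[c]) ** 2 for x, c in zip(X, assignments)
    ) / len(X)


def best_of_restarts(X, K, n_restarts=100, seed=0):
    """Run K-means n_restarts times and keep the lowest-cost clustering."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_restarts):
        assignments, centroids = run_kmeans(X, K, rng)
        cost = distortion(X, assignments, centroids)
        if best is None or cost < best[0]:
            best = (cost, assignments, centroids)
    return best  # (cost, assignments, centroids)
```

Restarts matter because a single run can end in a poor local optimum, depending on which examples were drawn as initial centroids.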
Choosing the value of K
Elbow method
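The elbow method is usually applied by computing the distortion for a range of K values and looking for the point where the curve stops dropping sharply. The sketch below only produces the numbers to inspect or plot; it does not pick K automatically. The names `kmeans_cost` and `elbow_costs` are hypothetical, and keeping the best of 10 restarts per K is an assumption made for the example.

```python
import math
import random


def kmeans_cost(X, K, rng, max_iters=100):
    """Distortion reached by one K-means run from a random initialization."""
    centroids = [list(x) for x in rng.sample(X, K)]
    assignments = None
    for _ in range(max_iters):
        new = [min(range(K), key=lambda k: math.dist(x, centroids[k])) for x in X]
        if new == assignments:
            break
        assignments = new
        for k in range(K):
            members = [x for x, c in zip(X, assignments) if c == k]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return sum(
        math.dist(x, centroids[c]) ** 2 for x, c in zip(X, assignments)
    ) / len(X)


def elbow_costs(X, k_values, n_restarts=10, seed=0):
    """Map each K to the lowest distortion found over n_restarts runs."""
    rng = random.Random(seed)
    return {
        K: min(kmeans_cost(X, K, rng) for _ in range(n_restarts))
        for K in k_values
    }
```

For example, `elbow_costs(X, range(1, 6))` gives a dictionary of costs for K = 1 to 5. The elbow is often ambiguous in practice, so the curve is a guide and not a rule.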