8.1 Clustering: K-means algorithm

| Category: course  | Tags: ml

Input:

  • K (number of clusters)
  • Training set $\{x^{(1)}, x^{(2)}, \ldots, x^{(m)}\}$

Notation

$x^{(i)} \in \mathbb{R}^n$ (drop the $x_0 = 1$ convention)

K-means algorithm

Randomly initialize K cluster centroids $\mu_1, \mu_2, \ldots, \mu_K \in \mathbb{R}^n$

Repeat {

/* Cluster assignment step */ (see pdist2)

for $i = 1$ to $m$

$c^{(i)}$ := index (from 1 to K) of cluster centroid closest to $x^{(i)}$

/* Move centroid */ (see grpstats)

for $k = 1$ to $K$

$\mu_k$ := average (mean) of points assigned to cluster k

}
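The loop above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the course's Octave/MATLAB code: the helper names `squared_distance` and `kmeans` are hypothetical, cluster indices are 0-based rather than 1-based, the loop stops as soon as the assignments stop changing, and an empty cluster simply keeps its previous centroid (one possible choice; dropping or re-initializing that cluster are alternatives).

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(X, centroids, max_iters=100):
    """Run K-means from the given initial centroids (hypothetical helper).

    X: list of m points, each a list of n floats.
    centroids: list of K initial centroids mu_1..mu_K.
    Returns (c, centroids), where c[i] is the 0-based cluster index of X[i].
    """
    centroids = [list(mu) for mu in centroids]  # copy so the caller's list is untouched
    K = len(centroids)
    c = None
    for _ in range(max_iters):
        # Cluster assignment step: c(i) := index of the centroid closest to x(i).
        new_c = [min(range(K), key=lambda k: squared_distance(x, centroids[k]))
                 for x in X]
        if new_c == c:
            break  # assignments unchanged, so the centroids would not move either
        c = new_c
        # Move centroid step: mu_k := mean of the points assigned to cluster k.
        for k in range(K):
            members = [x for x, ck in zip(X, c) if ck == k]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return c, centroids


X = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
c, mu = kmeans(X, [[0.0, 0.0], [10.0, 10.0]])
print(c, mu)  # [0, 0, 1, 1] [[0.0, 0.5], [10.0, 10.5]]
```

Computing all point-to-centroid distances at once (as pdist2 does) and then taking a per-row minimum is the vectorized equivalent of the inner `min` call here.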

K-means optimization objective

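  • $c^{(i)}$ = index of cluster (1, 2, …, K) to which example $x^{(i)}$ is currently assigned
  • $\mu_k$ = cluster centroid k ($\mu_k \in \mathbb{R}^n$)
  • $\mu_{c^{(i)}}$ = cluster centroid of the cluster to which example $x^{(i)}$ has been assigned

Optimization objective (distortion):

$J(c^{(1)}, \ldots, c^{(m)}, \mu_1, \ldots, \mu_K) = \frac{1}{m} \sum_{i=1}^{m} \lVert x^{(i)} - \mu_{c^{(i)}} \rVert^2$

minimized over $c^{(1)}, \ldots, c^{(m)}$ and $\mu_1, \ldots, \mu_K$. The cluster assignment step minimizes J with respect to the $c^{(i)}$ (centroids held fixed), and the move centroid step minimizes J with respect to the $\mu_k$ (assignments held fixed).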
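The quantity K-means minimizes is the distortion $J(c^{(1)},\ldots,c^{(m)},\mu_1,\ldots,\mu_K)=\frac{1}{m}\sum_{i=1}^{m}\lVert x^{(i)}-\mu_{c^{(i)}}\rVert^2$. A small sketch that evaluates it, assuming points are plain Python lists and `c` holds 0-based cluster indices (the function name `distortion` is illustrative):

```python
def distortion(X, c, centroids):
    """J = (1/m) * sum_i ||x(i) - mu_{c(i)}||^2, with c holding 0-based indices."""
    m = len(X)
    total = 0.0
    for x, ci in zip(X, c):
        mu = centroids[ci]  # centroid of the cluster x(i) is assigned to
        total += sum((xj - mj) ** 2 for xj, mj in zip(x, mu))
    return total / m


X = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
print(distortion(X, [0, 0, 1, 1], [[0.0, 0.5], [10.0, 10.5]]))  # 0.25
```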

Random initialization

Randomly initialize K cluster centroids $\mu_1, \mu_2, \ldots, \mu_K \in \mathbb{R}^n$

  • Should have $K < m$
  • Randomly pick K training examples
  • Set $\mu_1, \ldots, \mu_K$ equal to these K examples
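A sketch of this initialization rule, using the standard library's `random.sample` to draw K distinct training examples. `init_centroids` is a hypothetical name, and accepting a seeded `random.Random` instance is only a convenience for reproducibility:

```python
import random


def init_centroids(X, K, rng=random):
    """Return K distinct training examples (as new lists) to use as mu_1..mu_K."""
    m = len(X)
    if not 0 < K < m:  # the notes ask for K < m
        raise ValueError(f"expected 0 < K < m, got K={K}, m={m}")
    # random.sample draws K distinct positions, so no example is picked twice
    return [list(x) for x in rng.sample(X, K)]


X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
mu = init_centroids(X, 3, random.Random(42))
print(mu)
```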

For i = 1 to 100 {

  • Randomly initialize K-means
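  • Run K-means. Get $c^{(1)}, \ldots, c^{(m)}, \mu_1, \ldots, \mu_K$
  • Compute cost function (distortion) $J(c^{(1)}, \ldots, c^{(m)}, \mu_1, \ldots, \mu_K)$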

}

Pick clustering that gave lowest cost
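The restart loop can be sketched as below. To stay self-contained it repeats a compact K-means and distortion; the function names and the default of 100 restarts (mirroring the loop above) are illustrative assumptions, and cluster indices are 0-based.

```python
import random


def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def run_kmeans(X, centroids, max_iters=100):
    """Plain K-means (assignment + move centroid steps) from the given centroids."""
    centroids = [list(mu) for mu in centroids]
    c = None
    for _ in range(max_iters):
        new_c = [min(range(len(centroids)), key=lambda k: sq_dist(x, centroids[k]))
                 for x in X]
        if new_c == c:
            break
        c = new_c
        for k in range(len(centroids)):
            members = [x for x, ck in zip(X, c) if ck == k]
            if members:  # an empty cluster keeps its old centroid
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return c, centroids


def distortion(X, c, centroids):
    return sum(sq_dist(x, centroids[ci]) for x, ci in zip(X, c)) / len(X)


def best_of_n_inits(X, K, n_inits=100, seed=None):
    """Run K-means n_inits times from random training examples; keep the lowest J."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_inits):
        c, mu = run_kmeans(X, rng.sample(X, K))  # init: K distinct examples
        J = distortion(X, c, mu)
        if best is None or J < best[0]:
            best = (J, c, mu)
    return best  # (J, assignments, centroids)


X = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
J, c, mu = best_of_n_inits(X, 2, seed=0)
print(J)  # 0.25
```

Restarts matter most for small K (roughly 2 to 10 clusters), where a single unlucky initialization can leave K-means in a poor local optimum.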

Choosing the value of K

Elbow method

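[Slide: cost function J plotted against the number of clusters K]

Run K-means for several values of K, plot the resulting cost J against K, and pick the K at the "elbow", where J stops dropping sharply and the curve flattens out.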
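One way to produce the data for an elbow plot is to record the lowest distortion J found for each K in a range and then look for the bend in the curve. The sketch below only computes the (K, J) pairs; picking the elbow stays a visual judgment. The names `kmeans_cost` and `elbow_costs`, and the 10 restarts per K, are hypothetical choices.

```python
import random


def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans_cost(X, init, max_iters=100):
    """Run K-means from the given centroids and return the final distortion J."""
    mu = [list(p) for p in init]
    c = None
    for _ in range(max_iters):
        new_c = [min(range(len(mu)), key=lambda k: sq_dist(x, mu[k])) for x in X]
        if new_c == c:
            break
        c = new_c
        for k in range(len(mu)):
            members = [x for x, ck in zip(X, c) if ck == k]
            if members:
                mu[k] = [sum(col) / len(members) for col in zip(*members)]
    return sum(sq_dist(x, mu[ci]) for x, ci in zip(X, c)) / len(X)


def elbow_costs(X, k_max, n_inits=10, seed=0):
    """Lowest J over n_inits random restarts for each K = 1..k_max (needs k_max < m)."""
    rng = random.Random(seed)
    return [(K, min(kmeans_cost(X, rng.sample(X, K)) for _ in range(n_inits)))
            for K in range(1, k_max + 1)]


X = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
print(elbow_costs(X, 3))  # [(1, 50.25), (2, 0.25), (3, 0.125)]
```

On this toy data J drops sharply from K = 1 to K = 2 and only slightly after that, so K = 2 sits at the elbow.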

