8.1 Clustering: K-means algorithm

2013-06-15 | Category: course | Tags: ml

Notation

$x^{(i)} \in \mathbb{R}^n$ (drop $x_0 = 1$ convention)

Randomly initialize K cluster centroids $\mu_1, \mu_2, \cdots, \mu_K \in \mathbb{R}^n$

Repeat {

/* Cluster assignment step */ (see MATLAB's pdist2)

for $i = 1$ to $m$

$c^{(i)}$ := index (from 1 to K) of cluster centroid closest to $x^{(i)}$

$\boxed{c^{(i)} = \arg\min_k \Vert x^{(i)} - \mu_k \Vert^2}$

/* Move centroid */ (see MATLAB's grpstats)

for $k = 1$ to K

$\mu_k$ := average(mean) of points assigned to cluster k

}
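The loop above can be sketched in plain Python. This is only an illustrative sketch, not the course's reference implementation: the function names (`sq_dist`, `assign_clusters`, `move_centroids`, `kmeans`) are made up for this note, cluster indices run from 0 to K-1 instead of 1 to K, centroids are initialized from K randomly chosen examples, and an empty cluster is assumed to keep its old centroid.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def assign_clusters(X, mu):
    """Cluster assignment step: c[i] = index of the centroid closest to X[i]."""
    return [min(range(len(mu)), key=lambda k: sq_dist(x, mu[k])) for x in X]

def move_centroids(X, c, mu):
    """Move-centroid step: each centroid becomes the mean of its assigned points."""
    new_mu = []
    for k in range(len(mu)):
        members = [x for x, ci in zip(X, c) if ci == k]
        if not members:
            # Assumption: an empty cluster keeps its previous centroid.
            new_mu.append(mu[k])
            continue
        # zip(*members) iterates over coordinates, so this is a per-dimension mean.
        new_mu.append(tuple(sum(col) / len(members) for col in zip(*members)))
    return new_mu

def kmeans(X, K, max_iters=100, seed=0):
    """Alternate the two steps until the assignments stop changing."""
    rng = random.Random(seed)
    # Assumption: initialize the centroids from K distinct training examples.
    mu = [tuple(x) for x in rng.sample(X, K)]
    c = None
    for _ in range(max_iters):
        new_c = assign_clusters(X, mu)
        if new_c == c:
            break
        c = new_c
        mu = move_centroids(X, c, mu)
    return c, mu

# Two well-separated groups of three points each.
X = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
c, mu = kmeans(X, K=2)
```

On this toy data the first three points end up in one cluster and the last three in the other; which cluster gets index 0 depends on the random initialization.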

$c^{(i)}$ = index of cluster (1, 2, …, K) to which example $x^{(i)}$ is currently assigned
$\mu_k$ = cluster centroid k ( $\mu_k \in \mathbb{R}^n$ )
$\mu_{c^{(i)}}$ = cluster centroid of cluster to which example $x^{(i)}$ has been assigned

$J(c^{(1)},\cdots,c^{(m)},\mu_1,\cdots,\mu_K) = \frac{1}{m} \sum_{i=1}^{m} \Vert x^{(i)} - \mu_{c^{(i)}} \Vert^2$
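The cost is the average squared distance between each example and the centroid of the cluster it is assigned to. A minimal sketch of that computation follows; the function name `distortion` and the toy data are assumptions made for this note, and cluster indices are 0-based.

```python
def distortion(X, c, mu):
    """Average squared distance from each example to its assigned centroid."""
    total = 0.0
    for x, ci in zip(X, c):
        # mu[ci] is the centroid of the cluster that example x belongs to.
        total += sum((xj - mj) ** 2 for xj, mj in zip(x, mu[ci]))
    return total / len(X)

# Four points, two clusters; every point is exactly 1 unit from its centroid.
X = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
c = [0, 0, 1, 1]
mu = [(0.0, 1.0), (10.0, 1.0)]
J = distortion(X, c, mu)
```

Here every squared distance is 1, so `J` is 1.0. Swapping assignments so that points are paired with the far centroid makes the value much larger, which is what the cluster assignment step avoids.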

Randomly initialize K cluster centroids $\mu_1, \mu_2, \cdots, \mu_K \in \mathbb{R}^n$

For i = 1 to 100 {

Randomly initialize K-means
Run K-means. Get $c^{(1)}, \cdots, c^{(m)},\mu_1, \cdots, \mu_K$
Compute cost function (distortion) $J(c^{(1)},\cdots,c^{(m)},\mu_1,\cdots,\mu_K)$

}

Pick clustering that gave lowest cost $J(c^{(1)},\cdots,c^{(m)},\mu_1,\cdots,\mu_K)$
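The restart procedure can be sketched as follows. Again this is a sketch under assumptions: `run_kmeans`, `distortion`, and `best_of_restarts` are names invented for this note, initialization picks K random training examples, indices are 0-based, and an empty cluster keeps its previous centroid.

```python
import random

def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def run_kmeans(X, K, rng, max_iters=100):
    """One K-means run from a random initialization; returns (c, mu)."""
    mu = [tuple(x) for x in rng.sample(X, K)]
    c = None
    for _ in range(max_iters):
        new_c = [min(range(K), key=lambda k: sq_dist(x, mu[k])) for x in X]
        if new_c == c:
            break
        c = new_c
        for k in range(K):
            members = [x for x, ci in zip(X, c) if ci == k]
            if members:  # an empty cluster keeps its previous centroid
                mu[k] = tuple(sum(col) / len(members) for col in zip(*members))
    return c, mu

def distortion(X, c, mu):
    """Average squared distance from each example to its assigned centroid."""
    return sum(sq_dist(x, mu[ci]) for x, ci in zip(X, c)) / len(X)

def best_of_restarts(X, K, n_restarts=100, seed=0):
    """Run K-means n_restarts times and keep the run with the lowest cost."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_restarts):
        c, mu = run_kmeans(X, K, rng)
        J = distortion(X, c, mu)
        if best is None or J < best[0]:
            best = (J, c, mu)
    return best

# Three pairs of points; a poor initialization can merge two of the pairs.
X = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
J, c, mu = best_of_restarts(X, K=3)
```

A single run on this data can get stuck, for example when two of the initial centroids fall in the same pair. Keeping the lowest-cost result over many restarts makes it very likely that the clustering with one centroid per pair is the one returned.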

Elbow method
