8.2 Dimensionality Reduction, PCA


Data Compression

Reduce data from 2D to 1D, or from 3D to 2D, by projecting the examples onto a lower-dimensional line or plane.

Principal Component Analysis (PCA)

Reduce from n dimensions to k dimensions:

Find $k$ vectors $u^{(1)}, u^{(2)}, \ldots, u^{(k)}$ onto which to project the data, so as to minimize the projection error.

PCA is not linear regression: linear regression minimizes the vertical distances between the points and the fitted line (there is a special variable $y$ being predicted), whereas PCA minimizes the orthogonal projection distances and treats all features equally.

Data preprocessing

Training set: $x^{(1)}, x^{(2)}, \ldots, x^{(m)}$

Preprocessing (feature scaling / mean normalization):

$\mu_j = \frac{1}{m}\sum_{i=1}^{m} x_j^{(i)}$

Replace each $x_j^{(i)}$ with $x_j^{(i)} - \mu_j$.

If different features are on different scales, also scale the features to have a comparable range of values: $x_j^{(i)} \leftarrow \frac{x_j^{(i)} - \mu_j}{s_j}$.
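
A minimal NumPy sketch of this preprocessing step; the example matrix X and the choice of the standard deviation as the scale $s_j$ are assumptions for illustration:

```python
import numpy as np

# Hypothetical training matrix: m examples (rows) x n features (columns).
X = np.array([[90.0, 2.0],
              [60.0, 1.0],
              [75.0, 3.0]])

mu = X.mean(axis=0)       # per-feature mean mu_j
s = X.std(axis=0)         # per-feature scale s_j (std; a range would also work)
X_norm = (X - mu) / s     # mean-normalized, scaled features
```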

Reduce data from n dimensions to k dimensions (a code sketch follows this list):

  • Compute the "covariance matrix": $\Sigma = \frac{1}{m}\sum_{i=1}^{m} x^{(i)}(x^{(i)})^T = \frac{1}{m}X^TX$, where each row of $X$ ($m \times n$) is one example.
  • Compute the "eigenvectors" of matrix $\Sigma$: [U, S, V] = svd(Sigma)
  • $U$ is an $n \times n$ matrix whose columns are the principal directions.
  • $S$ is an $n \times n$ diagonal matrix of singular values.
  • $U_{reduce}$, the first $k$ columns of $U$, is an $n \times k$ matrix.
  • $z^{(i)} = U_{reduce}^T\, x^{(i)}$ is $k$-dimensional; stacking all examples, the compressed data matrix $Z = X\,U_{reduce}$ is $m \times k$.
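
A minimal NumPy sketch of these steps, continuing from the preprocessing sketch above; the function name pca and its interface are illustrative assumptions, not from the course:

```python
import numpy as np

def pca(X_norm, k):
    """Project mean-normalized data X_norm (m x n) onto its top k principal components."""
    m = X_norm.shape[0]
    Sigma = (X_norm.T @ X_norm) / m   # n x n covariance matrix
    U, S, _ = np.linalg.svd(Sigma)    # U: n x n, columns are principal directions
    U_reduce = U[:, :k]               # first k columns: n x k
    Z = X_norm @ U_reduce             # m x k; row i is z^(i) = U_reduce.T @ x^(i)
    return Z, U_reduce, S

Z, U_reduce, S = pca(X_norm, k=1)     # e.g. compress the toy data above to 1D
```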

Reconstruction from compressed representation

$x_{approx}^{(i)} = U_{reduce}\, z^{(i)} \approx x^{(i)}$
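
Continuing the sketch above (Z and U_reduce as returned by the hypothetical pca function), the reconstruction is a single matrix product:

```python
# Map back from k dimensions to n dimensions; only an approximation,
# since the variance along the discarded directions is lost.
X_approx = Z @ U_reduce.T   # m x n; row i is x_approx^(i) = U_reduce @ z^(i)
```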

Choosing the number of principal components

Choose $k$ to be the smallest value so that

$\frac{\frac{1}{m}\sum_{i=1}^{m}\|x^{(i)} - x_{approx}^{(i)}\|^2}{\frac{1}{m}\sum_{i=1}^{m}\|x^{(i)}\|^2} \le 0.01$

i.e. 99% of the variance is retained. Equivalently, using the matrix $S$ from the svd call, pick the smallest $k$ with

$\frac{\sum_{i=1}^{k} S_{ii}}{\sum_{i=1}^{n} S_{ii}} \ge 0.99$
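
A short sketch of this check using the singular values S returned by the svd call above (same assumptions as the earlier snippets):

```python
import numpy as np

# S holds the singular values of Sigma in decreasing order.
retained = np.cumsum(S) / S.sum()          # variance retained for each candidate k
k = int(np.argmax(retained >= 0.99)) + 1   # smallest k with >= 99% retained
```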

More detail: slide #24.

Advice for applying PCA

  • Supervised learning speedup: map each input $x^{(i)}$ ($n$-dimensional) to $z^{(i)}$ ($k$-dimensional), giving a new training set $(z^{(1)}, y^{(1)}), \ldots, (z^{(m)}, y^{(m)})$ of lower dimension.

NOTE

The mapping $x^{(i)} \rightarrow z^{(i)}$ should be defined by running PCA only on the training set. The same mapping can then be applied to the examples $x_{cv}^{(i)}$ and $x_{test}^{(i)}$ in the cross-validation and test sets (see the sketch after this list).

  • Reduce memory/disk needed to store data
  • Visualization
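
A minimal sketch of the training-set-only fit described in the note above; X_train_norm, X_cv_norm, and X_test_norm are hypothetical arrays assumed to be already mean-normalized with the training set's mu and s:

```python
# Learn the mapping on the training set only...
Z_train, U_reduce, S = pca(X_train_norm, k)   # pca() from the sketch above

# ...then apply the SAME U_reduce (and the same mu, s) to the other sets.
Z_cv = X_cv_norm @ U_reduce
Z_test = X_test_norm @ U_reduce
```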

Bad use of PCA

  • To prevent overfitting: using $z^{(i)}$ instead of $x^{(i)}$ to reduce the number of features to $k < n$, on the reasoning that fewer features are less likely to overfit. This might work OK, but it is not a good way to address overfitting; use regularization instead.

Correct use of PCA

Before using PCA, first run your learning algorithm on the original raw data $x^{(i)}$. Only if that does not do what you want should you then implement PCA and consider using $z^{(i)}$.


