2 Linear Regression with Multiple Variables

| Category: course  | Tags: ml

Multivariate linear regression

Notation:

  • m = number of training examples
  • n = number of features
  • $x^{(i)}$ = input (features) of the $i^{th}$ training example.
  • $x_j^{(i)}$ = value of feature $j$ in the $i^{th}$ training example.

Hypothesis:

$h_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n$

For convenience of notation, define $x_0 = 1$, so that $h_\theta(x) = \theta^T x$.

Cost function:

$J(\theta) = \dfrac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$

Gradient descent:

Repeat { $\theta_j := \theta_j - \alpha \dfrac{\partial}{\partial \theta_j} J(\theta)$ } (simultaneously update $\theta_j$ for every $j = 0, \dots, n$)

New algorithm:

Repeat {

$\theta_j := \theta_j - \alpha \dfrac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$

} (simultaneously update $\theta_j$ for $j = 0, \dots, n$)
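In vectorized form, the repeated update above can be sketched in Python with NumPy (the function name, the defaults, and the convention of a design matrix with a leading column of ones are assumptions of this sketch):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, num_iters=1000):
    """Batch gradient descent for multivariate linear regression.

    X is the m x (n+1) design matrix whose first column is all ones
    (the x0 = 1 convention); y is the vector of m targets.
    """
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(num_iters):
        # Applies theta_j := theta_j - alpha * (1/m) * sum_i (h(x^(i)) - y^(i)) * x_j^(i)
        # to every theta_j simultaneously: X @ theta computes h for all examples at once.
        theta = theta - alpha * (X.T @ (X @ theta - y)) / m
    return theta
```

With a reasonable $\alpha$ and enough iterations this recovers the least-squares fit.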

Gradient Descent in Practice 1 – Feature Scaling

Feature Scaling

Get every feature into approximately a $-1 \le x_i \le 1$ range.

Mean normalization

Replace $x_i$ with $x_i - \mu_i$ to make features have approximately zero mean (do not apply to $x_0 = 1$).

$x_i := \dfrac{x_i - \mu_i}{s_i}$, where $\mu_i$ is the average value of $x_i$ in the training set (over all values of feature $i$), and $s_i$ is the range ($\max - \min$) or the standard deviation ($\max$ and $\min$ taken over feature $i$).
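A minimal sketch of mean normalization with range scaling (the function name and the toy data are assumptions; X holds the raw feature columns only, since $x_0$ is never scaled):

```python
import numpy as np

def mean_normalize(X):
    """Give every feature approximately zero mean and a roughly [-1, 1] range.

    X contains the raw feature columns only -- the x0 = 1 column is
    excluded, since normalization is not applied to it.
    """
    mu = X.mean(axis=0)                  # mu_i: average value of feature i
    s = X.max(axis=0) - X.min(axis=0)    # s_i: range (max - min); std dev works too
    return (X - mu) / s, mu, s
```

Returning `mu` and `s` matters in practice: the same shift and scale must be applied to any new input before prediction.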

Gradient Descent in Practice 2 - Learning Rate

Gradient descent

  1. “Debugging”: how to make sure gradient descent is working correctly
  2. How to choose learning rate $\alpha$.

Making sure gradient descent is working correctly

  • For sufficiently small $\alpha$, $J(\theta)$ should decrease on every iteration.
  • But if $\alpha$ is too small, gradient descent can be slow to converge.

Summary

  • If $\alpha$ is too small: slow convergence.
  • If $\alpha$ is too large: $J(\theta)$ may not decrease on every iteration; may not converge.

To choose $\alpha$, try values roughly $3\times$ apart, e.g. $\ldots, 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, \ldots$
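The convergence check described above ($J(\theta)$ must drop on every single iteration) can be automated; a sketch under assumed names, with a toy dataset:

```python
import numpy as np

def cost(X, y, theta):
    """J(theta) = (1 / 2m) * sum of squared errors."""
    residual = X @ theta - y
    return residual @ residual / (2 * X.shape[0])

def try_learning_rates(X, y, alphas=(0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1.0), iters=50):
    """For each candidate alpha, run a few gradient-descent steps and
    report whether J(theta) decreased on every single iteration."""
    m, n = X.shape
    verdict = {}
    for alpha in alphas:
        theta = np.zeros(n)
        costs = [cost(X, y, theta)]
        for _ in range(iters):
            theta = theta - alpha * (X.T @ (X @ theta - y)) / m
            costs.append(cost(X, y, theta))
        # alpha "works" if the cost history is strictly decreasing
        verdict[alpha] = all(later < earlier for earlier, later in zip(costs, costs[1:]))
    return verdict
```

Too-large values of $\alpha$ show up immediately: the cost history rises instead of falling, and the verdict for that $\alpha$ is `False`.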

Features and Polynomial Regression

Polynomial Regression
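The worked example that belongs here was lost in extraction; a standard illustration (assuming, as in the lecture, housing price as a function of size) fits a cubic by defining new features from powers of a single input:

```latex
h_\theta(x) = \theta_0 + \theta_1 (\text{size}) + \theta_2 (\text{size})^2 + \theta_3 (\text{size})^3
```

Setting $x_1 = \text{size}$, $x_2 = (\text{size})^2$, $x_3 = (\text{size})^3$ reduces this to ordinary multivariate linear regression. Feature scaling then becomes essential, since the three features span wildly different ranges.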

Normal Equation

Normal Equation: method to solve for $\theta$ analytically.

$\theta = (X^T X)^{-1} X^T y$
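The normal equation is $\theta = (X^T X)^{-1} X^T y$; a sketch in Python (solving the linear system rather than forming the inverse explicitly is a numerical-practice choice of this sketch, not part of the course formula):

```python
import numpy as np

def normal_equation(X, y):
    """Solve theta = (X^T X)^{-1} X^T y analytically.

    Solving the system (X^T X) theta = X^T y avoids forming the
    inverse explicitly, which is cheaper and numerically safer.
    """
    return np.linalg.solve(X.T @ X, X.T @ y)
```

Unlike gradient descent, there is no $\alpha$ to pick and no iteration; the price is a solve that grows roughly cubically in the number of features.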

Summary

m training examples, n features.

Gradient Descent:

  • Need to choose $\alpha$.
  • Needs many iterations.
  • Works well even when $n$ is large.

Normal Equation:

  • No need to choose $\alpha$.
  • Don't need to iterate.
  • Need to compute $(X^T X)^{-1}$.
  • Slow if $n$ is very large.

When $n$ is large (roughly $n \ge 10^4$), we should use gradient descent; when $n$ is smaller than that, we can use the normal equation.

Normal Equation Noninvertibility

What if $X^T X$ is non-invertible?

  • Redundant features (linearly dependent).
  • Too many features (e.g. $m \le n$) - delete some features, or use regularization.
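In practice a pseudo-inverse sidesteps the problem: even with linearly dependent columns, `np.linalg.pinv` returns a usable $\theta$. The data below is a made-up illustration in which $x_2$ is an exact multiple of $x_1$:

```python
import numpy as np

# x2 is exactly 2 * x1, so the columns are linearly dependent and
# X^T X is singular: the plain inverse does not exist.
X = np.column_stack([np.ones(4), np.arange(4.0), 2 * np.arange(4.0)])
y = np.array([1.0, 3.0, 5.0, 7.0])   # y = 1 + 2 * x1

# The pseudo-inverse still produces a theta that fits the data
# (it picks the minimum-norm solution among the infinitely many).
theta = np.linalg.pinv(X.T @ X) @ X.T @ y
```

Deleting the redundant feature, or regularizing, remains the cleaner fix.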
