Multivariate linear regression
Notation:
- $m$ = number of training examples
- $n$ = number of features
- $x^{(i)}$ = input (features) of the $i$-th training example
- $x_j^{(i)}$ = value of feature $j$ in the $i$-th training example
Hypothesis:
$h_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n$
For convenience of notation, define $x_0 = 1$, so that $x, \theta \in \mathbb{R}^{n+1}$ and $h_\theta(x) = \theta^T x$.
Cost function:
$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$
Gradient descent:
Repeat { $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$ } (simultaneously update $\theta_j$ for every $j = 0, \dots, n$)
New algorithm:
Repeat {
$\theta_j := \theta_j - \alpha \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)}$
} (simultaneously update $\theta_j$ for $j = 0, \dots, n$)
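A minimal vectorized sketch of this update in NumPy (the function name and default values are illustrative, not from the notes); stacking the per-feature derivatives into one matrix product gives exactly the simultaneous update, since every $\theta_j$ is computed from the same old $\theta$:

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, num_iters=1500):
    """Batch gradient descent for multivariate linear regression.

    Assumes X already includes the x0 = 1 bias column, so its
    shape is (m, n + 1); y has shape (m,).
    """
    m = X.shape[0]
    theta = np.zeros(X.shape[1])
    for _ in range(num_iters):
        error = X @ theta - y                        # h_theta(x^(i)) - y^(i) for all i
        theta = theta - alpha * (X.T @ error) / m    # simultaneous update of every theta_j
    return theta
```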
Gradient Descent in Practice 1 – Feature Scaling
Feature Scaling
Get every feature into approximately a $-1 \le x_i \le 1$ range.
Mean normalization
Replace $x_i$ with $x_i - \mu_i$ to make features have approximately zero mean (do not apply to $x_0 = 1$):
$x_i := \frac{x_i - \mu_i}{s_i}$
where $\mu_i$ is the average value of feature $i$ over the training set, and $s_i$ is the range ($\max - \min$, taken over the values of feature $i$) or the standard deviation.
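A sketch of mean normalization in the same NumPy style (illustrative, not from the notes); note that $\mu_i$ and $s_i$ must be saved so that new examples can be scaled the same way at prediction time:

```python
import numpy as np

def mean_normalize(X):
    """Scale features to roughly zero mean and comparable ranges.

    X has shape (m, n) and must NOT include the x0 = 1 column,
    since mean normalization is not applied to x0.
    """
    mu = X.mean(axis=0)   # mu_i: average of feature i over the training set
    s = X.std(axis=0)     # s_i: standard deviation (the range max - min also works)
    return (X - mu) / s, mu, s
```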
Gradient Descent in Practice 2 - Learning Rate
Gradient descent
- “Debugging”: how to make sure gradient descent is working correctly
- How to choose the learning rate $\alpha$.
Making sure gradient descent is working correctly
- Plot $J(\theta)$ against the number of iterations; for sufficiently small $\alpha$, $J(\theta)$ should decrease on every iteration.
- But if $\alpha$ is too small, gradient descent can be slow to converge.
Summary
- If $\alpha$ is too small: slow convergence.
- If $\alpha$ is too large: $J(\theta)$ may not decrease on every iteration; it may not converge.
To choose $\alpha$, try values roughly $3\times$ apart, e.g. $\dots, 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, \dots$
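The sweep below sketches this recipe on synthetic data (the dataset and iteration count are invented for illustration): run a fixed number of iterations for each candidate $\alpha$ and check whether $J(\theta)$ is still shrinking.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.c_[np.ones(100), rng.normal(size=(100, 2))]   # bias column + 2 scaled features
y = X @ np.array([1.0, 2.0, -3.0]) + rng.normal(scale=0.1, size=100)

def cost(X, y, theta):
    error = X @ theta - y
    return (error @ error) / (2 * len(y))            # J(theta)

# Candidate alphas roughly 3x apart; if alpha is too large,
# J(theta) may grow or oscillate instead of decreasing.
for alpha in [0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1.0]:
    theta = np.zeros(X.shape[1])
    for _ in range(100):
        theta -= alpha * (X.T @ (X @ theta - y)) / len(y)
    print(f"alpha = {alpha}: J(theta) = {cost(X, y, theta):.6f}")
```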
Features and Polynomial Regression
Polynomial Regression
A polynomial model such as $h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3$ can be fit with the same linear machinery by treating $x$, $x^2$, $x^3$ as three separate features; feature scaling then becomes especially important, because the powers have very different ranges.
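A hypothetical helper (not from the notes) that builds and scales such polynomial features:

```python
import numpy as np

def polynomial_features(x, degree=3):
    """Turn one feature x (shape (m,)) into [x, x^2, ..., x^degree].

    The powers have wildly different ranges (e.g. size vs. size^3),
    so mean normalization is applied before returning.
    """
    X = np.column_stack([x ** d for d in range(1, degree + 1)])
    return (X - X.mean(axis=0)) / X.std(axis=0)
```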
Normal Equation
Normal Equation: method to solve for $\theta$ analytically, in one step:
$\theta = (X^T X)^{-1} X^T y$
where $X$ is the $m \times (n+1)$ design matrix (one training example per row, including $x_0 = 1$) and $y$ is the $m$-vector of targets.
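In NumPy this is a one-liner; the sketch below uses the pseudoinverse (`np.linalg.pinv`) rather than a plain inverse, which also copes with the non-invertible case discussed later:

```python
import numpy as np

def normal_equation(X, y):
    """Solve theta = (X^T X)^{-1} X^T y in closed form.

    X includes the x0 = 1 column; no feature scaling or alpha needed.
    """
    return np.linalg.pinv(X.T @ X) @ X.T @ y
```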
Summary
$m$ training examples, $n$ features.
Gradient Descent:
- Need to choose $\alpha$.
- Needs many iterations.
- Works well even when $n$ is large.
Normal Equation:
- No need to choose $\alpha$.
- Don't need to iterate.
- Need to compute $(X^T X)^{-1}$, which costs roughly $O(n^3)$.
- Slow if $n$ is very large.
As a rough rule of thumb, when $n$ is on the order of $10{,}000$ or more, we should use gradient descent; when $n$ is smaller than that, the normal equation works well.
Normal Equation Noninvertibility
What if $X^T X$ is non-invertible (singular/degenerate)?
- Redundant features (linearly dependent), e.g. one feature is another expressed in different units.
- Too many features (e.g. $m \le n$): delete some features, or use regularization.
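A tiny illustration of the redundant-feature case (the values are invented for the example): the third column is just the second column in different units, so $X^T X$ is singular, yet the pseudoinverse still produces a usable fit.

```python
import numpy as np

x1 = np.array([1.0, 2.0, 3.0, 4.0])    # size in meters
X = np.c_[np.ones(4), x1, 3.28 * x1]   # same size repeated in feet: linearly dependent
y = 2 * x1 + 1

# np.linalg.inv(X.T @ X) would fail or be numerically meaningless here;
# the Moore-Penrose pseudoinverse returns the minimum-norm solution.
theta = np.linalg.pinv(X.T @ X) @ X.T @ y
print(X @ theta)                        # reproduces y despite the singular X^T X
```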