Quiz
Suppose an implementation of linear regression (without regularization) is badly overfitting the training set. In this case, we would expect the training error to be low and the test error to be high.
Deciding what to try next
- Get more training examples
- Try smaller sets of features
- Try getting additional features
- Try adding polynomial features ($x_1^2, x_2^2, x_1 x_2$, etc.)
- Try decreasing $\lambda$
- Try increasing $\lambda$
Evaluating a hypothesis
Training/testing procedure for linear regression
- Learn parameter $\theta$ from the training data (minimizing the training error $J_{\text{train}}(\theta)$)
- Compute the test set error: $J_{\text{test}}(\theta) = \frac{1}{2m_{\text{test}}} \sum_{i=1}^{m_{\text{test}}} \left( h_\theta(x_{\text{test}}^{(i)}) - y_{\text{test}}^{(i)} \right)^2$
Split the data into two sets: one for training (e.g., 70% of the data) and the remaining 30% as a test set for evaluation; a sketch follows below.
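A minimal sketch of this split-and-evaluate procedure, assuming NumPy, a design matrix `X` whose first column is the bias term, and helper names of my own (`train_test_split`, `squared_error_cost`):

```python
import numpy as np

def train_test_split(X, y, train_frac=0.7, seed=0):
    """Shuffle the examples, then keep the first 70% for training and the rest for testing."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    m_train = int(train_frac * len(y))
    return X[idx[:m_train]], y[idx[:m_train]], X[idx[m_train:]], y[idx[m_train:]]

def squared_error_cost(theta, X, y):
    """J(theta) = (1 / 2m) * sum((h_theta(x) - y)^2); the same form is used for J_train and J_test."""
    residual = X @ theta - y
    return residual @ residual / (2 * len(y))
```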
Training/testing procedure for logistic regression
- Learn parameter $\theta$ from the training data
- Compute the test set error $J_{\text{test}}(\theta)$
- Misclassification error (0/1 misclassification error):
  $\text{err}(h_\theta(x), y) = \begin{cases} 1 & \text{if } h_\theta(x) \ge 0.5,\ y = 0 \ \text{ or } \ h_\theta(x) < 0.5,\ y = 1 \\ 0 & \text{otherwise} \end{cases}$
  $\text{Test error} = \frac{1}{m_{\text{test}}} \sum_{i=1}^{m_{\text{test}}} \text{err}\left( h_\theta(x_{\text{test}}^{(i)}),\ y_{\text{test}}^{(i)} \right)$
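A sketch of the 0/1 misclassification error for a logistic-regression hypothesis (NumPy assumed; the function name is my own):

```python
import numpy as np

def misclassification_error(theta, X, y):
    """Fraction of examples where thresholding h_theta(x) = sigmoid(theta^T x) at 0.5
    disagrees with the label y (the 0/1 misclassification error)."""
    h = 1.0 / (1.0 + np.exp(-(X @ theta)))
    predictions = (h >= 0.5).astype(int)
    return np.mean(predictions != y)
```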
Model selection and training/validation/test sets
Model selection
d = degree of polynomial, i.e., the highest power of x in the hypothesis $h_\theta(x)$; e.g., d = 3.
Randomly split the data set into three parts: shuffle the data first, then take the first 60% as the training set, the next 20% as the cross-validation set, and the final 20% as the test set (see the sketch below).
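A sketch of the 60/20/20 split, again with NumPy and a hypothetical helper name:

```python
import numpy as np

def train_cv_test_split(X, y, seed=0):
    """Shuffle, then take 60% / 20% / 20% of the examples as the
    training / cross-validation / test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    m = len(y)
    m_train, m_cv = int(0.6 * m), int(0.2 * m)
    tr, cv, te = idx[:m_train], idx[m_train:m_train + m_cv], idx[m_train + m_cv:]
    return (X[tr], y[tr]), (X[cv], y[cv]), (X[te], y[te])
```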
Train/validation/test error
- Training error: $J_{\text{train}}(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$
- Cross validation error: $J_{cv}(\theta) = \frac{1}{2m_{cv}} \sum_{i=1}^{m_{cv}} \left( h_\theta(x_{cv}^{(i)}) - y_{cv}^{(i)} \right)^2$
- Test error: $J_{\text{test}}(\theta) = \frac{1}{2m_{\text{test}}} \sum_{i=1}^{m_{\text{test}}} \left( h_\theta(x_{\text{test}}^{(i)}) - y_{\text{test}}^{(i)} \right)^2$
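All three errors are the same squared-error cost evaluated on different splits; a small sketch, reusing the hypothetical `squared_error_cost` and `train_cv_test_split` helpers from above:

```python
def train_cv_test_errors(theta, splits):
    """Evaluate the same squared-error cost J(theta) on the training,
    cross-validation, and test splits returned by train_cv_test_split."""
    (X_tr, y_tr), (X_cv, y_cv), (X_te, y_te) = splits
    return (squared_error_cost(theta, X_tr, y_tr),   # J_train
            squared_error_cost(theta, X_cv, y_cv),   # J_cv
            squared_error_cost(theta, X_te, y_te))   # J_test
```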
Diagnosing bias vs. variance
Bias/Variance
- High bias (underfit): small d
- High variance (overfit): large d
See the figure in slides #p17.
- Bias (underfit): $J_{\text{train}}(\theta)$ will be high, and $J_{cv}(\theta) \approx J_{\text{train}}(\theta)$
- Variance (overfit): $J_{\text{train}}(\theta)$ will be low, and $J_{cv}(\theta) \gg J_{\text{train}}(\theta)$
The above describes how the errors vary with d. When d is small, the error is large and the model underfits: $J_{\text{train}}(\theta)$ is high and $J_{cv}(\theta)$ is close to it. When d is large, the model overfits: $J_{\text{train}}(\theta)$ is low, but $J_{cv}(\theta)$ is much larger than $J_{\text{train}}(\theta)$. A sketch that computes these curves follows.
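A sketch of how the error-vs-degree curves could be computed: fit a polynomial of each degree d (here by `np.linalg.lstsq` on 1-D inputs, reusing the `squared_error_cost` helper from above; the other names are my own):

```python
import numpy as np

def poly_features(x, d):
    """Map a 1-D input to [1, x, x^2, ..., x^d]."""
    return np.vander(x, d + 1, increasing=True)

def error_vs_degree(x_train, y_train, x_cv, y_cv, max_degree=10):
    """For each degree d, fit on the training set and record (d, J_train, J_cv).
    Small d tends to leave both errors high (bias); large d tends to give a low
    J_train but a much higher J_cv (variance)."""
    curves = []
    for d in range(1, max_degree + 1):
        X_tr = poly_features(x_train, d)
        theta, *_ = np.linalg.lstsq(X_tr, y_train, rcond=None)
        j_train = squared_error_cost(theta, X_tr, y_train)
        j_cv = squared_error_cost(theta, poly_features(x_cv, d), y_cv)
        curves.append((d, j_train, j_cv))
    return curves
```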
Regularization and bias/variance
- High bias (underfit): large $\lambda$
- High variance (overfit): small $\lambda$
See slide #p23.
This describes how the errors vary with $\lambda$. When $\lambda$ is small, the training error is small and the model overfits; when $\lambda$ is large, the errors are high and the model underfits. The sketch below selects $\lambda$ using the cross-validation error.
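A sketch of choosing $\lambda$ by cross-validation, using a regularized normal equation; the candidate $\lambda$ list and helper names are assumptions, and `squared_error_cost` is reused from above:

```python
import numpy as np

def fit_regularized(X, y, lam):
    """Regularized normal equation: theta = (X^T X + lam * L)^(-1) X^T y,
    where L is the identity with the bias entry zeroed so theta_0 is not penalized."""
    L = np.eye(X.shape[1])
    L[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)

def select_lambda(X_train, y_train, X_cv, y_cv,
                  lambdas=(0, 0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28, 2.56, 5.12, 10.24)):
    """Train with each candidate lambda and keep the one with the lowest
    (unregularized) cross-validation error J_cv."""
    best_lam, best_j_cv = None, np.inf
    for lam in lambdas:
        theta = fit_regularized(X_train, y_train, lam)
        j_cv = squared_error_cost(theta, X_cv, y_cv)
        if j_cv < best_j_cv:
            best_lam, best_j_cv = lam, j_cv
    return best_lam, best_j_cv
```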
Learning curves
See slide #p25.
- If a learning algorithm is suffering from high bias, getting more training data will not (by itself) help much.
- If a learning algorithm is suffering from high variance, getting more training data is likely to help.
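A sketch of computing a learning curve: train on the first m examples for increasing m, recording $J_{\text{train}}$ on those m examples and $J_{cv}$ on the full cross-validation set (again reusing the `squared_error_cost` helper from above):

```python
import numpy as np

def learning_curve(X_train, y_train, X_cv, y_cv):
    """For m = 1 .. m_train, fit on the first m examples and record (m, J_train, J_cv).
    With high bias both curves flatten out at a high error; with high variance there is
    a gap between them that extra training data tends to close."""
    points = []
    for m in range(1, len(y_train) + 1):
        theta, *_ = np.linalg.lstsq(X_train[:m], y_train[:m], rcond=None)
        points.append((m,
                       squared_error_cost(theta, X_train[:m], y_train[:m]),
                       squared_error_cost(theta, X_cv, y_cv)))
    return points
```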
Deciding what to try next
- Get more training examples → fixes high variance
- Try smaller sets of features → fixes high variance
- Try getting additional features → fixes high bias
- Try adding polynomial features ($x_1^2, x_2^2, x_1 x_2$, etc.) → fixes high bias
- Try decreasing $\lambda$ → fixes high bias
- Try increasing $\lambda$ → fixes high variance
Neural networks and overfitting
- “Small” neural network (fewer parameters; more prone to underfitting)
  - Computationally cheaper
- “Large” neural network (more parameters; more prone to overfitting)
  - Computationally more expensive
  - Use regularization ($\lambda$) to address overfitting