3 Logistic Regression and Regularization

| Category: course  | Tag: ml

Logistic Regression

Classification

  • 0: “Negative Class”
  • 1: “Positive Class”

Threshold the classifier output $h_\theta(x)$ at 0.5:

  • if $h_\theta(x) \ge 0.5$, predict “y = 1”
  • if $h_\theta(x) < 0.5$, predict “y = 0”

Classification: y = 0 or 1

$h_\theta(x)$ (from linear regression) can be > 1 or < 0

Logistic Regression: $0 \le h_\theta(x) \le 1$

Hypothesis Representation

Logistic Regression Model

Want $0 \le h_\theta(x) \le 1$. The logistic regression model sets

$$h_\theta(x) = g(\theta^T x), \qquad g(z) = \frac{1}{1 + e^{-z}}$$

$g(z)$ always lies between 0 and 1.

$g(z)$ is called the sigmoid function or the logistic function. The output $h_\theta(x)$ can be interpreted as the estimated probability that $y = 1$ for input $x$.
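
As a small illustration, here is an Octave sketch of the sigmoid and the resulting hypothesis. The function names sigmoid and hypothesis, and the convention that the design matrix X holds one example per row with a leading column of ones, are assumptions of this sketch, not code from the course.

function g = sigmoid(z)
    % Maps any real z (scalar, vector, or matrix) into the interval (0, 1).
    g = 1 ./ (1 + exp(-z));
end

function h = hypothesis(theta, X)
    % h_theta(x) = g(theta' * x) for every row of X.
    h = sigmoid(X * theta);
end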

Decision Boundary

  • Predict “y = 1” if $h_\theta(x) \ge 0.5$, or equivalently $\theta^T x \ge 0$
  • Predict “y = 0” if $h_\theta(x) < 0.5$, or equivalently $\theta^T x < 0$

Cost Function

Training set: $\{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \ldots, (x^{(m)}, y^{(m)})\}$, with $x \in \mathbb{R}^{n+1}$, $x_0 = 1$, $y \in \{0, 1\}$.

Logistic regression cost function:

$$J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \mathrm{Cost}\big(h_\theta(x^{(i)}), y^{(i)}\big), \qquad
\mathrm{Cost}(h_\theta(x), y) =
\begin{cases}
-\log(h_\theta(x)) & \text{if } y = 1 \\
-\log(1 - h_\theta(x)) & \text{if } y = 0.
\end{cases}$$

Notice: in the formula above, the dot at the end cannot be omitted.

With new features the hypothesis fits the training set at least as well as before, so the training-set cost $J(\theta)$ will decrease (or stay the same).

With this cost, $J(\theta)$ is guaranteed to be convex for logistic regression (unlike the squared-error cost, which would be non-convex here).

Simplified Cost Function and Gradient Descent

The cost can be written in the following compressed form:

$$\mathrm{Cost}(h_\theta(x), y) = -y \log(h_\theta(x)) - (1 - y) \log(1 - h_\theta(x))$$

so that

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \Big[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\big(1 - h_\theta(x^{(i)})\big) \Big]$$

Want $\min_\theta J(\theta)$. To make a prediction given a new $x$, output $h_\theta(x) = \dfrac{1}{1 + e^{-\theta^T x}}$.

Gradient Descent:

Repeat {

$$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \big(h_\theta(x^{(i)}) - y^{(i)}\big)\, x_j^{(i)}$$

} (simultaneously update all $\theta_j$)
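
A vectorized Octave sketch of this loop; the function name and the arguments X, y, alpha, num_iters are my own, not from the original notes.

function theta = gradientDescentLogistic(X, y, theta, alpha, num_iters)
    % Batch gradient descent for logistic regression (vectorized).
    % X: m x (n+1) design matrix, y: m x 1 labels in {0, 1}.
    m = length(y);
    for iter = 1:num_iters
        h = 1 ./ (1 + exp(-(X * theta)));             % h_theta for all m examples
        theta = theta - (alpha / m) * (X' * (h - y)); % simultaneous update of all theta_j
    end
end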

Advanced Optimization

options = optimset('GradObj', 'on', 'MaxIter', 100);
initialTheta = zeros(2,1);
[optTheta, functionVal, exitFlag] ...
    = fminunc(@costFunction, initialTheta, options);

function [jVal, gradient] = costFunction(theta)
    jVal = [code to compute J(θ)];
    gradient(1) = [code to compute ∂J(θ)/∂θ0];
    ...
    gradient(n+1) = [code to compute ∂J(θ)/∂θn];
end
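
For concreteness, here is one way the skeleton might be filled in for (unregularized) logistic regression. This is a vectorized sketch; the extra arguments X (design matrix, one example per row) and y (label vector) are assumptions of mine, not part of the course template.

function [jVal, gradient] = costFunction(theta, X, y)
    % Logistic regression cost J(theta) and its gradient (vectorized).
    m = length(y);
    h = 1 ./ (1 + exp(-(X * theta)));                         % h_theta(x) for every example
    jVal = -(1 / m) * (y' * log(h) + (1 - y)' * log(1 - h));  % J(theta)
    gradient = (1 / m) * (X' * (h - y));                      % (n+1) x 1 vector of partial derivatives
end

Because fminunc expects a function of theta alone, the call above would then read fminunc(@(t) costFunction(t, X, y), initialTheta, options) with this three-argument version.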

Multiclass Classification: One-vs-all

Train a logistic regression classifier $h_\theta^{(i)}(x)$ for each class $i$ to predict the probability that $y = i$. To make a prediction on a new input $x$, pick the class $i$ that maximizes $h_\theta^{(i)}(x)$.
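
A sketch of the prediction step under this scheme, assuming the K classifiers' parameters have been stacked into a matrix all_theta (the function and variable names here are my own).

function p = predictOneVsAll(all_theta, X)
    % all_theta: K x (n+1) matrix; row i holds the parameters of the classifier for class i.
    % X: m x (n+1) design matrix. Returns the predicted class index for each example.
    probs = 1 ./ (1 + exp(-(X * all_theta')));   % m x K matrix of h_theta^(i)(x)
    [~, p] = max(probs, [], 2);                  % pick the most confident classifier per row
end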

Regularization

The Problem of Overfitting

Overfitting: If we have too many features, the learned hypothesis may fit the training set very well, but fail to generalize to new examples.

Addressing overfitting:

Options:

  1. Reduce the number of features.
    • Manually select which features to keep
    • Use a model selection algorithm
  2. Regularization
    • Keep all the features, but reduce the magnitude/values of the parameters $\theta_j$
    • Works well when we have a lot of features, each of which contributes a bit to predicting $y$.

By adding a new feature, the model becomes more (or at least as) expressive, allowing it to learn more complex hypotheses that fit the training set.

Adding many new features gives us more expressive models which are able to better fit our training set. If too many new features are added, this can lead to overfitting of the training set.

Cost Function

To penalize large parameters, add a regularization term to the cost (shown here for linear regression; $\lambda$ is the regularization parameter):

$$J(\theta) = \frac{1}{2m} \left[ \sum_{i=1}^{m} \big(h_\theta(x^{(i)}) - y^{(i)}\big)^2 + \lambda \sum_{j=1}^{n} \theta_j^2 \right]$$

Note that $\theta_0$ is not regularized.

Regularized Linear Regression

Gradient descent:

Repeat {

$$\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \big(h_\theta(x^{(i)}) - y^{(i)}\big)\, x_0^{(i)}$$

$$\theta_j := \theta_j - \alpha \left[ \frac{1}{m} \sum_{i=1}^{m} \big(h_\theta(x^{(i)}) - y^{(i)}\big)\, x_j^{(i)} + \frac{\lambda}{m} \theta_j \right] \qquad (j = 1, 2, \ldots, n)$$

}
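
One update of this rule as a vectorized Octave sketch; the function and variable names are assumptions of this sketch.

function theta = regularizedLinearStep(X, y, theta, alpha, lambda)
    % One gradient descent step for regularized linear regression (vectorized).
    m = length(y);
    h = X * theta;                                            % linear hypothesis
    grad = (1 / m) * (X' * (h - y));                          % unregularized gradient
    grad(2:end) = grad(2:end) + (lambda / m) * theta(2:end);  % theta_0 (index 1 in Octave) is not regularized
    theta = theta - alpha * grad;
end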

Regularized Logistic Regression

Gradient descent:

Repeat {

$$\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \big(h_\theta(x^{(i)}) - y^{(i)}\big)\, x_0^{(i)}$$

$$\theta_j := \theta_j - \alpha \left[ \frac{1}{m} \sum_{i=1}^{m} \big(h_\theta(x^{(i)}) - y^{(i)}\big)\, x_j^{(i)} + \frac{\lambda}{m} \theta_j \right] \qquad (j = 1, 2, \ldots, n)$$

}

The update looks identical to regularized linear regression, but here $h_\theta(x) = \dfrac{1}{1 + e^{-\theta^T x}}$.
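
Analogously, a sketch of a regularized logistic regression cost and gradient in the costFunction style used above, e.g. for fminunc; costFunctionReg and its argument list are my own naming.

function [jVal, gradient] = costFunctionReg(theta, X, y, lambda)
    % Regularized logistic regression cost and gradient (vectorized).
    m = length(y);
    h = 1 ./ (1 + exp(-(X * theta)));
    reg = (lambda / (2 * m)) * sum(theta(2:end) .^ 2);               % theta_0 is not regularized
    jVal = -(1 / m) * (y' * log(h) + (1 - y)' * log(1 - h)) + reg;
    gradient = (1 / m) * (X' * (h - y));
    gradient(2:end) = gradient(2:end) + (lambda / m) * theta(2:end);
end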

