Logistic Regression
Classification
- 0: “Negative Class”
- 1: “Positive Class”
Threshold classifier output $h_\theta(x)$ at 0.5:
- if $h_\theta(x) \ge 0.5$, predict “y = 1”
- if $h_\theta(x) < 0.5$, predict “y = 0”
Classification: y = 0 or 1
Linear regression: $h_\theta(x)$ can be > 1 or < 0
Logistic Regression: $0 \le h_\theta(x) \le 1$
Hypothesis Representation
Logistic Regression Model
Want $0 \le h_\theta(x) \le 1$.
$h_\theta(x) = g(\theta^T x)$, where $g(z) = \frac{1}{1 + e^{-z}}$.
$g(z)$ always lies between 0 and 1.
$g(z)$ is called the sigmoid function or logistic function.
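As a quick sanity check, the sigmoid can be implemented in one line of Octave (the function name `sigmoid` and the comments below are my own illustration, not part of the lecture code):

```octave
% Sigmoid / logistic function: maps any real number into (0, 1).
function g = sigmoid(z)
  g = 1 ./ (1 + exp(-z));    % element-wise, so z may be a scalar, vector, or matrix
end

% The hypothesis is then h = sigmoid(theta' * x) for a single example x,
% or sigmoid(X * theta) for a whole design matrix X.
```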
Decision Boundary
- Predict “y=1” if $h_\theta(x) \ge 0.5$, or equivalently $\theta^T x \ge 0$
- Predict “y=0” if $h_\theta(x) < 0.5$, or equivalently $\theta^T x < 0$
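For an illustrative example: with parameters $\theta = (-3, 1, 1)^T$ and features $x = (1, x_1, x_2)^T$, the hypothesis is $h_\theta(x) = g(-3 + x_1 + x_2)$, so the classifier predicts “y = 1” exactly when $-3 + x_1 + x_2 \ge 0$, i.e. when $x_1 + x_2 \ge 3$. The line $x_1 + x_2 = 3$ is the decision boundary; it is a property of the hypothesis (the parameters $\theta$), not of the training set.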
Cost Function
Training examples: $\{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \ldots, (x^{(m)}, y^{(m)})\}$, with $m$ examples, $x \in \mathbb{R}^{n+1}$ where $x_0 = 1$, and $y \in \{0, 1\}$.
Logistic regression cost function:
$$J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\mathrm{Cost}\left(h_\theta(x^{(i)}), y^{(i)}\right), \qquad \mathrm{Cost}(h_\theta(x), y) = \begin{cases} -\log(h_\theta(x)) & \text{if } y = 1 \\ -\log(1 - h_\theta(x)) & \text{if } y = 0 \end{cases}$$
Notice: in the formula above, the dot at the end can’t be omitted!
With new features the hypothesis becomes more accurate on the training set (or at least stays as accurate), so the optimal value of the cost function will decrease or stay the same.
The cost function J(θ) is guaranteed to be convex for logistic regression.
Simplified Cost Function and Gradient Descent
The $\mathrm{Cost}$ above can be written in one line as $\mathrm{Cost}(h_\theta(x), y) = -y\log(h_\theta(x)) - (1 - y)\log(1 - h_\theta(x))$, so that:
$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h_\theta(x^{(i)}) + (1 - y^{(i)})\log\left(1 - h_\theta(x^{(i)})\right)\right]$$
Want $\min_\theta J(\theta)$. To make a prediction given a new $x$, output $h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$.
Gradient Descent:
Repeat {
$\quad \theta_j := \theta_j - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$
} (simultaneously update all $\theta_j$)
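A minimal vectorized sketch of this update in Octave, assuming `X` is the $m \times (n+1)$ design matrix with a leading column of ones, `y` is the label vector, `theta` has been initialized (e.g. to zeros), and `alpha`, `num_iters` are chosen by hand:

```octave
m = length(y);
for iter = 1:num_iters
  h = 1 ./ (1 + exp(-X * theta));   % h_theta(x^(i)) for every training example
  grad = (1 / m) * X' * (h - y);    % all partial derivatives of J(theta) at once
  theta = theta - alpha * grad;     % simultaneous update of every theta_j
end
```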
Advanced Optimization
options = optimset('GradObj', 'on', 'MaxIter', 100);
initialTheta = zeros(2,1);
[optTheta, functionVal, exitFlag] ...
= fminunc(@costFunction, initialTheta, options);
function [jVal, gradient] = costFunction(theta)
jVal = [code to compute J(θ)];
gradient(1) = [code to compute ∂/∂θ0 J(θ)];
...
gradient(n+1) = [code to compute ∂/∂θn J(θ)];
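A possible concrete version of that skeleton (a sketch only; passing `X` and `y` as extra arguments is my choice, so the call above becomes `fminunc(@(t) costFunction(t, X, y), initialTheta, options)`):

```octave
function [jVal, gradient] = costFunction(theta, X, y)
  m = length(y);                                                % number of training examples
  h = 1 ./ (1 + exp(-X * theta));                               % hypothesis for all examples
  jVal = (1 / m) * sum(-y .* log(h) - (1 - y) .* log(1 - h));   % J(theta)
  gradient = (1 / m) * X' * (h - y);                            % one partial derivative per theta_j
end
```

One advantage of handing such a function to `fminunc` instead of running gradient descent by hand is that there is no learning rate $\alpha$ to pick.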
Multiclass Classification: One-vs-all
Train a logistic regression classifier $h_\theta^{(i)}(x)$ for each class $i$ to predict the probability that $y = i$. On a new input $x$, pick the class $i$ that maximizes $h_\theta^{(i)}(x)$.
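A rough Octave sketch of one-vs-all on top of the `costFunction` above (the names `num_labels` and `all_theta` are my own; each class $c$ is trained against the relabelled targets `y == c`):

```octave
all_theta = zeros(num_labels, size(X, 2));       % one row of parameters per class
options = optimset('GradObj', 'on', 'MaxIter', 100);
for c = 1:num_labels
  initial_theta = zeros(size(X, 2), 1);
  all_theta(c, :) = fminunc(@(t) costFunction(t, X, (y == c)), initial_theta, options)';
end

% To classify new examples, pick the class whose classifier gives the highest probability.
[~, pred] = max(1 ./ (1 + exp(-X * all_theta')), [], 2);
```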
Regularization
The Problem of Overfitting
Overfitting: If we have too many features, the learned hypothesis may fit the training set very well, but fail to generalize to new examples.
Addressing overfitting:
Options:
- Reduce the number of features.
  - Manually select which features to keep.
  - Use a model selection algorithm.
- Regularization.
  - Keep all the features, but reduce the magnitude/values of the parameters $\theta_j$.
  - Works well when we have a lot of features, each of which contributes a bit to predicting $y$.
By adding a new feature, our model becomes more (or at least equally) expressive, allowing it to learn more complex hypotheses that fit the training set.
Adding many new features gives us more expressive models that can fit the training set better; if too many new features are added, this can lead to overfitting of the training set.
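For instance, one way to manufacture many extra features from just two raw inputs is a polynomial feature map (the mapping below is only an illustration; `x1` and `x2` are assumed to be $m \times 1$ column vectors):

```octave
% Degree-2 polynomial features: intercept, x1, x2, x1^2, x2^2, x1*x2.
Xpoly = [ones(size(x1)), x1, x2, x1.^2, x2.^2, x1 .* x2];
% With a high enough degree the decision boundary can wrap around every training
% point, which is the overfitting scenario that regularization is meant to control.
```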
Cost Function
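For regularized linear regression, the squared-error cost gains a penalty on the parameter magnitudes (with regularization parameter $\lambda$; by convention $\theta_0$ is not penalized):
$$J(\theta) = \frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda\sum_{j=1}^{n}\theta_j^2\right]$$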
Regularized Linear Regression
Gradient descent:
Repeat {
$\quad \theta_0 := \theta_0 - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_0^{(i)}$
$\quad \theta_j := \theta_j - \alpha\left[\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\theta_j\right] \qquad (j = 1, 2, \ldots, n)$
}
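A small Octave sketch of one such update (vectorized; the variable names are my own, and `lambda` is the regularization parameter):

```octave
h = X * theta;                                             % linear hypothesis
grad = (1 / m) * X' * (h - y);                             % unregularized gradient
grad(2:end) = grad(2:end) + (lambda / m) * theta(2:end);   % penalize theta_1..theta_n, not theta_0
theta = theta - alpha * grad;
```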
Regularized Logistic Regression
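Here the cost being minimized is the logistic cost from above plus the same penalty term (again leaving $\theta_0$ unpenalized):
$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h_\theta(x^{(i)}) + (1 - y^{(i)})\log\left(1 - h_\theta(x^{(i)})\right)\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$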
Gradient descent:
Repeat {
$\quad \theta_0 := \theta_0 - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_0^{(i)}$
$\quad \theta_j := \theta_j - \alpha\left[\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\theta_j\right] \qquad (j = 1, 2, \ldots, n)$
}
where now $h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$.
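The same idea carries over to the `fminunc` interface; a sketch of a regularized variant of the earlier cost function (the name `costFunctionReg` and the argument order are assumptions on my part):

```octave
function [jVal, gradient] = costFunctionReg(theta, X, y, lambda)
  m = length(y);
  h = 1 ./ (1 + exp(-X * theta));                                      % sigmoid hypothesis
  reg = (lambda / (2 * m)) * sum(theta(2:end) .^ 2);                   % penalty, skipping theta_0
  jVal = (1 / m) * sum(-y .* log(h) - (1 - y) .* log(1 - h)) + reg;
  gradient = (1 / m) * X' * (h - y);
  gradient(2:end) = gradient(2:end) + (lambda / m) * theta(2:end);
end
```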