Bias and Variance in Machine Learning
Sorry for the lack of updates... I have been rushing to complete the Machine Learning course as fast as possible...
Good news!!! We managed to survive through the whole course!!!! Hahaha
Summary of Week 6 lesson:
Some definitions first before delving into the details...
Bias:
- The hypothesis is said to be biased if it systematically overestimates or underestimates the value it is trying to predict, usually because the model is too simple for the data
Variance:
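- The hypothesis is said to have high variance if its predictions change a lot depending on which training examples it was fitted on, so it fits the training set closely but generalizes poorly to new data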
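For reference, the two definitions are often tied together by the standard bias-variance decomposition of the expected squared error. This is not taken from the course notes, and the symbols are my own assumptions: f is the true function, ĥ is the hypothesis fitted on a randomly drawn training set, y = f(x) + ε, and ε is zero-mean noise with variance σ² that is independent of the training set.

$$
\mathbb{E}\big[(y - \hat{h}(x))^2\big]
= \underbrace{\big(\mathbb{E}[\hat{h}(x)] - f(x)\big)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}\big[(\hat{h}(x) - \mathbb{E}[\hat{h}(x)])^2\big]}_{\text{variance}}
+ \underbrace{\sigma^2}_{\text{noise}}
$$

The expectation is taken over both the training sets and the noise. An under-fitted hypothesis is dominated by the first term and an over-fitted one by the second.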
Given the complexity of some data, a linear regression may not be able to formulate an accurate prediction. As a plain linear regression only uses each feature in its original (degree-1) form, the 'best-fit' line cannot follow any curvature in the data. This under-fitting problem leads to a high training error (cost), with a similarly high cross-validation error, and the hypothesis is said to have high bias.
Ways to reduce the bias:
- Add more features / Increase the degree of the polynomial
- Decrease λ (the regularization parameter)
- Use a larger neural network
As mentioned above, to resolve the under-fitting issue, the degree of the polynomial can be increased slowly until the error on the cross-validation set is at its minimum. This is the ideal case: the regression does not fit the data perfectly, but it gives the best trade-off between bias and variance and should generalize best to new data.
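The degree search described above can be sketched in a few lines of Python. This is only a minimal illustration under my own assumptions: the data is synthetic (a noisy quadratic), the helper names `mse` and `select_degree` are made up for this post, and the error is a plain mean squared error rather than the course's cost function with its 1/(2m) factor.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return float(np.mean((y_true - y_pred) ** 2))

def select_degree(x_train, y_train, x_cv, y_cv, max_degree=8):
    """Fit polynomials of degree 1..max_degree on the training set and
    return the degree with the lowest cross-validation error."""
    results = []
    for degree in range(1, max_degree + 1):
        coeffs = np.polyfit(x_train, y_train, degree)  # least-squares fit
        train_err = mse(y_train, np.polyval(coeffs, x_train))
        cv_err = mse(y_cv, np.polyval(coeffs, x_cv))
        results.append((degree, train_err, cv_err))
    best_degree = min(results, key=lambda r: r[2])[0]
    return best_degree, results

# Synthetic data (an assumption for this sketch): quadratic trend plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=60)
y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(0.0, 0.1, size=60)

# First 40 examples for training, the remaining 20 for cross-validation.
x_train, y_train = x[:40], y[:40]
x_cv, y_cv = x[40:], y[40:]

best_degree, results = select_degree(x_train, y_train, x_cv, y_cv)
for degree, train_err, cv_err in results:
    print(f"degree={degree}  train={train_err:.4f}  cv={cv_err:.4f}")
print("degree with lowest CV error:", best_degree)
```

The training error can only go down as the degree increases, so it cannot be used to pick the degree. The cross-validation error is what shows where under-fitting ends and over-fitting begins.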
However, a high-degree polynomial may over-fit the data: the error on the training set becomes very low, but at the expense of a much greater error on the cross-validation set. The hypothesis is then said to have high variance.
Ways to reduce variance:
- Use more training examples
- Use a smaller set of features
- Increase λ (the regularization parameter)
- Use a smaller neural network
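To see the effect of λ from the lists above, here is a rough sketch that fits a deliberately over-flexible degree-8 polynomial with different values of λ and compares the training and cross-validation errors. Everything in it is an assumption for illustration: the synthetic data, the helper names `poly_features`, `fit_ridge` and `mse`, and the closed-form regularized least-squares solution (the intercept is left unpenalized, and λ is not scaled by the number of examples as it is in the course's cost function).

```python
import numpy as np

def poly_features(x, degree):
    """Design matrix with columns x^0, x^1, ..., x^degree."""
    return np.vander(x, degree + 1, increasing=True)

def fit_ridge(X, y, lam):
    """Regularized least squares: solve (X^T X + lam * R) theta = X^T y,
    where R is the identity with a 0 for the intercept term."""
    reg = lam * np.eye(X.shape[1])
    reg[0, 0] = 0.0  # do not penalize the intercept
    return np.linalg.solve(X.T @ X + reg, X.T @ y)

def mse(X, y, theta):
    """Mean squared error of the linear model X @ theta."""
    return float(np.mean((X @ theta - y) ** 2))

# Synthetic data (an assumption): noisy quadratic, small training set.
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=45)
y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(0.0, 0.2, size=45)

degree = 8  # intentionally too flexible for 15 training examples
X_train, y_train = poly_features(x[:15], degree), y[:15]
X_cv, y_cv = poly_features(x[15:], degree), y[15:]

lambdas = [0.0, 0.001, 0.01, 0.1, 1.0, 10.0, 100.0]
results = []
for lam in lambdas:
    theta = fit_ridge(X_train, y_train, lam)
    results.append((lam, mse(X_train, y_train, theta), mse(X_cv, y_cv, theta)))

for lam, train_err, cv_err in results:
    print(f"lambda={lam:<7}  train={train_err:.4f}  cv={cv_err:.4f}")

best_lambda = min(results, key=lambda r: r[2])[0]
print("lambda with lowest CV error:", best_lambda)
```

A very small λ leaves the hypothesis free to over-fit (high variance), while a very large λ shrinks the parameters so much that it under-fits (high bias). As with the polynomial degree, the cross-validation error is what is used to pick a value in between.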