Bias and Variance in Machine Learning

February 26, 2019

Sorry for the lack of updates... I have been rushing to complete the Machine Learning course as fast as possible...

Good news!!! We managed to survive through the whole course!!!! Hahaha

Summary of Week 6 lesson:

Some definitions first before delving into the details...

Bias:

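A hypothesis is said to be biased if it systematically overestimates or underestimates the value it is trying to predict. High bias usually means the model is too simple and under-fits the data.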

Variance:

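A hypothesis is said to have high variance if it changes a lot depending on the particular training set it was fitted on. It follows the training data very closely but generalizes poorly to new data, i.e. it over-fits.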

Given the complexity of some data, a linear regression may not be able to make accurate predictions. As a straight line in a single feature is too simple to capture the shape of the data, the 'best-fit' line does not fit the data well enough. Such an under-fitting problem leads to high errors on both the training set and the cross-validation set, and the hypothesis is said to have high bias.

Ways to reduce the bias:

Add more features / Increase the degree of the polynomial
Decrease the regularization parameter λ
Use a larger neural network

As mentioned above, to resolve the under-fitting issue, the degree of the polynomial can be increased gradually until the error on the cross-validation set is at its minimum. This is the ideal case, where the hypothesis generalizes best to data it has not seen, even though it does not fit the training data perfectly.

However, a polynomial of too high a degree may over-fit the data: the error on the training set keeps getting lower, at the expense of greater error on the cross-validation set. The hypothesis is then said to have high variance.
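To make the degree search above a bit more concrete, here is a minimal sketch in Python with NumPy. The data are synthetic (a noisy sine curve) and the helper names `half_mse` and `sweep_degrees` are made up for illustration. The course itself works in Octave, so treat this as one possible way to do it, not the course's own code.

```python
import numpy as np

def half_mse(coeffs, x, y):
    """Squared-error cost 1/(2m) * sum((h(x) - y)^2), without regularization."""
    residuals = np.polyval(coeffs, x) - y
    return np.mean(residuals ** 2) / 2

def sweep_degrees(x_train, y_train, x_cv, y_cv, max_degree=8):
    """Fit one polynomial per degree; return {degree: (train_error, cv_error)}."""
    errors = {}
    for degree in range(1, max_degree + 1):
        coeffs = np.polyfit(x_train, y_train, degree)  # least-squares fit
        errors[degree] = (half_mse(coeffs, x_train, y_train),
                          half_mse(coeffs, x_cv, y_cv))
    return errors

# Synthetic data (an assumption for this sketch): noisy sine curve on [-1, 1].
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 60)
y = np.sin(3 * x) + rng.normal(0, 0.1, 60)
x_train, y_train = x[:40], y[:40]   # training set
x_cv, y_cv = x[40:], y[40:]         # cross-validation set

errors = sweep_degrees(x_train, y_train, x_cv, y_cv)
# Pick the degree whose cross-validation error is lowest.
best_degree = min(errors, key=lambda d: errors[d][1])

for degree, (train_err, cv_err) in errors.items():
    print(f"degree {degree}: train {train_err:.4f}, cv {cv_err:.4f}")
print("degree with lowest cross-validation error:", best_degree)
```

The training error can only go down as the degree goes up, so it is the cross-validation error that tells you when to stop.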

Ways to reduce variance:

Use more training examples
Use a smaller set of features
Increase λ
Use a smaller neural network
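The effect of λ can be sketched the same way: keep a deliberately high-degree polynomial, try a few values of λ, and compare the training and cross-validation errors. This is only a rough sketch in Python with NumPy. The synthetic data, the list of λ values and the helper names (`poly_features`, `fit_regularized`, `half_mse`) are all assumptions for illustration.

```python
import numpy as np

def poly_features(x, degree):
    """Design matrix with columns 1, x, x^2, ..., x^degree."""
    return np.vander(x, degree + 1, increasing=True)

def fit_regularized(X, y, lam):
    """Regularized normal equation; the intercept term is not penalized."""
    penalty = lam * np.eye(X.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + penalty, X.T @ y)

def half_mse(X, y, theta):
    """Cost without the regularization term, used only for reporting errors."""
    return np.mean((X @ theta - y) ** 2) / 2

# Synthetic data (an assumption for this sketch): noisy sine curve on [-1, 1].
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 60)
y = np.sin(3 * x) + rng.normal(0, 0.1, 60)
X = poly_features(x, degree=8)      # high degree on purpose
X_train, y_train = X[:40], y[:40]   # training set
X_cv, y_cv = X[40:], y[40:]         # cross-validation set

lambdas = [0.0, 0.001, 0.01, 0.1, 1.0, 10.0]
train_errors, cv_errors = [], []
for lam in lambdas:
    theta = fit_regularized(X_train, y_train, lam)
    train_errors.append(half_mse(X_train, y_train, theta))
    cv_errors.append(half_mse(X_cv, y_cv, theta))

# Pick the λ whose cross-validation error is lowest.
best_lambda = lambdas[int(np.argmin(cv_errors))]

for lam, tr, cv in zip(lambdas, train_errors, cv_errors):
    print(f"lambda {lam}: train {tr:.4f}, cv {cv:.4f}")
print("lambda with lowest cross-validation error:", best_lambda)
```

A large λ pushes the fit towards a flat line (high bias), while λ = 0 leaves the degree-8 polynomial free to chase the noise (high variance).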
