Bias and Variance in Machine Learning

Sorry for the lack of updates...Have been rushing to complete the Machine Learning course as fast as possible...

Good news!!! We managed to survive through the whole course!!!! Hahaha

Summary of Week 6 lesson:


Some definitions first before delving into the details...


Bias: 


  • A hypothesis is said to be biased if it systematically overestimates or underestimates the quantity it is trying to estimate


Variance: 


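  • A hypothesis is said to have high variance if its predictions change a lot when it is trained on a different sample of the data, i.e. the individual fits are spread far away from their average value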
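
To make the two definitions a bit more concrete, here is a rough sketch (not from the course) that refits a hypothesis on many noisy copies of the same training set and looks at its predictions at one point. The target function `true_f`, the noise level, the query point `x0` and the two degrees are all assumptions picked just for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

def true_f(x):
    # Assumed "real" relationship; in practice this is unknown.
    return np.sin(x)

x = np.linspace(-3, 3, 20)   # fixed training inputs
x0 = 1.0                     # point where we inspect the predictions

def predictions_at_x0(degree, n_trials=200):
    """Refit a polynomial on n_trials noisy training sets, predict at x0."""
    preds = np.empty(n_trials)
    for t in range(n_trials):
        y = true_f(x) + rng.normal(scale=0.3, size=x.size)
        preds[t] = np.polyval(np.polyfit(x, y, degree), x0)
    return preds

results = {}
for degree in (1, 6):
    preds = predictions_at_x0(degree)
    bias = preds.mean() - true_f(x0)   # systematic over/underestimate
    variance = preds.var()             # spread of the fits around their average
    results[degree] = (bias, variance)
    print(f"degree {degree}: bias = {bias:+.3f}, variance = {variance:.4f}")
```

In this setup the straight line should come out with the larger bias and the degree-6 polynomial with the larger variance.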


Given the complexity of some data, a plain linear regression may not be able to make accurate predictions. A straight line is too simple a hypothesis for data that follows a curve, so the 'best-fit' line does not fit the data well enough. This under-fitting problem shows up as a high training error (and a similarly high cross-validation error), and the hypothesis is said to have high bias.
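
A quick sketch of the under-fitting problem, assuming made-up data that follows a quadratic curve plus some noise. The cost is the usual squared-error cost from the course, J = 1/(2m) * sum of squared errors; the helper name `training_cost` is just for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 50)
y = x**2 + rng.normal(scale=0.5, size=x.size)   # assumed curved data

def training_cost(degree):
    """Fit a polynomial of the given degree and return J_train."""
    coeffs = np.polyfit(x, y, degree)
    residuals = np.polyval(coeffs, x) - y
    return np.mean(residuals**2) / 2

cost_line = training_cost(1)   # straight line: under-fits
cost_quad = training_cost(2)   # matches the shape of the data

print(f"J_train (degree 1): {cost_line:.3f}")
print(f"J_train (degree 2): {cost_quad:.3f}")
```

The straight line cannot bend, so its training cost stays high no matter how the line is placed, while the degree-2 fit gets down to roughly the noise level.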



Ways to reduce the bias:


  • Add more features / Increase the degree of the polynomial
  • Decrease λ (the regularization parameter)
  • Use a larger neural network

As mentioned above, to resolve the under-fitting issue, the degree of the polynomial can be increased step by step until the error on the cross-validation set reaches its minimum. This is the ideal case: the hypothesis is flexible enough to capture the pattern in the data, yet still generalizes to examples it was not trained on.

However, a polynomial of too high a degree may over-fit the data: the error on the training set keeps dropping, but the error on the cross-validation set goes up. The hypothesis is then said to have high variance.
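
Here is a minimal sketch of that degree search, assuming synthetic data (a sine curve plus noise) and a simple 40/20 split into training and cross-validation sets. The split sizes, the range of degrees and the helper `cost` are assumptions for the example, not something prescribed by the course.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-3, 3, size=60)
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)   # assumed data

# Simple split: first 40 examples for training, last 20 for cross-validation.
x_train, y_train = x[:40], y[:40]
x_cv, y_cv = x[40:], y[40:]

def cost(coeffs, xs, ys):
    """Squared-error cost J = 1/(2m) * sum((h(x) - y)^2)."""
    return np.mean((np.polyval(coeffs, xs) - ys) ** 2) / 2

degrees = list(range(1, 9))
train_costs, cv_costs = [], []
for d in degrees:
    coeffs = np.polyfit(x_train, y_train, d)   # fit on the training set only
    train_costs.append(cost(coeffs, x_train, y_train))
    cv_costs.append(cost(coeffs, x_cv, y_cv))

# Pick the degree with the lowest cross-validation cost.
best_degree = degrees[int(np.argmin(cv_costs))]

for d, jt, jcv in zip(degrees, train_costs, cv_costs):
    print(f"degree {d}: J_train = {jt:.4f}, J_cv = {jcv:.4f}")
print("degree with lowest J_cv:", best_degree)
```

The training cost can only go down as the degree goes up, so it is the cross-validation cost that tells you when to stop.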



Ways to reduce variance:


  • Use more training examples
  • Use a smaller set of features
  • Increase λ (the regularization parameter)
  • Use a smaller neural network
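
To see what increasing λ actually does, here is a small sketch of regularized linear regression solved with the normal equation, θ = (XᵀX + λL)⁻¹Xᵀy, where L is the identity matrix with a 0 in the top-left corner so the intercept is not penalized. The data, the degree-8 polynomial features and the list of λ values are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
m = 15
x = np.linspace(-1, 1, m)
y = np.sin(3 * x) + rng.normal(scale=0.2, size=m)   # assumed data

degree = 8
X = np.vander(x, degree + 1, increasing=True)       # columns: 1, x, ..., x^8

def fit_regularized(lam):
    """Normal equation with regularization; the intercept is not penalized."""
    L = np.eye(degree + 1)
    L[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)

def training_cost(theta):
    """Unregularized squared-error cost on the training set."""
    return np.mean((X @ theta - y) ** 2) / 2

lambdas = [0.0, 0.01, 0.1, 1.0, 10.0]
thetas = [fit_regularized(lam) for lam in lambdas]
train_costs = [training_cost(t) for t in thetas]
weight_norms = [float(np.linalg.norm(t[1:])) for t in thetas]

for lam, jt, wn in zip(lambdas, train_costs, weight_norms):
    print(f"lambda = {lam:<5}: J_train = {jt:.4f}, ||theta[1:]|| = {wn:.3f}")
```

A larger λ shrinks the weights and lets the training cost rise a little, which is the trade that reduces variance. Push λ too far and the hypothesis under-fits again, so a reasonable way to pick λ is to compare the cross-validation error across the candidate values.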
