Bias-Variance Tradeoff

Ankit kumar
4 min read · Jan 7, 2024


The bias-variance tradeoff helps us deal with overfitting and underfitting: striking an optimal balance between bias and variance gives us the best model, one that neither underfits nor overfits.

So let’s start with the basics and see how bias and variance shape our machine learning models.

What is bias?

Bias is the difference between the average prediction of our model and the correct value we are trying to predict. A model with high bias pays very little attention to the training data and oversimplifies the relationship, which leads to high error on both training and test data.

Generally, linear algorithms have high bias, which makes them fast to train and easy to interpret, but less flexible. In turn, they have lower predictive performance on complex problems that fail to meet the algorithm’s simplifying assumptions.

  • Low Bias: Suggests fewer assumptions about the form of the target function.
  • High Bias: Suggests more assumptions about the form of the target function.

Examples of low-bias machine learning algorithms include Decision Trees, K-Nearest Neighbors, and Support Vector Machines.

Examples of high-bias machine learning algorithms include Linear Regression, Linear Discriminant Analysis, and Logistic Regression.
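To make high bias concrete, here is a minimal sketch (assuming numpy and scikit-learn are installed; the sine target is an illustrative choice, not from this article) that fits a straight line to nonlinear data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.1, size=200)  # nonlinear target + noise

model = LinearRegression().fit(X, y)
print("train MSE:", mean_squared_error(y, model.predict(X)))
# A straight line cannot follow the sine curve, so the error stays high
# no matter how much data we add: the signature of high bias.
```

The linear model underfits because its assumption of a straight-line relationship is too strong for this data.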

What is variance?

Variance is the variability of a model’s prediction for a given data point; it tells us how much the model’s predictions spread out when it is trained on different samples of the data. A model with high variance pays a lot of attention to the training data and does not generalize to data it hasn’t seen before. As a result, such models perform very well on training data but have high error rates on test data.

  • Low Variance: Suggests small changes to the estimate of the target function with changes to the training dataset.
  • High Variance: Suggests large changes to the estimate of the target function with changes to the training dataset.

Generally, nonlinear machine learning algorithms that have a lot of flexibility have high variance. For example, decision trees have high variance, which is even higher if the trees are not pruned before use.

Examples of low-variance machine learning algorithms include Linear Regression, Linear Discriminant Analysis, and Logistic Regression.

Examples of high-variance machine learning algorithms include Decision Trees, K-Nearest Neighbors, and Support Vector Machines.
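Here is a minimal sketch of high variance (numpy and scikit-learn assumed; the sine data and the evaluation point x = 0.5 are illustrative choices): refit an unpruned decision tree on bootstrap resamples and watch its prediction at a single point swing from fit to fit.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(100, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.3, size=100)

preds = []
for _ in range(20):
    idx = rng.integers(0, 100, size=100)    # bootstrap resample of the training set
    tree = DecisionTreeRegressor()          # unpruned: grows to full depth
    tree.fit(X[idx], y[idx])
    preds.append(tree.predict([[0.5]])[0])  # prediction at x = 0.5

print("std of predictions at x = 0.5:", np.std(preds))
# A large spread means the model changes a lot with the training sample:
# the signature of high variance.
```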

Mathematically

Let the variable we are trying to predict be Y and the other covariates be X. We assume there is a relationship between the two such that

Y = f(X) + e

where e is the error term, normally distributed with a mean of 0.

We will build a model f̂(X) of f(X) using linear regression or any other modeling technique.

So the expected squared error at a point x is

Err(x) = E[(Y − f̂(x))²]

Err(x) can be further decomposed as

Err(x) = (E[f̂(x)] − f(x))² + E[(f̂(x) − E[f̂(x)])²] + σ²
       = Bias² + Variance + Irreducible Error

That is, Err(x) is the sum of the squared bias, the variance, and the irreducible error σ².

Irreducible error is the error that cannot be reduced by building better models. It is a measure of the amount of noise in our data. No matter how good we make our model, the data will carry a certain amount of noise, and that irreducible error cannot be removed.
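The decomposition can be checked empirically. Here is a minimal simulation sketch (numpy assumed; the sine function, noise level, and evaluation point are illustrative choices): draw many training sets from Y = f(X) + e, fit a model to each, and estimate Bias², Variance, and the irreducible error at a fixed point x₀.

```python
import numpy as np

rng = np.random.default_rng(2)
f = np.sin     # the true function f(X)
sigma = 0.3    # standard deviation of the noise term e
x0 = 0.5       # the point where we evaluate Err(x0)

preds = []
for _ in range(500):
    X = rng.uniform(-3, 3, 50)
    y = f(X) + rng.normal(0, sigma, 50)
    coef = np.polyfit(X, y, 1)         # a simple (high-bias) linear fit
    preds.append(np.polyval(coef, x0))

preds = np.array(preds)
bias_sq = (preds.mean() - f(x0)) ** 2  # (E[f_hat(x0)] - f(x0))^2
variance = preds.var()                 # spread of predictions across training sets
print("Bias^2:", bias_sq)
print("Variance:", variance)
print("Irreducible error:", sigma ** 2)
print("Err(x0) ≈", bias_sq + variance + sigma ** 2)
```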

In supervised learning, underfitting happens when a model is unable to capture the underlying pattern of the data. These models usually have high bias and low variance. Underfitting happens when we have too little data to build an accurate model, or when we try to fit a linear model to non-linear data. Such models are too simple to capture the complex patterns in the data, e.g., linear and logistic regression.

In supervised learning, overfitting happens when our model captures the noise along with the underlying pattern in the data. It happens when we train our model too long on a noisy dataset. These models have low bias and high variance. They tend to be very complex, like decision trees, which are prone to overfitting.
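Both failure modes show up in a single experiment. A minimal sketch (numpy assumed; the polynomial degrees and sine data are illustrative choices): a degree-1 polynomial underfits, while a degree-12 polynomial chases the noise and overfits.

```python
import numpy as np

rng = np.random.default_rng(3)
X_train = rng.uniform(-3, 3, 30)
y_train = np.sin(X_train) + rng.normal(0, 0.3, 30)
X_test = rng.uniform(-3, 3, 30)
y_test = np.sin(X_test) + rng.normal(0, 0.3, 30)

for deg in (1, 12):
    coef = np.polyfit(X_train, y_train, deg)
    train_mse = np.mean((np.polyval(coef, X_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coef, X_test) - y_test) ** 2)
    print(f"degree {deg}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")

# The underfit model has high error everywhere (high bias); the overfit
# model has low training error but much higher test error (high variance).
```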

Why Bias Variance Tradeoff?

If our model is too simple and has very few parameters, it may have high bias and low variance. On the other hand, if our model has a large number of parameters, it is going to have high variance and low bias. So we need to find the right balance, without overfitting or underfitting the data.

This tradeoff in complexity is why there is a tradeoff between bias and variance. An algorithm can’t be more complex and less complex at the same time.

To build a good model, we need to find a good balance between bias and variance such that it minimizes the total error.
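One common way to find that balance is to sweep model complexity and pick the setting that minimizes held-out error. A minimal sketch (numpy assumed; polynomial degree stands in for model complexity here, an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(4)
X_train = rng.uniform(-3, 3, 60)
y_train = np.sin(X_train) + rng.normal(0, 0.3, 60)
X_val = rng.uniform(-3, 3, 60)
y_val = np.sin(X_val) + rng.normal(0, 0.3, 60)

val_errors = {}
for deg in range(1, 13):
    coef = np.polyfit(X_train, y_train, deg)
    val_errors[deg] = np.mean((np.polyval(coef, X_val) - y_val) ** 2)

best = min(val_errors, key=val_errors.get)  # complexity with lowest validation error
print("best degree:", best, "validation MSE:", round(val_errors[best], 3))
# Too low a degree underfits (high bias); too high a degree overfits
# (high variance); the minimum of total error sits in between.
```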

An optimal balance of bias and variance means the model neither overfits nor underfits.

Therefore, understanding bias and variance is critical for understanding the behavior of prediction models.
