What is Underfitting?
A model that learns nothing is a model that is worthless.
Underfitting occurs when a machine learning model is too simple to capture the underlying patterns in the data, leading to poor performance on both training and test datasets.
Think of it like trying to capture the essence of a complex landscape with just a stick figure.
The drawing is too simple.
It fails to represent the mountains, the rivers, the trees.
The stick figure misses the point entirely, resulting in a misleading and useless interpretation of the landscape.
This is a critical failure in machine learning.
An underfit model hasn’t just failed to generalize.
It has failed to learn in the first place.
What is underfitting in machine learning and how does it occur?
Underfitting is a state of failure.
It means your model is not complex enough to understand the inherent patterns in your data.
It’s like trying to solve a calculus problem using only basic addition.
The tool is fundamentally mismatched to the complexity of the problem.
This happens for a few common reasons; the first is sketched in code right after this list:
- The model is too simple: Choosing a linear model when the data has a non-linear relationship is a classic example. The model’s rigid assumptions don’t match reality.
- Insufficient features: The data you provide to the model lacks the necessary information. The model can’t learn a pattern that isn’t represented in the features.
- Excessive regularization: Techniques used to prevent the opposite problem, overfitting, can be applied too aggressively, strangling the model and preventing it from learning anything.
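The first reason is the easiest to demonstrate in code. Below is a minimal sketch, on a synthetic dataset invented purely for this illustration: a straight line asked to learn a parabola scores near zero even on its own training data.

```python
# A linear model on quadratic data: the classic "model too simple" case.
# The dataset is synthetic, generated only for this illustration.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(scale=0.5, size=200)  # the true pattern is a parabola

model = LinearRegression().fit(X, y)
# R^2 near 0 even on training data: the straight line cannot bend to the curve.
print(f"Training R^2: {model.score(X, y):.2f}")
```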
Underfitting is defined by one core characteristic: it is a high-bias model.
Bias, in this context, refers to the model’s simplistic and rigid assumptions about the data. High bias means these assumptions are wrong, causing the model to miss the target consistently.
It’s the direct opposite of its more famous cousin, overfitting.
- Overfitting is like a student who memorizes the answers to a practice test but doesn’t understand the concepts. It captures all the noise and random fluctuations in the training data as if they were real patterns.
- Underfitting is like a student who doesn’t study at all. It fails to capture the actual signal, the core concepts, in the first place.
How can underfitting impact real-world applications?
An underfit model is not just inaccurate; it’s dangerously unreliable.
Because it has failed to learn the basic relationships in the data, its predictions amount to crude guesses derived from overly simple rules.
Consider the real-world example of predicting house prices.
Imagine you build a model that only considers one feature: the square footage of the house.
This is a recipe for underfitting.
The implication is severe.
- The model would predict that a 2,000-square-foot luxury home in Beverly Hills costs the same as a 2,000-square-foot dilapidated house in a remote area.
- It ignores critical factors like location, number of bedrooms, age of the property, and local amenities.
- The model fails to capture the complex, multi-faceted reality of the housing market. Its predictions would be wildly inaccurate and useless for buyers, sellers, or real estate agents.
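A hypothetical sketch makes the gap concrete. Every number below is invented for illustration, and the “location premium” feature stands in for everything the one-feature model cannot see:

```python
# Comparing a one-feature price model to one that also sees location.
# All numbers are made up for the sketch; this is not real market data.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
sqft = rng.uniform(800, 4000, size=500)
location_premium = rng.choice([0.0, 500_000.0], size=500)  # remote area vs. prime city
price = 200 * sqft + location_premium + rng.normal(scale=20_000, size=500)

size_only = LinearRegression().fit(sqft.reshape(-1, 1), price)
print(f"R^2, square footage only: {size_only.score(sqft.reshape(-1, 1), price):.2f}")

X_both = np.column_stack([sqft, location_premium])
size_and_location = LinearRegression().fit(X_both, price)
print(f"R^2, size + location:     {size_and_location.score(X_both, price):.2f}")
```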
This same failure applies everywhere.
- In finance, an underfit credit risk model might only look at income, ignoring debt and payment history, leading to disastrous lending decisions.
- In medicine, an underfit diagnostic model might ignore key symptoms, failing to identify diseases it was built to detect.
What are the typical indicators of underfitting?
Spotting underfitting is usually straightforward because the model’s failure is total.
The number one indicator is poor performance everywhere.
High Training Error & High Test Error
- You train your model, and it achieves low accuracy (or high error) on the training data it was given.
- You then test it on new, unseen data, and it performs just as poorly.
This is the classic signature. The model is struggling to make sense of any data, both familiar and new.
Contrast this with overfitting:
- Overfitting: Very low training error, very high test error.
- Underfitting: High training error, high test error.
- Good Fit: Low training error, low test error.
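Measuring this takes two score calls, one per split. A minimal sketch, using a synthetic dataset of concentric circles that no straight-line boundary can separate, so a linear classifier lands near chance on both splits:

```python
# Diagnosing underfitting: compare training and test accuracy.
# make_circles produces a non-linear problem; logistic regression draws a
# linear boundary, so it scores near chance on BOTH splits -- the signature.
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_circles(n_samples=1000, noise=0.1, factor=0.5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression().fit(X_train, y_train)
print(f"Train accuracy: {clf.score(X_train, y_train):.2f}")  # poor
print(f"Test accuracy:  {clf.score(X_test, y_test):.2f}")    # just as poor
```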
Another clear indicator is visualization.
If you plot your model’s predictions against the actual data points, an underfit model will show a clear and obvious mismatch. You’ll see a straight line desperately trying to fit a beautiful curve, failing to capture the data’s shape.
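With matplotlib, this takes only a few lines. The sketch below reuses the parabola idea from earlier; the red line is the model, visibly missing the curve:

```python
# Plotting an underfit model: a straight line laid over curved data.
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(-3, 3, size=(100, 1)), axis=0)
y = X[:, 0] ** 2 + rng.normal(scale=0.5, size=100)

model = LinearRegression().fit(X, y)

plt.scatter(X, y, s=12, label="actual data")                   # the curve
plt.plot(X, model.predict(X), color="red", label="model fit")  # the straight line
plt.legend()
plt.title("Underfitting: a line trying to fit a curve")
plt.show()
```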
How can one prevent or address underfitting in models?
Fixing underfitting is all about increasing your model’s capacity to learn.
You need to give it better tools, better information, or more freedom.
Here are the primary strategies (a combined code sketch follows them below):
- Increase Model Complexity: If you’re using a simple model, switch to a more powerful one.
  - Move from linear regression to polynomial regression, a decision tree, or a neural network.
  - If using a neural network, add more layers or more neurons per layer.
- Feature Engineering: Your model can’t learn from information it doesn’t have.
  - Add more relevant features to the dataset. For the housing price example, this means adding columns for location, age, number of bedrooms, etc.
  - Create new features from existing ones (e.g., a “debt-to-income ratio” feature from separate debt and income columns).
- Reduce Regularization: Regularization techniques (like L1 or L2) are designed to prevent overfitting by penalizing model complexity. If your model is underfitting, you might be penalizing it too much. Try reducing the regularization parameter.
- Train Longer: Sometimes, especially with complex models like neural networks, the model simply hasn’t been trained for enough epochs to learn the patterns.
Essentially, you need to diagnose why your model is too simple and give it what it needs—be it a bigger brain (more complexity) or better eyes (more features).
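Here is the combined sketch promised above, applied to the earlier parabola data. It covers the first three strategies at once, since polynomial features count as both added complexity and engineered features; the model choices and alpha values are illustrative, not prescriptions.

```python
# Fixing underfitting: add capacity (polynomial features) and ease off
# the regularization penalty (smaller alpha). Synthetic data, as before.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(scale=0.5, size=200)

# Too simple AND too heavily penalized: a textbook underfit.
underfit = Ridge(alpha=1000.0).fit(X, y)

# More capacity (degree-2 features) plus a much lighter penalty.
better = make_pipeline(PolynomialFeatures(degree=2), Ridge(alpha=0.1)).fit(X, y)

print(f"Simple, over-regularized R^2:  {underfit.score(X, y):.2f}")
print(f"Polynomial, light penalty R^2: {better.score(X, y):.2f}")
```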
Quick Test: Diagnosis
You’ve built a model to predict customer churn. On your historical training data, it achieves 60% accuracy. On new customer data, it also achieves 60% accuracy. Is this a sign of underfitting or overfitting?
Answer: This is a classic sign of underfitting. The model’s performance is poor on both the data it has already seen and new data, indicating it failed to learn the underlying patterns in the first place.
Deeper Questions on Underfitting
What is the relationship between underfitting and bias?
Underfitting is the direct result of high bias. In machine learning, “bias” refers to the simplifying assumptions a model makes to approximate the target function. A model with high bias makes very strong, rigid assumptions (e.g., “the relationship between these variables is a straight line”). When these assumptions don’t match the complex reality of the data, the model underfits.
How does underfitting relate to the Bias-Variance Tradeoff?
The Bias-Variance Tradeoff is a central concept in machine learning. It describes the tension between a model’s simplicity (bias) and its complexity (variance).
- High Bias / Low Variance: This is underfitting. The model is simple, stable, but systematically wrong.
- Low Bias / High Variance: This is overfitting. The model is complex, flexible, but unstable and captures noise.
The goal is to find a balance with low bias and low variance.
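For squared-error loss, this tradeoff has a standard algebraic form. The decomposition below is the textbook statement (not something specific to this article), where the expectation is taken over training sets, f-hat is the learned model, and sigma squared is the irreducible noise:

```latex
\mathbb{E}\left[(y - \hat{f}(x))^2\right]
  = \underbrace{\mathrm{Bias}\left[\hat{f}(x)\right]^2}_{\text{underfitting}}
  + \underbrace{\mathrm{Var}\left[\hat{f}(x)\right]}_{\text{overfitting}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```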
Does more data solve underfitting?
Not usually. If your model is fundamentally too simple for the task, feeding it more data won’t help it learn a pattern it’s not capable of representing. Giving more examples of a curve to a model that can only draw straight lines won’t teach it how to draw a curve. More features, however, can help.
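A quick numerical check of this claim, again on synthetic data: a straight line fit to a sine wave scores roughly the same whether it sees a hundred points or a hundred thousand.

```python
# More data does not rescue a too-simple model: a line fit to a sine wave
# scores poorly at every sample size. Synthetic data, for illustration only.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
for n in (100, 1_000, 100_000):
    X = rng.uniform(0, 10, size=(n, 1))
    y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=n)
    r2 = LinearRegression().fit(X, y).score(X, y)
    print(f"n = {n:>7}: training R^2 = {r2:.2f}")  # stays low regardless of n
```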
Can underfitting happen with complex models like deep neural networks?
Yes, absolutely. A deep neural network can underfit if it’s not trained for enough epochs, if the learning rate is too low, or if it’s subjected to overly aggressive regularization (like extreme dropout). Just because a model is capable of complexity doesn’t mean it will achieve it during training.
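A hedged sketch using scikit-learn’s MLPRegressor as a stand-in for any neural network: the identical architecture underfits when stopped after a handful of iterations and recovers when given time to train. Exact scores will vary with the data and the seed.

```python
# The same network, undertrained vs. trained: capacity alone is not enough.
# (scikit-learn will warn about non-convergence on the first fit;
# that warning is exactly the point.)
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(2 * X[:, 0]) + rng.normal(scale=0.1, size=500)

# Stopped almost immediately: the weights barely move from their random start.
undertrained = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=5, random_state=0)
# Given enough iterations to actually learn the pattern.
trained = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=5000, random_state=0)

print(f"5 iterations:    R^2 = {undertrained.fit(X, y).score(X, y):.2f}")
print(f"5000 iterations: R^2 = {trained.fit(X, y).score(X, y):.2f}")
```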
Is underfitting always bad?
Yes. From a predictive performance standpoint, an underfit model is not useful. It has failed at its primary task of learning from data. While a simple model might be desirable for interpretability, if it underfits, it means its simple interpretation of the world is incorrect.
What’s an easy way to visualize underfitting?
For a simple regression problem, plot the actual data points on a scatter plot. Then, overlay the line or curve that your model has learned. If the model’s line cuts through the data, ignoring a clear pattern or trend, you are looking at underfitting.
How is underfitting different from just having a low-accuracy model?
They are closely related. Underfitting is a primary cause of fundamentally low accuracy across all datasets. A model could have low accuracy for other reasons (e.g., extremely noisy data, or no real pattern to find), but when there is a pattern and the model misses it, that’s underfitting.
Can you fix underfitting by simply removing data?
No. Removing data will likely make the problem worse or simply mask it. The issue isn’t that there’s too much data; it’s that the model lacks the capacity to understand the data it has.
The goal of building a machine learning model is to find the perfect balance—a model complex enough to capture the real patterns, but not so complex that it starts memorizing the noise. Underfitting is a failure to clear the first hurdle. It’s a sign that your approach is too simple for the reality you’re trying to model.