Huber Loss Function in Machine Learning
The Huber Loss Function is a popular loss function used primarily in regression tasks. It is designed to be robust to outliers by combining the best properties of two common loss functions: Mean Squared Error (MSE) and Mean Absolute Error (MAE). Unlike MSE, which can be heavily influenced by large errors (outliers), and MAE, which can be less sensitive to small errors, the Huber loss behaves like MSE for small prediction errors and switches to MAE-like behavior for larger errors. This is useful when your dataset contains noisy data or outliers, helping models learn more reliably and avoid being skewed by extreme values.
L_\delta(a) =\begin{cases}\frac{1}{2} a^2 & \text{if } |a| \leq \delta \\\delta \cdot \left( |a| - \frac{1}{2} \delta \right) & \text{otherwise}\end{cases}
Where, a = y - \hat{y} is the prediction error (the difference between the true value and the predicted value), and \delta is a tunable threshold that decides where the loss switches from quadratic (MSE-like) to linear (MAE-like) behavior.
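At the boundary |a| = \delta the two pieces agree in both value and slope, which is why the loss is smooth at the switch point:
\frac{1}{2}\delta^2 = \delta \cdot \left( \delta - \frac{1}{2}\delta \right), \qquad \left. \frac{d}{da}\left(\frac{1}{2}a^2\right) \right|_{a=\delta} = \delta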
Where do we use the Huber Loss Function?

- Regression Problems with Outliers: Ideal for datasets that contain occasional large errors; unlike MSE, Huber loss avoids being overly influenced by them (see the sketch after this list).
- Time Series Forecasting: Used when forecasting values where sudden spikes or drops shouldn't skew the model too much.
- Object Detection in Computer Vision: In models like Faster R-CNN and YOLO, Huber loss is used for bounding box regression to balance stability and error tolerance.
- Deep Learning on Noisy Data: Helps neural networks learn better when the target values are noisy or imprecise, providing stable training dynamics.
- Financial and Risk Modeling: Useful when modeling markets or credit risk where extreme values (outliers) occur but shouldn't dominate predictions.
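To make the first point concrete, here is a small comparison (residual values invented for demonstration) of how a single outlier affects the average MSE versus the average Huber loss:
import numpy as np

# Toy residuals: three small errors and one large outlier
errors = np.array([0.3, -0.5, 0.2, 8.0])
delta = 1.5

mse = np.mean(errors ** 2)
abs_e = np.abs(errors)
huber = np.mean(np.where(abs_e <= delta,
                         0.5 * errors ** 2,               # quadratic region
                         delta * (abs_e - 0.5 * delta)))  # linear region

print("MSE:", mse)      # dominated by the single 8.0 outlier
print("Huber:", huber)  # the outlier contributes only linearly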
How Does it Work?
The Huber Loss works by blending two types of loss functions: Mean Squared Error (MSE) for small errors and Mean Absolute Error (MAE) for large errors. It uses a threshold value \delta to decide which behavior to apply:
- If the error |y - \hat{y}| is small (≤ \delta), it uses MSE, giving smooth gradients that help the model learn accurately.
- If the error is large (> \delta), it switches to MAE, reducing the influence of outliers by applying a linear penalty.
This combination allows models to be both precise for typical data and robust against outliers.
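One way to see why this blend is stable: the gradient of the Huber loss equals the raw error in the quadratic region but is capped at ±\delta in the linear region, so a single outlier can never produce a huge update. A minimal sketch (plain Python; the function name is illustrative):
def huber_gradient(error, delta=1.5):
    # Quadratic (MSE-like) region: gradient tracks the error exactly
    if abs(error) <= delta:
        return error
    # Linear (MAE-like) region: gradient magnitude is capped at delta
    return delta if error > 0 else -delta

print(huber_gradient(0.5))   # 0.5 -> smooth, proportional update
print(huber_gradient(10.0))  # 1.5 -> outlier's influence is capped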
For Example
- This function calculates the Huber Loss between a true value and a predicted value.
- It uses squared error if the error is small (≤ \delta) and linear error if it's large, reducing sensitivity to outliers.
- In the example, since the error is 3 (larger than \delta = 1.5), the function returns a robust linear penalty of 3.375.
def huber_loss(y, y_pred, delta=1.5):
    # Prediction error (residual)
    error = y - y_pred
    if abs(error) <= delta:
        # Small error: quadratic (MSE-like) penalty
        return 0.5 * error ** 2
    else:
        # Large error: linear (MAE-like) penalty limits outlier influence
        return delta * (abs(error) - 0.5 * delta)

y_true = 10
y_pred = 13
loss = huber_loss(y_true, y_pred)
print("Huber Loss:", loss)
Output:
Huber Loss: 3.375
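In practice the loss is usually computed over whole arrays of predictions and averaged. A vectorized NumPy sketch (the function name and sample values are illustrative, not from the original):
import numpy as np

def huber_loss_vec(y, y_pred, delta=1.5):
    # Element-wise residuals
    error = np.asarray(y, dtype=float) - np.asarray(y_pred, dtype=float)
    abs_error = np.abs(error)
    quadratic = 0.5 * error ** 2                # used where |error| <= delta
    linear = delta * (abs_error - 0.5 * delta)  # used where |error| > delta
    return np.mean(np.where(abs_error <= delta, quadratic, linear))

y_true = [10, 12, 8, 11]
y_pred = [13, 11.5, 8.2, 25]  # the last prediction is an outlier
print("Mean Huber Loss:", huber_loss_vec(y_true, y_pred))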
Applications
- Robust Regression: Huber loss is commonly used in regression problems where the data contains outliers. It provides the stability of squared error for small residuals while limiting the influence of large errors (see the sketch after this list).
- Time Series Forecasting: In forecasting models, it helps when datasets have occasional noise or spikes, reducing the distortion caused by sudden outliers.
- Computer Vision: In tasks like bounding box regression in object detection, Huber loss (also called Smooth L1 loss) balances accuracy and robustness.
- Training Deep Neural Networks: Huber loss is used when you need a robust loss function that still has smooth gradients, helping improve convergence on noisy datasets.
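For the robust regression case above, scikit-learn ships an estimator built around a Huber-style loss. A minimal sketch, assuming scikit-learn is installed; the synthetic data and the epsilon value (scikit-learn's analogue of \delta) are illustrative:
import numpy as np
from sklearn.linear_model import HuberRegressor, LinearRegression

rng = np.random.default_rng(0)

# Synthetic linear data: y = 3x + 2 plus noise, with a few corrupted targets
X = rng.uniform(0, 10, size=(100, 1))
y = 3.0 * X.ravel() + 2.0 + rng.normal(0, 0.5, size=100)
y[:5] += 50  # inject outliers

huber = HuberRegressor(epsilon=1.35).fit(X, y)  # robust fit
ols = LinearRegression().fit(X, y)              # MSE baseline for comparison

print("Huber slope:", huber.coef_[0])  # should stay near the true slope of 3
print("OLS slope:", ols.coef_[0])      # distorted by the injected outliers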