42.47. Loss Function#

42.47.1. Objective of this section#

We have already learned the math and code behind Gradient Descent, as well as other optimization techniques.

In this section, we will learn more about loss functions.

As a learner, you can focus in this section on the L1 and L2 losses among the regression losses, as well as the classification losses, and then explore the remaining loss functions as needed.

42.47.2. Concept of loss function#

  • A loss function gauges the disparity between the model’s predictions and the actual values. Simply put, it indicates how “off” our model is. By optimizing this function, our objective is to identify parameters that bring the model’s predictions as close as possible to the true values.

  • The function we want to minimize or maximize is called the objective function or criterion. When we are minimizing it, we may also call it the cost function, loss function, or error function. — Deep Learning, Ian Goodfellow, Yoshua Bengio, Aaron Courville

42.47.3. Classification of loss functions#

42.47.3.1. Regression Losses#

These are employed when the objective is to predict a continuous outcome.

  • Mean Squared Error (MSE): Measures the average squared discrepancies between predictions and actual values, emphasizing larger errors.

  • Mean Absolute Error (MAE): Calculates the average of absolute differences between predicted outcomes and actual observations, offering a linear penalty for each deviation.

  • Huber Loss: A hybrid loss that’s quadratic for small differences and linear for large ones, providing resilience against outliers.

  • L1 Loss: Directly reflects the absolute discrepancies between predictions and real values, synonymous with MAE.

  • L2 Loss: Highlights squared differences between predictions and actuals, equivalent to MSE.

  • Smooth L1 Loss: An amalgamation of L1 and L2 losses, it provides a balance in handling both minor and major deviations.

42.47.3.2. Classification Losses#

Utilized for tasks requiring the prediction of discrete categories.

  • Cross Entropy Loss: Quantifies the dissimilarity between the predicted probability distribution and the actual class distribution.

  • Hinge Loss: A staple for Support Vector Machines (SVMs), it strives to separate classes by maximizing the margin around the decision boundary.

  • Binary Cross Entropy Loss (Log Loss): A special case of Cross Entropy Loss intended for binary classification, where the target values are in the set {0, 1}.

  • Multi-Class Cross-Entropy Loss: An extension of Cross Entropy Loss to multi-class classification, where the target values are in the set {0, 1, 2, …, n} and each class is assigned a unique integer value (a short code sketch of both variants follows this list).
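
As an illustration of the two cross-entropy variants above, here is a minimal tf.keras sketch (the label and probability values below are made up for demonstration):

import tensorflow as tf

# Binary cross entropy: targets in {0, 1}, predictions are probabilities.
y_true_bin = tf.constant([0.0, 1.0, 1.0])
y_pred_bin = tf.constant([0.1, 0.8, 0.6])
print(tf.keras.losses.BinaryCrossentropy()(y_true_bin, y_pred_bin).numpy())

# Multi-class cross entropy: integer class labels against per-class probabilities.
y_true_mc = tf.constant([0, 2])
y_pred_mc = tf.constant([[0.9, 0.05, 0.05],
                         [0.1, 0.2, 0.7]])
print(tf.keras.losses.SparseCategoricalCrossentropy()(y_true_mc, y_pred_mc).numpy())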

42.47.3.3. Structured Losses#

Tailored for intricate tasks involving structured data patterns.

  • Sequence Generation Loss: A representative example is the CTC (Connectionist Temporal Classification) loss, designed for tasks such as speech and text recognition.

  • Image Segmentation Loss: Noteworthy instances include the Dice loss and the IoU (Intersection over Union) loss (a minimal Dice loss sketch follows this list).
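
As a concrete sketch of the Dice loss mentioned above (a minimal implementation assuming binary masks; the smoothing constant and toy masks below are illustrative choices, not from the original text):

import tensorflow as tf

def dice_loss(y_true, y_pred, smooth=1e-6):
    # Soft Dice loss: 1 - 2 * |A ∩ B| / (|A| + |B|), computed on flattened masks.
    y_true = tf.reshape(tf.cast(y_true, tf.float32), [-1])
    y_pred = tf.reshape(tf.cast(y_pred, tf.float32), [-1])
    intersection = tf.reduce_sum(y_true * y_pred)
    return 1.0 - (2.0 * intersection + smooth) / (
        tf.reduce_sum(y_true) + tf.reduce_sum(y_pred) + smooth)

# Toy example: ground-truth binary mask vs. predicted probabilities.
mask_true = tf.constant([[1.0, 1.0], [0.0, 0.0]])
mask_pred = tf.constant([[0.9, 0.8], [0.1, 0.2]])
print(dice_loss(mask_true, mask_pred).numpy())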

42.47.3.4. Regularization Losses#

Rather than directly influencing the model’s predictions, these losses are integrated into the objective function to counteract excessive model complexity.

  • L1 Regularization (Lasso): Enforces sparsity by compelling certain model coefficients to be exactly zero.

  • L2 Regularization (Ridge): Shrinks model parameters without forcing them to zero, keeping the model from growing unduly complex (a short tf.keras sketch follows this list).
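
As an illustration, in tf.keras these penalties are typically attached to layers as kernel regularizers and added to the training objective automatically (the layer size and penalty weights below are arbitrary choices for the sketch):

import tensorflow as tf

# L1 (Lasso) penalty on the layer's weights encourages sparsity.
l1_layer = tf.keras.layers.Dense(
    16, activation="relu",
    kernel_regularizer=tf.keras.regularizers.l1(0.01))

# L2 (Ridge) penalty shrinks the weights without forcing them to zero.
l2_layer = tf.keras.layers.Dense(
    16, activation="relu",
    kernel_regularizer=tf.keras.regularizers.l2(0.01))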

42.47.4. Common Loss Functions#

42.47.4.1. Regression Loss Functions#

  1. Mean Squared Error (MSE)

\[ L(y, \hat{y}) = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \]

Where \(y_i\) is the actual value and \(\hat{y}_i\) is the predicted value.

import tensorflow as tf

y_true = tf.constant([1.0, 2.0, 3.0])  # ground-truth values
y_pred = tf.constant([1.5, 1.5, 3.5])  # model predictions
loss = tf.keras.losses.MSE(y_true, y_pred)  # mean of squared differences

print(loss.numpy())
  2. Mean Absolute Error (MAE)

\[ L(y, \hat{y}) = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i| \]
loss = tf.keras.losses.MAE(y_true, y_pred)  # mean of absolute differences

print(loss.numpy())
  3. Huber Loss

\[\begin{split} L_{\delta}(y, \hat{y}) = \begin{cases} \frac{1}{2}(y - \hat{y})^2 & \text{if } |y - \hat{y}| \leq \delta \\ \delta |y - \hat{y}| - \frac{1}{2}\delta^2 & \text{otherwise} \end{cases} \end{split}\]

Positioned between MSE and MAE, this loss function offers robustness against outliers.

loss = tf.keras.losses.Huber()(y_true, y_pred)  # delta defaults to 1.0

print(loss.numpy())
  4. L1 Loss

\[ L = | y - f(x) | \]

Corresponds to MAE.

  5. L2 Loss

\[ L = ( y - f(x) )^2 \]

Corresponds to MSE.
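
To check these correspondences numerically, reusing the y_true and y_pred tensors defined above, the averaged element-wise L1 and L2 losses reproduce the MAE and MSE values printed earlier:

l1 = tf.reduce_mean(tf.abs(y_true - y_pred))  # average absolute difference = MAE
l2 = tf.reduce_mean(tf.square(y_true - y_pred))  # average squared difference = MSE

print(l1.numpy(), l2.numpy())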

42.47.4.2. Classification Loss Functions#

  1. Hinge Loss

\[ L(y, \hat{y}) = \max(0, 1 - y \cdot \hat{y}) \]

Primarily used for Support Vector Machines, but it can also be employed for other classification tasks.

y_true = tf.constant([-1.0, 1.0, 1.0])  # binary class labels in {-1, 1}
y_pred = tf.constant([0.5, 0.3, -0.7])  # raw model outputs
loss = tf.keras.losses.Hinge()(y_true, y_pred)

print(loss.numpy())

42.47.5. Conclusion#

Loss functions hold a pivotal role in machine learning. By minimizing the loss, we enhance the accuracy of our model’s predictions. A deep understanding of various loss functions aids in selecting the most appropriate optimization technique for specific challenges.