
I am training a neural network for multi-label classification with a large number of classes (1000), which means more than one output can be active for each input. On average, two classes are active per output frame. When training with a cross-entropy loss, the network resorts to outputting only zeros, because that gives the lowest loss: 99.8% of my labels are zeros. Any suggestions on how I can push the network to give more weight to the positive classes?

  • What are you using as software? Python + Keras? Commented Feb 10, 2017 at 14:49
  • Btw: 99.8% is just a number; a 0.2% error rate on average corresponds to 0.002 * 1000, i.e. 2 wrong labels per training instance on average. Are you using categorical_crossentropy, or binary_crossentropy with sigmoids on the last layer? Commented Feb 10, 2017 at 14:52
  • @TommasoGuerrini I used Python + Keras with sigmoids and binary_crossentropy (a minimal sketch of this setup follows these comments). Now testing with categorical_crossentropy, the network is outputting values closer to 1, but the loss is still high. Waiting to see how it trains over more epochs. Commented Feb 10, 2017 at 15:14
  • @TommasoGuerrini I did not understand the purpose of the callback. Commented Feb 10, 2017 at 15:27
  • You may try sparse_categorical_crossentropy. By the way: when training, don't just look at the loss function, look also at the binary_accuracy. I have a similar case to yours, and using mean squared error as the loss function I obtained a better binary accuracy than with binary log loss :) Commented Feb 10, 2017 at 16:10
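
For reference, the setup discussed in the comments above (independent sigmoid outputs with binary_crossentropy, so several of the 1000 labels can be active at once) looks roughly like the sketch below. This is only a minimal illustration with made-up layer sizes and an assumed input dimension, not code from the question:

from keras.models import Sequential
from keras.layers import Dense

NUM_CLASSES = 1000   # from the question
INPUT_DIM = 128      # placeholder feature size, replace with your own

# Independent sigmoid outputs + binary cross-entropy: each of the 1000
# labels is treated as its own yes/no decision, so several can be active.
model = Sequential([
    Dense(512, activation="relu", input_shape=(INPUT_DIM,)),
    Dense(NUM_CLASSES, activation="sigmoid"),
])
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["binary_accuracy"])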

2 Answers


TensorFlow has a loss function, tf.nn.weighted_cross_entropy_with_logits, which can be used to give more weight to the 1s, so it should be applicable to a sparse multi-label classification setting like yours.

From the documentation:

This is like sigmoid_cross_entropy_with_logits() except that pos_weight allows one to trade off recall and precision by up- or down-weighting the cost of a positive error relative to a negative error.

The argument pos_weight is used as a multiplier for the positive targets
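
Concretely, the documented per-label loss is pos_weight * labels * -log(sigmoid(logits)) + (1 - labels) * -log(1 - sigmoid(logits)), so only the positive term is scaled. A quick NumPy sketch of the effect (my own illustration, with approximate values):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def weighted_bce(y, x, pos_weight):
    # pos_weight scales only the positive (y == 1) term of the usual
    # sigmoid cross-entropy.
    p = sigmoid(x)
    return pos_weight * y * -np.log(p) + (1 - y) * -np.log(1 - p)

print(weighted_bce(1.0, -2.0, pos_weight=1))   # ~2.13: a missed positive, unweighted
print(weighted_bce(1.0, -2.0, pos_weight=10))  # ~21.3: the same mistake costs 10x more
print(weighted_bce(0.0, -2.0, pos_weight=10))  # ~0.13: negatives are unaffected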

If you use the tensorflow backend in Keras, you can use the loss function like this (Keras 2.1.1):

import tensorflow as tf
import keras.backend.tensorflow_backend as tfb

POS_WEIGHT = 10  # multiplier for positive targets, needs to be tuned

def weighted_binary_crossentropy(target, output):
    """
    Weighted binary crossentropy between an output tensor 
    and a target tensor. POS_WEIGHT is used as a multiplier 
    for the positive targets.

    Combination of the following functions:
    * keras.losses.binary_crossentropy
    * keras.backend.tensorflow_backend.binary_crossentropy
    * tf.nn.weighted_cross_entropy_with_logits
    """
    # transform back to logits
    _epsilon = tfb._to_tensor(tfb.epsilon(), output.dtype.base_dtype)
    output = tf.clip_by_value(output, _epsilon, 1 - _epsilon)
    output = tf.log(output / (1 - output))
    # compute weighted loss
    loss = tf.nn.weighted_cross_entropy_with_logits(targets=target,
                                                    logits=output,
                                                    pos_weight=POS_WEIGHT)
    return tf.reduce_mean(loss, axis=-1)

Then in your model:

model.compile(loss=weighted_binary_crossentropy, ...)

I have not yet found many resources that report well-working values for pos_weight in relation to the number of classes, the average number of active classes, etc.
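
As a starting point, one common heuristic (my own suggestion, not something the sources above prescribe) is to derive pos_weight from the negative/positive ratio of the training labels and then tune it down on a validation metric:

import numpy as np

def estimate_pos_weight(y_train):
    # y_train: (num_samples, num_classes) binary label matrix
    positives = y_train.sum()
    negatives = y_train.size - positives
    return negatives / positives

# With ~2 active labels out of 1000 per frame (as in the question) this gives
# roughly (1000 - 2) / 2 ≈ 500, which is usually far too aggressive as-is;
# treat it as an upper bound and tune on validation F1 or precision/recall.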

  • Also, it might be a good idea to evaluate the F-measure in a callback after each epoch when tuning the hyperparameters (such as pos_weight); a sketch of such a callback follows these comments. Commented Nov 15, 2017 at 16:56
  • Is there a corresponding weighted_binary_accuracy metric that can be used for the model as well? Commented Oct 21, 2019 at 8:20
  • Lifesaver, but I could also use something like weighted_binary_accuracy. Commented Jun 16, 2020 at 17:26
  • You can just use binary accuracy, actually, unless you really want to weight the accuracy as well. Commented Jun 16, 2020 at 17:50
  • About the proper values for pos_weight: the documentation suggests that any value above 1 increases recall, while any value less than 1 increases precision. Commented Oct 27, 2021 at 12:08
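
A sketch of the per-epoch F1 callback mentioned in the first comment above, assuming scikit-learn is available and a held-out validation set (x_val, y_val); the class name and the 0.5 decision threshold are my own choices:

from keras.callbacks import Callback
from sklearn.metrics import f1_score

class F1Callback(Callback):
    """Compute micro-averaged F1 on a held-out set after every epoch,
    useful when tuning POS_WEIGHT."""

    def __init__(self, x_val, y_val, threshold=0.5):
        super(F1Callback, self).__init__()
        self.x_val = x_val
        self.y_val = y_val
        self.threshold = threshold

    def on_epoch_end(self, epoch, logs=None):
        probs = self.model.predict(self.x_val)
        preds = (probs > self.threshold).astype(int)
        score = f1_score(self.y_val, preds, average="micro")
        print("epoch %d: val micro-F1 = %.4f" % (epoch, score))

# model.fit(x_train, y_train, callbacks=[F1Callback(x_val, y_val)], ...)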

Update for tensorflow 2.6.0:

I was going to write a comment, but there are many things that need to be changed for @tobigue's answer to work, and I am not entirely sure everything in my answer is correct. To make it work:

  1. You need to replace import keras.backend.tensorflow_backend as tfb with import keras.backend as tfb.
  2. The targets parameter of tf.nn.weighted_cross_entropy_with_logits has been renamed to labels.
  3. tf.log now needs to be called as tf.math.log.
  4. To make this custom loss function usable by name in Keras, you need to register it with get_custom_objects. So, from keras.utils.generic_utils import get_custom_objects, and then, before you compile the model: get_custom_objects().update({"weighted_binary_crossentropy": weighted_binary_crossentropy}).
  5. I also encountered the following error, though it may not happen for everyone: TypeError: Input 'y' of 'Mul' Op has type float32 that does not match type int32 of argument 'x'. To fix it, I converted the target to float32: target = tf.cast(target, tf.float32).

So, the final code that I am using is this:

import tensorflow as tf
import keras.backend as tfb
from keras.utils.generic_utils import get_custom_objects

POS_WEIGHT = 10  # multiplier for positive targets, needs to be tuned
def weighted_binary_crossentropy(target, output):
    """
    Weighted binary crossentropy between an output tensor
    and a target tensor. POS_WEIGHT is used as a multiplier
    for the positive targets.

    Combination of the following functions:
    * keras.losses.binary_crossentropy
    * keras.backend.tensorflow_backend.binary_crossentropy
    * tf.nn.weighted_cross_entropy_with_logits
    """
    # transform back to logits
    _epsilon = tfb._to_tensor(tfb.epsilon(), output.dtype.base_dtype)
    output = tf.clip_by_value(output, _epsilon, 1 - _epsilon)
    output = tf.math.log(output / (1 - output))
    # compute weighted loss
    target = tf.cast(target, tf.float32)
    loss = tf.nn.weighted_cross_entropy_with_logits(labels=target,
                                                    logits=output,
                                                    pos_weight=POS_WEIGHT)
    return tf.reduce_mean(loss, axis=-1)

Then in your model:

get_custom_objects().update({"weighted_binary_crossentropy": weighted_binary_crossentropy})
model.compile(loss='weighted_binary_crossentropy', ...)
  • I am using tf.keras. I have a Dense layer as my final layer, with the number of units equal to the number of unique labels. Should I use no activation or a sigmoid activation in my final layer while using this loss? I shouldn't, correct? Commented Nov 9, 2021 at 7:48
