
I am training a neural network for multi-label classification with a large number of classes (1000), which means more than one output can be active for each input. On average, two classes are active per output frame. When training with a cross-entropy loss, the network resorts to outputting only zeros, because that gives the lowest loss: 99.8% of my labels are zeros. Any suggestions on how I can push the network to give more weight to the positive classes?

  • What are you using as software? Python + Keras? Commented Feb 10, 2017 at 14:49
  • Btw: 99.8% is just a number; a 0.2% error rate on average corresponds to 0.002 * 1000, i.e. 2 wrong labels per training instance on average. Are you using categorical_crossentropy, or binary_crossentropy with sigmoids on the last layer? Commented Feb 10, 2017 at 14:52
  • @TommasoGuerrini I used Python + Keras with sigmoids and binary_crossentropy (a minimal sketch of this setup follows these comments). Now testing with categorical_crossentropy, the network is outputting values closer to 1, but the loss is still high. Waiting to see how it trains over more epochs. Commented Feb 10, 2017 at 15:14
  • @TommasoGuerrini I did not understand the purpose of the callback. Commented Feb 10, 2017 at 15:27
  • You may try sparse_categorical_crossentropy. By the way: when training, don't just look at the loss function, look also at the binary_accuracy. I have a similar case to yours, and using mean squared error as the loss function I obtained a better binary accuracy than with binary log loss :) Commented Feb 10, 2017 at 16:10
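
For reference, the setup discussed in the comments above (independent sigmoid outputs with binary_crossentropy, so several of the 1000 labels can be active at once) looks roughly like the sketch below. This is only a minimal illustration with made-up layer sizes and an assumed input dimension, not code from the question:

from keras.models import Sequential
from keras.layers import Dense

NUM_CLASSES = 1000   # from the question
INPUT_DIM = 128      # placeholder feature size, replace with your own

# Independent sigmoid outputs + binary cross-entropy: each of the 1000
# labels is treated as its own yes/no decision, so several can be active.
model = Sequential([
    Dense(512, activation="relu", input_shape=(INPUT_DIM,)),
    Dense(NUM_CLASSES, activation="sigmoid"),
])
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["binary_accuracy"])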

2 Answers


TensorFlow has a loss function, tf.nn.weighted_cross_entropy_with_logits, which can be used to give more weight to the 1s, so it should be applicable to a sparse multi-label classification setting like yours.

From the documentation:

This is like sigmoid_cross_entropy_with_logits() except that pos_weight allows one to trade off recall and precision by up- or down-weighting the cost of a positive error relative to a negative error.

The argument pos_weight is used as a multiplier for the positive targets
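
Concretely, the documented per-label loss is pos_weight * labels * -log(sigmoid(logits)) + (1 - labels) * -log(1 - sigmoid(logits)), so only the positive term is scaled. A quick NumPy sketch of the effect (my own illustration, with approximate values):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def weighted_bce(y, x, pos_weight):
    # pos_weight scales only the positive (y == 1) term of the usual
    # sigmoid cross-entropy.
    p = sigmoid(x)
    return pos_weight * y * -np.log(p) + (1 - y) * -np.log(1 - p)

print(weighted_bce(1.0, -2.0, pos_weight=1))   # ~2.13: a missed positive, unweighted
print(weighted_bce(1.0, -2.0, pos_weight=10))  # ~21.3: the same mistake costs 10x more
print(weighted_bce(0.0, -2.0, pos_weight=10))  # ~0.13: negatives are unaffected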

If you use the tensorflow backend in Keras, you can use the loss function like this (Keras 2.1.1):

import tensorflow as tf
import keras.backend.tensorflow_backend as tfb

POS_WEIGHT = 10  # multiplier for positive targets, needs to be tuned

def weighted_binary_crossentropy(target, output):
    """
    Weighted binary crossentropy between an output tensor 
    and a target tensor. POS_WEIGHT is used as a multiplier 
    for the positive targets.

    Combination of the following functions:
    * keras.losses.binary_crossentropy
    * keras.backend.tensorflow_backend.binary_crossentropy
    * tf.nn.weighted_cross_entropy_with_logits
    """
    # transform back to logits
    _epsilon = tfb._to_tensor(tfb.epsilon(), output.dtype.base_dtype)
    output = tf.clip_by_value(output, _epsilon, 1 - _epsilon)
    output = tf.log(output / (1 - output))
    # compute weighted loss
    loss = tf.nn.weighted_cross_entropy_with_logits(targets=target,
                                                    logits=output,
                                                    pos_weight=POS_WEIGHT)
    return tf.reduce_mean(loss, axis=-1)

Then in your model:

model.compile(loss=weighted_binary_crossentropy, ...)

I have not yet found many resources that report well-working values for pos_weight in relation to the number of classes, the average number of active classes, etc.
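
As a starting point, one common heuristic (my own suggestion, not something the sources above prescribe) is to derive pos_weight from the negative/positive ratio of the training labels and then tune it down on a validation metric:

import numpy as np

def estimate_pos_weight(y_train):
    # y_train: (num_samples, num_classes) binary label matrix
    positives = y_train.sum()
    negatives = y_train.size - positives
    return negatives / positives

# With ~2 active labels out of 1000 per frame (as in the question) this gives
# roughly (1000 - 2) / 2 ≈ 500, which is usually far too aggressive as-is;
# treat it as an upper bound and tune on validation F1 or precision/recall.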

  • Also, it might be a good idea to evaluate the F-measure in a callback after each epoch when tuning the hyperparameters (such as pos_weight); a sketch of such a callback follows these comments. Commented Nov 15, 2017 at 16:56
  • Is there a corresponding weighted_binary_accuracy metric that can be used for the model as well? Commented Oct 21, 2019 at 8:20
  • Lifesaver, but I could also use something like weighted_binary_accuracy. Commented Jun 16, 2020 at 17:26
  • You can just use binary accuracy, actually, unless you really want to weight the accuracy as well. Commented Jun 16, 2020 at 17:50
  • About the proper values for pos_weight: the documentation suggests that any value above 1 increases recall, while any value less than 1 increases precision. Commented Oct 27, 2021 at 12:08
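
A sketch of the per-epoch F1 callback mentioned in the first comment above, assuming scikit-learn is available and a held-out validation set (x_val, y_val); the class name and the 0.5 decision threshold are my own choices:

from keras.callbacks import Callback
from sklearn.metrics import f1_score

class F1Callback(Callback):
    """Compute micro-averaged F1 on a held-out set after every epoch,
    useful when tuning POS_WEIGHT."""

    def __init__(self, x_val, y_val, threshold=0.5):
        super(F1Callback, self).__init__()
        self.x_val = x_val
        self.y_val = y_val
        self.threshold = threshold

    def on_epoch_end(self, epoch, logs=None):
        probs = self.model.predict(self.x_val)
        preds = (probs > self.threshold).astype(int)
        score = f1_score(self.y_val, preds, average="micro")
        print("epoch %d: val micro-F1 = %.4f" % (epoch, score))

# model.fit(x_train, y_train, callbacks=[F1Callback(x_val, y_val)], ...)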

Update for tensorflow 2.6.0:

I was going to write a comment, but there are many things that need to be changed for @tobigue's answer to work, and I am not entirely sure everything in my answer is correct. To make it work:

  1. You need to replace import keras.backend.tensorflow_backend as tfb with import keras.backend as tfb.
  2. The targets parameter of tf.nn.weighted_cross_entropy_with_logits has been renamed to labels.
  3. tf.log now needs to be called as tf.math.log.
  4. To make this custom loss function usable by name in Keras, you need to register it with get_custom_objects. So, from keras.utils.generic_utils import get_custom_objects, and then, before you compile the model: get_custom_objects().update({"weighted_binary_crossentropy": weighted_binary_crossentropy}).
  5. I also encountered the following error, though it may not happen for everyone: TypeError: Input 'y' of 'Mul' Op has type float32 that does not match type int32 of argument 'x'. To fix it, I converted the target to float32: target = tf.cast(target, tf.float32).

So, the final code that I am using is this:

import tensorflow as tf
import keras.backend as tfb
from keras.utils.generic_utils import get_custom_objects

POS_WEIGHT = 10  # multiplier for positive targets, needs to be tuned
def weighted_binary_crossentropy(target, output):
    """
    Weighted binary crossentropy between an output tensor
    and a target tensor. POS_WEIGHT is used as a multiplier
    for the positive targets.

    Combination of the following functions:
    * keras.losses.binary_crossentropy
    * keras.backend.tensorflow_backend.binary_crossentropy
    * tf.nn.weighted_cross_entropy_with_logits
    """
    # transform back to logits
    _epsilon = tfb._to_tensor(tfb.epsilon(), output.dtype.base_dtype)
    output = tf.clip_by_value(output, _epsilon, 1 - _epsilon)
    output = tf.math.log(output / (1 - output))
    # compute weighted loss
    target = tf.cast(target, tf.float32)
    loss = tf.nn.weighted_cross_entropy_with_logits(labels=target,
                                                    logits=output,
                                                    pos_weight=POS_WEIGHT)
    return tf.reduce_mean(loss, axis=-1)

Then in your model:

get_custom_objects().update({"weighted_binary_crossentropy": weighted_binary_crossentropy})
model.compile(loss='weighted_binary_crossentropy', ...)
  • I am using tf.keras. I have a Dense layer as my final layer, with the number of units equal to the number of unique labels. Should I use no activation or a sigmoid activation in my final layer while using this loss? I shouldn't, correct? Commented Nov 9, 2021 at 7:48
