nn_bce_loss {torch}
Binary cross entropy loss
Description
Creates a criterion that measures the Binary Cross Entropy between the target and the output:
Usage
nn_bce_loss(weight = NULL, reduction = "mean")
Arguments
weight

(Tensor, optional): a manual rescaling weight given to the loss of each batch element. If given, has to be a Tensor of size nbatch.

reduction

(string, optional): Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied. 'mean': the sum of the output will be divided by the number of elements in the output. 'sum': the output will be summed. Default: 'mean'.
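For instance, a per-element weight rescales each batch element's loss before reduction. A minimal sketch, assuming the torch package is installed; the specific weights and sizes are illustrative only:

if (torch_is_installed()) {
  w <- torch_tensor(c(1, 2, 0.5))     # one weight per batch element (size nbatch)
  loss_w <- nn_bce_loss(weight = w, reduction = "none")
  x <- torch_sigmoid(torch_randn(3))  # predictions in (0, 1)
  y <- torch_rand(3)                  # targets in [0, 1]
  loss_w(x, y)                        # elementwise: w_n * BCE_n
}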
Details
The unreduced (i.e. with reduction set to 'none') loss can be described as:

\ell(x, y) = L = \{l_1,\dots,l_N\}^\top, \quad
l_n = - w_n \left[ y_n \cdot \log x_n + (1 - y_n) \cdot \log (1 - x_n) \right]
where N is the batch size. If reduction is not 'none' (default 'mean'), then

\ell(x, y) = \left\{ \begin{array}{ll}
\mbox{mean}(L), & \mbox{if reduction} = \mbox{'mean';}\\
\mbox{sum}(L), & \mbox{if reduction} = \mbox{'sum'.}
\end{array} \right.
This is used for measuring the error of a reconstruction, for example in an auto-encoder. Note that the targets y should be numbers between 0 and 1.
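To make the formula concrete, the unreduced loss can be reproduced by hand and compared against the three reduction modes. A sketch, assuming the torch package is installed; the tensor values are arbitrary:

if (torch_is_installed()) {
  x <- torch_sigmoid(torch_randn(4))  # predictions in (0, 1)
  y <- torch_rand(4)                  # targets in [0, 1]
  # l_n = -[y_n * log(x_n) + (1 - y_n) * log(1 - x_n)]  (here w_n = 1)
  manual <- -(y * torch_log(x) + (1 - y) * torch_log(1 - x))
  nn_bce_loss(reduction = "none")(x, y)  # matches `manual` elementwise
  nn_bce_loss(reduction = "mean")(x, y)  # matches manual$mean()
  nn_bce_loss(reduction = "sum")(x, y)   # matches manual$sum()
}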
Notice that if x_n is either 0 or 1, one of the log terms would be mathematically undefined in the above loss equation. PyTorch chooses to set \log(0) = -\infty, since \lim_{x\to 0} \log(x) = -\infty.
However, an infinite term in the loss equation is not desirable for several reasons. For one, if either y_n = 0 or (1 - y_n) = 0, then we would be multiplying 0 with infinity. Secondly, if we have an infinite loss value, then we would also have an infinite term in our gradient, since \lim_{x\to 0} \frac{d}{dx} \log(x) = \infty.
This would make BCELoss's backward method nonlinear with respect to x_n, and using it for things like linear regression would not be straightforward. Our solution is that BCELoss clamps its log function outputs to be greater than or equal to -100. This way, we can always have a finite loss value and a linear backward method.
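A quick check of this clamping behavior, as a sketch assuming torch is installed: with predictions of exactly 0 or 1 and opposite targets, the clamped log keeps each loss term at 100 rather than Inf.

if (torch_is_installed()) {
  x <- torch_tensor(c(0, 1))  # extreme predictions
  y <- torch_tensor(c(1, 0))  # opposite targets
  # the log terms are clamped at -100, so each loss is 100, not Inf
  nn_bce_loss(reduction = "none")(x, y)
}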
Shape
Input: (N, *) where * means any number of additional dimensions.

Target: (N, *), same shape as the input.

Output: scalar. If reduction is 'none', then (N, *), same shape as input.
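For example, inputs with extra dimensions beyond the batch dimension are handled transparently. A sketch with arbitrary sizes:

if (torch_is_installed()) {
  x <- torch_sigmoid(torch_randn(2, 3, 4))  # (N, *) input
  y <- torch_rand(2, 3, 4)                  # target, same shape as input
  nn_bce_loss(reduction = "none")(x, y)$shape  # c(2, 3, 4), same shape as input
  nn_bce_loss()(x, y)                       # default 'mean' reduces to a scalar
}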
Examples
if (torch_is_installed()) {
  m <- nn_sigmoid()
  loss <- nn_bce_loss()
  # sigmoid maps raw scores into (0, 1), as required by BCE
  input <- torch_randn(3, requires_grad = TRUE)
  # targets must be numbers between 0 and 1
  target <- torch_rand(3)
  output <- loss(m(input), target)
  output$backward()
}