PyTorch loss functions.
Cross Entropy. Put simply, cross entropy measures how much effort is needed to remove the uncertainty of a system when, under the true distribution $p_k$, we act according to the strategy $f(x)$ specified by a non-true distribution $q_k$. The lower the cross entropy, the better the strategy, so we always minimize it: a smaller cross entropy shows that the strategy produced by our algorithm is closer to the optimal one, which in turn shows that the estimated (non-true) distribution is closer to the true distribution. From the information-theoretic point of view, the cross-entropy loss actually comes from the KL divergence; it is just that the final derived form is equivalent to the cross-entropy formula:
Understanding it from the information-theoretic perspective: information content / information entropy (entropy) / cross entropy / relative entropy
Information content: the information content of an event is the negative logarithm of the probability that the event occurs; the larger the probability, the less information it carries. As for why it is the negative logarithm, you would have to ask Shannon; at the very least it must be 0 when $P(X)=1$ and must never be negative:
$$-\log P(X)$$
Information entropy, i.e. entropy, is a measure of the uncertainty of a random variable and depends on the probability distribution of the event X. In other words, information entropy is the expectation of the information content, i.e. the expectation taken over the discrete distribution:
$$H(p) = -\sum_{i=1}^np_i\log p_i$$
Cross entropy: back to the classification problem. The score function gives us a score vector (say of shape (10, 1)), which the softmax function squashes into a probability distribution between 0 and 1; call it $q_i=\dfrac{e^{f_i}}{\sum_j e^{f_j}}$. Then
$$H(p,q) = -\sum_{i=1}^np_i\log q_i$$
This is what we call cross entropy. By Gibbs' inequality, $H(p,q)\ge H(p)$ always holds, with equality if and only if the distribution $q_i$ is identical to $p_i$.
Relative entropy: closely related to cross entropy, $D(p\|q)=H(p,q)-H(p)=-\sum_{i=1}^np(i)\log {\dfrac{q(i)}{p(i)}}$, also known as the KL divergence. It characterizes the difference between two functions or probability distributions: the larger the difference, the larger the relative entropy.
Maximum likelihood estimation, Negative Log Likelihood (NLL), KL divergence and Cross Entropy are in fact equivalent and can be derived from one another; MSE can likewise be derived via Cross Entropy (see the Deep Learning Book, p. 132).
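As a quick numeric sanity check of these relations (the toy distributions p and q below are made up for illustration), the following sketch verifies $D(p\|q) = H(p,q) - H(p)$ and Gibbs' inequality:

```python
import torch

# toy distributions (made up for illustration); each sums to 1
p = torch.tensor([0.7, 0.2, 0.1])
q = torch.tensor([0.5, 0.3, 0.2])

H_p  = -(p * p.log()).sum()        # entropy H(p)
H_pq = -(p * q.log()).sum()        # cross entropy H(p, q)
D_pq = (p * (p / q).log()).sum()   # KL divergence D(p || q)

print(H_pq, H_p + D_pq)            # equal: H(p, q) = H(p) + D(p || q)
print(bool(H_pq >= H_p))           # True, by Gibbs' inequality
```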
BCELoss Creates a criterion that measures the Binary Cross Entropy between the target and the output
This is the loss function used for binary classification, i.e. the loss function of logistic regression.
For binary classification we only need to predict the probability $p$ of the positive class; the corresponding $1-p$ is then the probability of the negative class. $p$ can be obtained with the sigmoid function:
$$\mathrm{sigmoid}(x) = \dfrac{1}{1+e^{-x}}$$
The corresponding loss function can be derived by maximum likelihood estimation:
Assume there are $n$ independent training samples $\{(x_1,y_1), \dots, (x_n, y_n)\}$,
where $y$ is the true label, $y\in \{0,1\}$; then the likelihood of each sample is:
$$P(x_i, y_i)=P(y_i=1|x_i)^{y_i}P(y_i=0|x_i)^{1-y_i}$$
$$=P(y_i=1|x_i)^{y_i}(1-P(y_i=1|x_i))^{1-y_i}$$
Taking the negative logarithm gives:
$$-y_i\log P(y_i=1|x_i)-(1-y_i)\log\big(1-P(y_i=1|x_i)\big)$$
It is not hard to see that this is consistent with the familiar softmax multi-class loss computation.
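The correspondence can be checked numerically. The sketch below (with made-up logits and targets) compares nn.BCELoss on sigmoid outputs against nn.CrossEntropyLoss (covered later in this post) on two-class logits $[0, x]$; since softmax([0, x]) = [1 - sigmoid(x), sigmoid(x)], the two losses agree:

```python
import torch
import torch.nn as nn

x = torch.randn(4)                  # one logit per sample (made-up data)
y = torch.tensor([1., 0., 1., 1.])  # binary targets

bce = nn.BCELoss()(torch.sigmoid(x), y)

# two-class logits [0, x]: softmax([0, x]) = [1 - sigmoid(x), sigmoid(x)]
logits = torch.stack([torch.zeros_like(x), x], dim=1)
ce = nn.CrossEntropyLoss()(logits, y.long())

print(bce, ce)                      # the two values match
```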
```python
class BCELoss(_WeightedLoss):
    def __init__(self, weight=None, size_average=None, reduce=None,
                 reduction='elementwise_mean'):
        """
        - weight: a manual rescaling weight applied to the loss of each element
        - size_average, reduce: deprecated; just use reduction
        - reduction: "elementwise_mean" | "sum" | "none"; the names are self-explanatory
        """
        super(BCELoss, self).__init__(weight, size_average, reduce, reduction)

    def forward(self, input, target):
        """
        - input: predicted probabilities, any shape, but the values must lie in [0, 1]
        - target: true probabilities, same shape as input
        """
        return F.binary_cross_entropy(input, target, weight=self.weight,
                                      reduction=self.reduction)
```
$$loss(p,t)=-\dfrac{1}{N}\sum_{i=1}^{N}\big[t_i\log(p_i)+(1-t_i)\log(1-p_i)\big]$$
example:
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

loss_fn = nn.BCELoss(reduction="elementwise_mean")
input = torch.randn(5)
target = torch.ones(5)
loss = loss_fn(torch.sigmoid(input), target)

# recompute the loss by hand from the formula above
my_loss = torch.mean(-target * torch.log(torch.sigmoid(input))
                     - (1 - target) * torch.log(1 - torch.sigmoid(input)))

# weight zeroes out all but the last element
loss1 = F.binary_cross_entropy(torch.sigmoid(input), target, reduction="none",
                               weight=torch.Tensor([0, 0, 0, 0, 1]))
loss2 = F.binary_cross_entropy(torch.sigmoid(input), target,
                               weight=torch.Tensor([0, 0, 0, 0, 1]))

print(my_loss, loss)
print(loss1, loss2 * 5)   # loss2 * 5 recovers the single weighted element
```
When using the sigmoid we predict the probability of the positive class and then have to set a threshold by hand: only probabilities that reach the threshold count as the positive class, which is somewhat reminiscent of hinge loss.
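A minimal sketch of that thresholding step (the 0.5 cut-off is an assumed choice, not something BCELoss provides):

```python
import torch

logits = torch.randn(5)
probs = torch.sigmoid(logits)        # predicted probability of the positive class

threshold = 0.5                      # assumed cut-off; tune it for your task
preds = (probs > threshold).long()   # 1 = positive class, 0 = negative class
print(probs, preds)
```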
torch.nn.CrossEntropyLoss This criterion combines nn.LogSoftmax() and nn.NLLLoss() in one single class.
The multi-class cross-entropy loss, which can be viewed as an extension of binary_cross_entropy. The computation can be split into two steps: log_softmax() and nn.NLLLoss().
It is useful when training a classification problem with C classes. If provided, the optional argument weight should be a 1D Tensor assigning weight to each of the classes. This is particularly useful when you have an unbalanced training set.
The weight argument is very useful with an imbalanced training set (see the sketch after the example below).
```python
class CrossEntropyLoss(_WeightedLoss):
    def __init__(self, weight=None, size_average=None, reduce=None,
                 reduction='elementwise_mean'):
        """
        - weight: a weight given to each class.
        - reduction: "elementwise_mean" | "sum" | "none".
        """

    def forward(self, input, target):
        """
        - input: [batch, C] or [batch, C, d_1, d_2, ..., d_k]
        - target: [batch], with 0 <= target[i] <= C-1,
          or [batch, d_1, d_2, ..., d_k] for K >= 2.
        """
```
example:
```python
input = torch.randn(2, 3)
target = torch.Tensor([0, 2]).long()

loss_fn = nn.CrossEntropyLoss()
loss = loss_fn(input, target)

# CrossEntropyLoss == log_softmax + NLLLoss
score = torch.log_softmax(input, dim=1)
score1 = torch.log(F.softmax(input, dim=1))
print(score)
print(score1)

nll_loss_fn = nn.NLLLoss()
nll_loss = nll_loss_fn(score, target)
# pick out the log-probabilities of the true classes and average over the batch
my_nll = (-score[0][0] - score[1][2]) / 2
print(nll_loss, loss, my_nll)
```
```
tensor([[-0.8413, -0.7365, -2.4073],
        [-0.4626, -2.0660, -1.4120]])
tensor([[-0.8413, -0.7365, -2.4073],
        [-0.4626, -2.0660, -1.4120]])
tensor(1.1266) tensor(1.1266) tensor(1.1266)
```
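To illustrate the weight argument mentioned above, here is a minimal sketch (the toy tensors and the 3x weight are made-up choices); with reduction="none" you can see that each sample's loss is simply scaled by weight[target]:

```python
import torch
import torch.nn as nn

input = torch.randn(4, 3)
target = torch.tensor([0, 0, 0, 2])            # class 2 is the rare class
class_weight = torch.tensor([1.0, 1.0, 3.0])   # up-weight the rare class

plain = nn.CrossEntropyLoss(reduction="none")(input, target)
weighted = nn.CrossEntropyLoss(weight=class_weight, reduction="none")(input, target)
print(plain)
print(weighted)   # the last entry (class 2) is 3x its unweighted value
```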
torch.nn.NLLLoss The negative log likelihood loss. It is useful to train a classification problem with C classes.
The input is expected to have already passed through a log_softmax layer. The loss for each sample is the negative of the value at the index of its true label.
```python
class NLLLoss(_WeightedLoss):
    def __init__(self, weight=None, size_average=None, reduce=None,
                 reduction='elementwise_mean'):
        """
        The arguments are essentially the same as for CrossEntropyLoss.
        """
```
NLLLoss:
$$\ell(x, y) = L = \{l_1,\dots,l_N\}^\top, \quad
l_n = - w_{y_n}\, x_{n,y_n}, \quad
w_{c} = \text{weight}[c] \cdot \mathbb{1}\{c \neq \text{ignore\_index}\}$$
example:
```python
loss = nn.NLLLoss()
input = torch.randn(3, 5, requires_grad=True)
target = torch.tensor([1, 0, 4])

output = loss(torch.log_softmax(input, dim=1), target)

# manual computation: average the negative log-probabilities of the true classes
score = torch.log_softmax(input, dim=1)
output2 = (-score[0, 1] - score[1, 0] - score[2, 4]) / 3

output.backward()
print(output, output2)
```
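To connect back to the formula, here is a minimal sketch (toy tensors assumed) that unpacks the $w_{y_n}$ term: with reduction="none", the loss of sample $n$ is exactly $-\text{weight}[y_n]\cdot x_{n,y_n}$:

```python
import torch
import torch.nn as nn

log_probs = torch.log_softmax(torch.randn(3, 5), dim=1)
target = torch.tensor([1, 0, 4])
class_weight = torch.tensor([1.0, 2.0, 1.0, 1.0, 0.5])

per_sample = nn.NLLLoss(weight=class_weight, reduction="none")(log_probs, target)

# manual: l_n = -weight[y_n] * x_{n, y_n}
manual = -class_weight[target] * log_probs[torch.arange(3), target]
print(per_sample, manual)   # identical
```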
MultiMarginLoss $loss = \dfrac{1}{N}\sum_{j\ne y_i}\max(0, s_j - s_{y_i}+\Delta)$
$s_{y_i}$ denotes the score of the true class; every wrong-class score that exceeds $s_{y_i}-\Delta$ contributes to the final $loss$, while scores below that value contribute nothing.
Clearly, compared with the softmax loss, which takes all misclassified classes into account, the margin loss only considers the wrong classes with sufficiently large scores.
```python
class MultiMarginLoss(_WeightedLoss):
    def __init__(self, p=1, margin=1, weight=None, size_average=None,
                 reduce=None, reduction='elementwise_mean'):
        """
        - p (int, optional): Has a default value of `1`. `1` and `2`
          are the only supported values.
        - margin (float, optional): Has a default value of `1`.
        """
        super(MultiMarginLoss, self).__init__(weight, size_average, reduce, reduction)
        if p != 1 and p != 2:
            raise ValueError("only p == 1 and p == 2 supported")
        assert weight is None or weight.dim() == 1
        self.p = p
        self.margin = margin

    def forward(self, input, target):
        return F.multi_margin_loss(input, target, p=self.p, margin=self.margin,
                                   weight=self.weight, reduction=self.reduction)
```
example:
```python
loss = nn.MultiMarginLoss()
input = torch.FloatTensor([[0, 3, 1],
                           [0, 4, 2],
                           [1, 5, 2],
                           [3, 5, 1]])
target = torch.ones(4).long()
out = loss(input, target)
print(out)
```
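To tie the result back to the formula, the following sketch (assuming the defaults p=1, margin=1) recomputes the loss by hand; for the input above every wrong-class score stays below $s_{y_i}-\Delta$, so both values are 0:

```python
import torch
import torch.nn as nn

input = torch.FloatTensor([[0, 3, 1],
                           [0, 4, 2],
                           [1, 5, 2],
                           [3, 5, 1]])
target = torch.ones(4).long()

correct = input.gather(1, target.unsqueeze(1))     # s_{y_i}, shape [4, 1]
margins = torch.clamp(1 + input - correct, min=0)  # max(0, s_j - s_{y_i} + 1)
margins.scatter_(1, target.unsqueeze(1), 0.)       # drop the j == y_i terms
manual = margins.sum() / input.numel()             # average over samples and classes

print(manual, nn.MultiMarginLoss()(input, target)) # both are 0 for this input
```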
nn.L1Loss $$L1(\hat{y}, y)=\dfrac{1}{m}\sum|\hat{y}_i-y_i|$$
nn.MSELoss $$L2(\hat{y}, y)=\dfrac{1}{m}\sum|\hat{y}_i-y_i|^2$$
```python
loss = nn.L1Loss()
loss2 = nn.MSELoss()
input = torch.FloatTensor([1, 2, 3])
target = torch.FloatTensor([1, 2, 9])
output = loss(input, target)
output2 = loss2(input, target)
print(output, output2)
```
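A quick manual check of both formulas on the same toy tensors:

```python
import torch

input = torch.FloatTensor([1, 2, 3])
target = torch.FloatTensor([1, 2, 9])

manual_l1 = (input - target).abs().mean()     # (0 + 0 + 6) / 3 = 2.0
manual_l2 = ((input - target) ** 2).mean()    # (0 + 0 + 36) / 3 = 12.0
print(manual_l1, manual_l2)                   # matches nn.L1Loss / nn.MSELoss above
```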