PyTorch Loss Functions

Notes on PyTorch loss functions.

Cross Entropy

Roughly speaking, cross entropy measures how much effort is needed to remove the uncertainty of a system when, under the true distribution $p_k$, we act according to the strategy f(x) specified by a non-true distribution $q_k$. The lower the cross entropy, the better the strategy, so we always minimize it: the smaller the cross entropy, the closer the algorithm's strategy is to the optimal one, which in turn means the estimated (non-true) distribution is closer to the true distribution. From the information-theoretic point of view, the cross-entropy loss actually comes from the KL divergence; the expression obtained at the end of the derivation simply turns out to be equivalent to the cross-entropy formula:

Understanding it from the information-theoretic perspective: information content / information entropy (entropy) / cross entropy / relative entropy.

Information content: the information content of an event is the negative log of the probability that the event occurs; the more probable the event, the less information it carries. As for why it is the negative log, you would have to ask Shannon; at the very least the measure should be 0 when $P(X)=1$ and should never be negative:

$$-\log P(X)$$

Information entropy, i.e. entropy, measures the uncertainty of a random variable and depends on the probability distribution of the event X. In other words, entropy is the expectation of the information content, taken over the discrete distribution:

$$H(p) = -\sum_{i=1}^np_i\log p_i$$

Cross entropy: back to the classification problem. A score function gives us a result of shape (10, 1), which the softmax function squashes into a probability distribution over $[0, 1]$; call it $q_i=\dfrac{e^{f_i}}{\sum_je^{f_j}}$. Then

$$H(p,q) = -\sum_{i=1}^np_i\log q_i$$

This is what we call cross entropy. By Gibbs' inequality, $H(p,q)\ge H(p)$ always holds, with equality if and only if the distribution $q_i$ is identical to $p_i$.

Relative entropy: closely related to cross entropy, $D(p\|q)=H(p,q)-H(p)=-\sum_{i=1}^np(i)\log {\dfrac{q(i)}{p(i)}}$, also known as the KL divergence. It measures how different two functions or probability distributions are: the larger the difference, the larger the relative entropy.

Maximum likelihood estimation, Negative Log Likelihood (NLL), the KL divergence and Cross Entropy are essentially equivalent and can be derived from one another; MSE can likewise be derived through the cross-entropy/maximum-likelihood view (see the Deep Learning Book, p. 132).
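A quick numeric sanity check of the relations above, using two arbitrary made-up distributions, confirms that $H(p,q) = H(p) + D(p\|q)$ and hence $H(p,q) \ge H(p)$:

import torch

# arbitrary discrete distributions: p plays the role of the true distribution, q the predicted one
p = torch.tensor([0.7, 0.2, 0.1])
q = torch.tensor([0.5, 0.3, 0.2])

H_p = -(p * p.log()).sum()        # entropy H(p)
H_pq = -(p * q.log()).sum()       # cross entropy H(p, q)
kl = (p * (p / q).log()).sum()    # KL divergence D(p || q)

print(H_p, H_pq, kl)
print(torch.allclose(H_pq, H_p + kl))  # True: H(p, q) = H(p) + D(p || q)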

BCELoss

Creates a criterion that measures the Binary Cross Entropy between the target and the output

The loss function for binary classification, i.e. the loss used in logistic regression.

For binary classification we only need to predict the probability p of the positive class; (1 - p) is then the probability of the negative class. p can be obtained with the sigmoid function.

$$\text{sigmoid}(x) = \dfrac{1}{1+e^{-x}}$$

The corresponding loss function can be derived via maximum likelihood estimation:

Suppose we have n independent training samples $\{(x_1,y_1), \dots, (x_n, y_n)\}$,

where y is the true label, $y\in \{0,1\}$. The likelihood of each sample is then:

$$P(y_i \mid x_i)=P(y_i=1|x_i)^{y_i}P(y_i=0|x_i)^{1-y_i}$$

$$=P(y_i=1|x_i)^{y_i}(1-P(y_i=1|x_i))^{1-y_i}$$

Taking the negative log gives:

$$-y_i\log P(y_i=1|x_i)-(1-y_i)\log(1-P(y_i=1|x_i))$$

It is easy to see that this is consistent with the familiar softmax multi-class loss.
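A minimal sketch of that equivalence for the two-class case: pairing each raw score $x$ with a fixed logit of 0 makes the softmax over the pair equal to $\text{sigmoid}(x)$, so the binary and multi-class cross entropies coincide (the shapes below are made up for illustration).

import torch
import torch.nn.functional as F

x = torch.randn(4)                     # raw scores for the positive class
target = torch.ones(4)                 # all samples labelled positive

bce = F.binary_cross_entropy(torch.sigmoid(x), target)

# softmax over the logits (0, x) puts probability sigmoid(x) on class 1
logits = torch.stack([torch.zeros_like(x), x], dim=1)   # shape (4, 2)
ce = F.cross_entropy(logits, target.long())

print(bce, ce)  # identical up to floating-point error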


class BCELoss(_WeightedLoss):
    def __init__(self, weight=None, size_average=None, reduce=None, reduction='elementwise_mean'):
        """
        - weight: manually rescales the loss of each element; its effect is tested in the example below
        - size_average, reduce: deprecated, just use reduction
        - reduction: "elementwise_mean" | "sum" | "none", exactly what the names suggest
        """
        super(BCELoss, self).__init__(weight, size_average, reduce, reduction)

    def forward(self, input, target):
        """
        - input: predicted probabilities, any shape, but every value must lie in [0, 1]
        - target: ground-truth probabilities, same shape as input
        """
        return F.binary_cross_entropy(input, target, weight=self.weight, reduction=self.reduction)

$$loss(p,t)=-\dfrac{1}{N}\sum_{i=1}^{N}[t_i\log(p_i)+(1-t_i)\log(1-p_i)]$$

example:


import torch
import torch.nn as nn
import torch.nn.functional as F

loss = nn.BCELoss(reduction="elementwise_mean")
input = torch.randn(5)
target = torch.ones(5)

loss = loss(torch.sigmoid(input), target)

# compute the same loss by hand
my_loss = torch.mean(-target * torch.log(torch.sigmoid(input)) - (1 - target) * torch.log(1 - torch.sigmoid(input)))

# test the weight parameter
loss1 = F.binary_cross_entropy(torch.sigmoid(input), target, reduction="none", weight=torch.Tensor([0, 0, 0, 0, 1]))
loss2 = F.binary_cross_entropy(torch.sigmoid(input), target, weight=torch.Tensor([0, 0, 0, 0, 1]))

print(my_loss, loss)
print(loss1, loss2 * 5)

# tensor(0.7590) tensor(0.7590)
# tensor([0.0000, 0.0000, 0.0000, 0.0000, 0.3104]) tensor(0.3104)

When using the sigmoid function we predict the probability of the positive class and then have to set a threshold by hand: a sample is labelled positive only if its probability reaches the threshold, which is somewhat reminiscent of hinge loss.
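A minimal sketch of this thresholding step at inference time; the 0.5 cut-off below is an arbitrary choice and is usually tuned, e.g. on a validation set.

import torch

logits = torch.randn(5)
probs = torch.sigmoid(logits)          # probability of the positive class
preds = (probs > 0.5).long()           # 1 = positive class, 0 = negative class
print(probs, preds)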

torch.nn.CrossEntropyLoss

This criterion combines nn.LogSoftmax() and nn.NLLLoss() in one single class.

The multi-class cross-entropy loss, which can be seen as an extension of binary_cross_entropy. The computation splits into two steps: log_softmax() followed by nn.NLLLoss().

It is useful when training a classification problem with C classes. If provided, the optional argument weight should be a 1D Tensor assigning weight to each of the classes. This is particularly useful when you have an unbalanced training set.

The weight argument is very useful on unbalanced datasets.
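A small sketch of how the weight argument might be used; the class counts and weights below are made up for illustration.

import torch
import torch.nn as nn

# hypothetical 3-class problem where class 2 is rare: give it a larger weight so that
# errors on class 2 contribute more to the loss; note that with a weight vector the
# default mean reduction divides by the sum of the selected weights, not the batch size
weight = torch.tensor([1.0, 1.0, 5.0])
loss_fn = nn.CrossEntropyLoss(weight=weight)

input = torch.randn(4, 3)              # raw scores for 4 samples over 3 classes
target = torch.tensor([0, 1, 2, 2])
print(loss_fn(input, target))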


class CrossEntropyLoss(_WeightedLoss):
    def __init__():
        """
        - weight: a weight assigned to each class.
        - reduction: "elementwise_mean" | "sum" | "none".
        """

    def forward():
        """
        - input: [batch, C] or [batch, C, d_1, d_2, ..., d_k]
        - target: [batch] with 0 <= target[i] <= C-1, or [batch, d_1, d_2, ..., d_k], k >= 2.
        """

example:


input = torch.randn(2, 3)
target = torch.Tensor([0, 2]).long()

# use the loss function directly
loss_fn = nn.CrossEntropyLoss()
loss = loss_fn(input, target)

# compute the loss step by step
score = torch.log_softmax(input, dim=1)
score1 = torch.log(F.softmax(input, dim=1))
print(score)
print(score1)

# use NLL loss on the log-probabilities
nll_loss_fn = nn.NLLLoss()
nll_loss = nll_loss_fn(score, target)

# compute the NLL loss by hand: average the negative log-probabilities of the true classes
my_nll = (-score[0][0] - score[1][2]) / 2
print(nll_loss, loss, my_nll)




tensor([[-0.8413, -0.7365, -2.4073],
        [-0.4626, -2.0660, -1.4120]])
tensor([[-0.8413, -0.7365, -2.4073],
        [-0.4626, -2.0660, -1.4120]])
tensor(1.1266) tensor(1.1266) tensor(1.1266)

torch.nn.NLLLoss

The negative log likelihood loss. It is useful to train a classification problem with C classes.

The input is expected to already have passed through a log_softmax layer. The loss for each sample is the negative of the value at the index of its true label.


class NLLLoss(_WeightedLoss):
    def __init__():
        """
        The arguments are essentially the same as for CrossEntropyLoss.
        """



The NLLLoss formula:

$$\ell(x, y) = L = \{l_1,\dots,l_N\}^\top, \quad l_n = - w_{y_n} x_{n,y_n}, \quad w_{c} = \text{weight}[c] \cdot \mathbb{1}\{c \not= \text{ignore\_index}\}$$

example:


loss = nn.NLLLoss()
# input is of size N x C = 3 x 5
input = torch.randn(3, 5, requires_grad=True)
# each element in target has to satisfy 0 <= value < C
target = torch.tensor([1, 0, 4])
output = loss(torch.log_softmax(input, dim=1), target)

# compute the same value by hand
score = torch.log_softmax(input, dim=1)
output2 = (-score[0, 1] - score[1, 0] - score[2, 4]) / 3
output.backward()
# output2.backward()

print(output, output2)

# tensor(1.5658, grad_fn=<NllLossBackward>) tensor(1.5658, grad_fn=<DivBackward0>)

MultiMarginLoss

$$loss = \dfrac{1}{N}\sum_{j\ne y_i}\max(0, s_j - s_{y_i}+\Delta)$$

$s_{y_i}$ is the score of the true label: every non-true class whose score exceeds $s_{y_i}-\Delta$ contributes to the final $loss$, while scores below that value contribute nothing.

Clearly, compared with the softmax loss, softmax takes every misclassified class into account, whereas the margin loss only penalizes the wrong classes whose scores are large enough (a comparison on the same input follows the example below).


class MultiMarginLoss(_WeightedLoss):
    def __init__(self, p=1, margin=1, weight=None, size_average=None, reduce=None, reduction='elementwise_mean'):
        """
        - p (int, optional): has a default value of `1`. `1` and `2` are the only supported values.
        - margin (float, optional): has a default value of `1`.
        """
        super(MultiMarginLoss, self).__init__(weight, size_average, reduce, reduction)
        if p != 1 and p != 2:
            raise ValueError("only p == 1 and p == 2 supported")
        assert weight is None or weight.dim() == 1
        self.p = p
        self.margin = margin

    def forward(self, input, target):
        return F.multi_margin_loss(input, target, p=self.p, margin=self.margin,
                                   weight=self.weight, reduction=self.reduction)



example:


loss = nn.MultiMarginLoss()
input = torch.FloatTensor([[0, 3, 1], [0, 4, 2], [1, 5, 2], [3, 5, 1]])
target = torch.ones(4).long()

out = loss(input, target)

print(out)  # clearly 0: every negative class trails the true label's score by at least 1
# tensor(0.)
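To make the contrast with the softmax loss explicit, the same input can be fed to nn.CrossEntropyLoss; the sketch below shows that the softmax-based loss stays positive even though every margin is already satisfied.

import torch
import torch.nn as nn

# same data as the example above
input = torch.FloatTensor([[0, 3, 1], [0, 4, 2], [1, 5, 2], [3, 5, 1]])
target = torch.ones(4).long()

print(nn.MultiMarginLoss()(input, target))   # tensor(0.)
print(nn.CrossEntropyLoss()(input, target))  # > 0: softmax never puts probability 1 on the true class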

nn.L1Loss

$$L1(\hat{y}, y)=\dfrac{1}{m}\sum_{i=1}^{m}|\hat{y}_i-y_i|$$

nn.MSELoss

$$L2(\hat{y}, y)=\dfrac{1}{m}\sum_{i=1}^{m}|\hat{y}_i-y_i|^2$$


loss = nn.L1Loss()
loss2 = nn.MSELoss()

input = torch.FloatTensor([1, 2, 3])
target = torch.FloatTensor([1, 2, 9])

output = loss(input, target)
output2 = loss2(input, target)

print(output, output2)
# tensor(2.) tensor(12.)
