Paper Notes: Dynamic Convolution and Involution

paper list:

  • CARAFE: Content-Aware ReAssembly of FEatures
  • Involution: Inverting the Inherence of Convolution for Visual Recognition
  • Pay less attention with lightweight and dynamic convolutions
  • ConvBERT: Improving BERT with Span-based Dynamic Convolution
  • Dynamic Region-Aware Convolution

Involution

Why is the operator called "involution" (an inversion of convolution)?

Convolution has two defining properties:

  • Space-agnostic: the same kernel slides over every position of the feature map. A single kernel can therefore only capture one kind of feature pattern, and networks compensate by using a large number of channels to enrich the learned features.
  • Channel-specific: every output channel has its own kernel. Although adding channels lets the network learn more features, a very wide channel dimension is redundant: low-rank experiments have shown that the parameters of many channels are nearly linearly dependent (a sketch of such a probe follows this list).
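One way to probe that redundancy is to flatten a layer's filters into a matrix and inspect its singular values: a steep decay means many filters are close to linear combinations of the others. A hypothetical sketch (the random stand-in weights and the 1% cutoff are illustrative assumptions, not from the paper):

```python
import torch

# Stand-in for a trained conv kernel of shape [out_channels, in_channels, k, k];
# in practice you would load real weights, e.g. model.layer.weight.
W = torch.randn(256, 64, 3, 3)
M = W.flatten(1)  # [out_channels, in_channels * k * k]
S = torch.linalg.svd(M, full_matrices=False).S
# Effective rank: count singular values above 1% of the largest (arbitrary cutoff).
eff_rank = int((S > 0.01 * S[0]).sum())
print(f"effective rank: {eff_rank} / {M.shape[0]} filters")
```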

The authors therefore designed an operator with exactly the opposite properties, involution (see the module sketch after this list):

  • Space-specific: the kernel weights are generated from the content at each position.
  • Channel-agnostic: parameters are shared across channels (within each group), similar to how attention shares its projection and feed-forward layers across positions.
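Below is a minimal PyTorch sketch of the module setup, modeled on the released code: two 1x1 convolutions form a bottleneck that predicts the per-pixel kernel, and nn.Unfold gathers the input patches. The reduction=4 default is an assumption here, and the BN/ReLU that the official implementation places after conv1 is omitted for brevity. The forward pass from my notes is reproduced right after.

```python
import torch
import torch.nn as nn

class Involution(nn.Module):
    """Minimal sketch of the involution operator (assumptions: reduction=4,
    no BN/ReLU after conv1)."""
    def __init__(self, channels, kernel_size, stride=1, groups=16, reduction=4):
        super().__init__()
        self.channels = channels
        self.kernel_size = kernel_size
        self.stride = stride
        self.groups = groups
        self.group_channels = channels // groups
        # Bottleneck of two 1x1 convs: predicts, for every output pixel,
        # kernel_size^2 weights per channel group.
        self.conv1 = nn.Conv2d(channels, channels // reduction, kernel_size=1)
        self.conv2 = nn.Conv2d(channels // reduction,
                               kernel_size ** 2 * groups, kernel_size=1)
        # When stride > 1, the kernel is predicted on a downsampled map.
        self.avgpool = nn.AvgPool2d(stride, stride) if stride > 1 else nn.Identity()
        # Extracts the kernel_size x kernel_size patch around each position.
        self.unfold = nn.Unfold(kernel_size, dilation=1,
                                padding=(kernel_size - 1) // 2, stride=stride)
```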
```python
def forward(self, x):
    # 1. Generate a pixel-wise kernel: each pixel gets a weight vector of size
    #    kernel_size^2 * groups, predicted from the content at that position
    #    (on a downsampled map when stride > 1).
    weight = self.conv2(self.conv1(x if self.stride == 1 else self.avgpool(x)))
    b, c, h, w = weight.shape
    # [b, groups, 1, kernel_size^2, h, w]; the singleton dim broadcasts
    # over the channels inside each group.
    weight = weight.view(b, self.groups, self.kernel_size ** 2, h, w).unsqueeze(2)

    # 2. Unfold x into sliding kernel_size x kernel_size patches with step
    #    `stride`: [b, c * kernel_size^2, L], where L = h * w and
    #    h = (H + 2*pad - kernel_size) / stride + 1 (likewise for w).
    out = self.unfold(x).view(b, self.groups, self.group_channels,
                              self.kernel_size ** 2, h, w)
    # 3. Weighted sum over each pixel's kernel_size^2 neighborhood.
    out = (weight * out).sum(dim=3).view(b, self.channels, h, w)
    return out
```
  • The idea is very close to local attention; the difference is that the weight here is obtained from the content via a linear mapping, rather than from relations (e.g. dot products) between pixels. Also, judging from the source code, the weights are not normalized (no softmax is applied).

  • The generated weights are then used to take a weighted sum over the [kernel, kernel] patch of pixels around the corresponding position (a usage example is sketched below).
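As a quick sanity check, assuming the forward method above is attached to the Involution sketch, the following hypothetical usage verifies the output shape (all sizes chosen arbitrarily):

```python
x = torch.randn(2, 64, 32, 32)
inv = Involution(channels=64, kernel_size=7, stride=1, groups=16)
out = inv(x)      # per-pixel 7x7 kernels, shared within each of the 16 groups
print(out.shape)  # torch.Size([2, 64, 32, 32]): same shape as the input
```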

Author: Xie Pan · Published 2021-04-29 · Updated 2021-07-06
