Paper notes: video transformer

Paper list:

  • Training Data-Efficient Image Transformers & Distillation Through Attention
  • An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale
  • ViViT: A Video Vision Transformer
  • Is Space-Time Attention All You Need for Video Understanding?
  • Video Transformer Network
  • Shuffle Transformer: Rethinking Spatial Shuffle for Vision Transformer
  • CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows
  • What Makes for Hierarchical Vision Transformer?
  • WideNet: Go Wider Instead of Deeper
  • CoAtNet: Marrying Convolution and Attention for All Data Sizes