Paper Notes: Fast Transformers

  • Long range arena: A benchmark for efficient transformers

  • Convolution and Transformer (sketch after this group)

    • Swin transformer: Hierarchical vision transformer using shifted windows.
    • Multi-scale vision longformer: A new vision transformer for high-resolution image encoding
    • Incorporating convolution designs into visual transformers
    • On the relationship between self-attention and convolutional layers
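
Since the notes stay at the citation level, here is a minimal sketch of the pattern this group surveys: a small convolutional stem tokenizes the image before a standard self-attention block. `ConvStem` and all sizes are illustrative assumptions, not the architecture of any single paper above.

```python
# Hypothetical sketch: a convolutional tokenizer feeding a Transformer block.
import torch
import torch.nn as nn

class ConvStem(nn.Module):
    """Overlapping convolutions inject local inductive bias; the strided
    convs play the role of patch embedding (illustrative sizes)."""
    def __init__(self, dim=96):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(3, dim // 2, kernel_size=3, stride=2, padding=1),
            nn.GELU(),
            nn.Conv2d(dim // 2, dim, kernel_size=3, stride=2, padding=1),
        )

    def forward(self, img):                      # (b, 3, h, w)
        feat = self.stem(img)                    # (b, dim, h/4, w/4)
        return feat.flatten(2).transpose(1, 2)   # (b, n_tokens, dim)

tokens = ConvStem()(torch.randn(2, 3, 64, 64))
block = nn.TransformerEncoderLayer(d_model=96, nhead=4, batch_first=True)
print(block(tokens).shape)  # torch.Size([2, 256, 96])
```
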
  • Local windows (sketch after this group)

    • Image transformer
    • Blockwise self-attention for long document understanding
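
A minimal sketch of the local-window idea shared by the two papers above: each token attends only within its own fixed-size block, cutting the cost from O(n²·d) to O(n·w·d). `block_attention` and `block_size` are illustrative names, not taken from the cited codebases.

```python
import torch
import torch.nn.functional as F

def block_attention(q, k, v, block_size):
    """q, k, v: (batch, seq_len, dim); seq_len must divide by block_size."""
    b, n, d = q.shape
    assert n % block_size == 0
    # Reshape the sequence into independent blocks: (b, n_blocks, bs, d).
    q = q.view(b, n // block_size, block_size, d)
    k = k.view(b, n // block_size, block_size, d)
    v = v.view(b, n // block_size, block_size, d)
    # Standard scaled dot-product attention, restricted to each block.
    scores = q @ k.transpose(-2, -1) / d ** 0.5   # (b, n_blocks, bs, bs)
    out = F.softmax(scores, dim=-1) @ v           # (b, n_blocks, bs, d)
    return out.reshape(b, n, d)

q = k = v = torch.randn(2, 16, 8)
print(block_attention(q, k, v, block_size=4).shape)  # torch.Size([2, 16, 8])
```
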
  • Axial pattern (sketch after this group)

    • Axial attention in multidimensional transformers.
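
A minimal sketch of the axial pattern: full 2D attention is factorized into a pass along rows and a pass along columns, dropping the cost from O((hw)²) to O(hw·(h+w)). The helper names are illustrative.

```python
import torch
import torch.nn.functional as F

def attend(x):
    # Plain scaled dot-product self-attention over the second-to-last axis.
    scores = x @ x.transpose(-2, -1) / x.shape[-1] ** 0.5
    return F.softmax(scores, dim=-1) @ x

def axial_attention(x):
    """x: (batch, height, width, dim); row pass, then column pass."""
    x = attend(x)               # attend within each row
    x = x.transpose(1, 2)       # swap height and width axes
    x = attend(x)               # attend within each column
    return x.transpose(1, 2)

x = torch.randn(2, 8, 8, 16)
print(axial_attention(x).shape)  # torch.Size([2, 8, 8, 16])
```
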
  • Adaptive span (sketch after this group)

    • Adaptive attention span in transformers.
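
A minimal sketch of the soft masking function from this paper: each head learns a span z, and the attention weight at distance x is scaled by m_z(x) = clamp((R + z - x) / R, 0, 1), where R controls the softness of the ramp. The wrapper code around the formula is an illustrative assumption.

```python
import torch

def adaptive_span_mask(distances, z, R=32.0):
    """distances: (..., n) token distances; z: learnable span parameter."""
    return ((R + z - distances) / R).clamp(0.0, 1.0)

z = torch.tensor(100.0, requires_grad=True)  # trained jointly with the model
dist = torch.arange(256.0)
mask = adaptive_span_mask(dist, z)  # 1 inside the span, ramps to 0 beyond it
print(mask[:3], mask[-3:])
```
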
  • Approximation (sketch after this group)

    • Linformer: Self-attention with linear complexity
    • Rethinking attention with performers.
    • Transformers are RNNs: Fast autoregressive transformers with linear attention (Linear Transformer)
    • Efficient attention: Attention with linear complexities
    • Nyströmformer: A Nyström-based algorithm for approximating self-attention.
    • FNet: Mixing tokens with Fourier transforms.
    • XCiT: Cross-Covariance Image Transformers
    • Scatterbrain: Unifying Sparse and Low-rank Attention Approximation
    • Transformer dissection: An unified understanding for transformer’s attention via the lens of kernel.
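
Of the approximations above, the Linear Transformer is the easiest to sketch: softmax attention is replaced by a kernel feature map φ(x) = elu(x) + 1, so attention factorizes as φ(Q)(φ(K)ᵀV) and costs O(n) instead of O(n²). Below is a minimal non-causal version; the paper additionally derives a causal, RNN-style recurrence.

```python
import torch
import torch.nn.functional as F

def linear_attention(q, k, v, eps=1e-6):
    """q, k, v: (batch, seq_len, dim); non-causal variant for brevity."""
    q = F.elu(q) + 1    # feature map phi from the paper
    k = F.elu(k) + 1
    kv = k.transpose(-2, -1) @ v               # (b, d, d), summed over tokens
    z = q @ k.sum(dim=-2, keepdim=True).transpose(-2, -1)  # (b, n, 1) normalizer
    return (q @ kv) / (z + eps)

q = k = v = torch.randn(2, 1024, 64)
print(linear_attention(q, k, v).shape)  # torch.Size([2, 1024, 64])
```
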
Author: Xie Pan
Published: 2021-11-19 · Updated: 2021-11-26
