- Long range arena: A benchmark for efficient transformers
Convolution and Transformer
- Swin transformer: Hierarchical vision transformer using shifted windows
- Multi-scale vision longformer: A new vision transformer for high-resolution image encoding
- Incorporating convolution designs into visual transformers
- On the relationship between self-attention and convolutional layers
- Image transformer
- Blockwise self-attention for long document understanding
- Axial attention in multidimensional transformers
- Adaptive attention span in transformers
- Linformer: Self-attention with linear complexity
- Rethinking attention with performers
- Linear Transformer: Transformers are RNNs: Fast autoregressive transformers with linear attention
- Efficient attention: Attention with linear complexities
- Nyströmformer: A Nyström-based algorithm for approximating self-attention
- FNet: Mixing tokens with Fourier transforms
- XCiT: Cross-Covariance Image Transformers
- Scatterbrain: Unifying Sparse and Low-rank Attention Approximation
- Transformer dissection: A unified understanding of transformer's attention via the lens of kernel