Deep Generative Models



Paper Notes - Autoencoders Are Vision Learners

  • DALL-E: Zero-Shot Text-to-Image Generation
  • BEiT: BERT Pre-Training of Image Transformers
  • Discrete Representations Strengthen Vision Transformer Robustness
  • iBOT: Image BERT Pre-Training with Online Tokenizer
  • Masked Autoencoders Are Scalable Vision Learners
  • VIMPAC: Video Pre-Training via Masked Token Prediction and Contrastive Learning
  • SignBERT: Pre-Training of Hand-Model-Aware Representation for Sign Language Recognition
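The papers above share the masked-prediction recipe: split the image into patches, hide a large random subset, and train the model to predict the hidden part (tokens for BEiT/iBOT, raw pixels for MAE). A minimal NumPy sketch of MAE-style random masking, assuming a ViT-B/16-like setup (196 patches, 75% mask ratio) purely for illustration:

```python
import numpy as np

def random_masking(patches, mask_ratio=0.75, rng=None):
    """MAE-style random masking: keep a random subset of patches.

    patches: (num_patches, dim) array.
    Returns (visible_patches, keep_idx, mask) where mask[i] is True
    for masked (hidden) patches and False for visible ones.
    """
    rng = np.random.default_rng(rng)
    n = patches.shape[0]
    n_keep = int(n * (1 - mask_ratio))
    perm = rng.permutation(n)               # random shuffle of patch indices
    keep_idx = np.sort(perm[:n_keep])       # indices of visible patches
    mask = np.ones(n, dtype=bool)
    mask[keep_idx] = False
    return patches[keep_idx], keep_idx, mask

# 196 patches (14 x 14 grid) of dim 768, as in ViT-B/16 on 224 x 224 images
patches = np.zeros((196, 768))
kept, keep_idx, mask = random_masking(patches, mask_ratio=0.75, rng=0)
```

Only the visible 25% of patches is fed to the encoder, which is where MAE's training-time speedup comes from; the decoder then reconstructs the full set.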

Paper Notes - Denoising Diffusion Probabilistic Models
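The DDPM forward (noising) process has a closed form: x_t can be sampled directly from x_0 as x_t = sqrt(ᾱ_t) x_0 + sqrt(1 - ᾱ_t) ε with ε ~ N(0, I), where ᾱ_t is the cumulative product of α_s = 1 - β_s. A sketch using the paper's linear β schedule (the specific schedule values are the paper's defaults, shown here only for illustration):

```python
import numpy as np

def ddpm_forward(x0, t, betas, rng=None):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) x0, (1 - abar_t) I)."""
    rng = np.random.default_rng(rng)
    alphas = 1.0 - betas
    abar = np.cumprod(alphas)[t]        # abar_t = prod_{s <= t} alpha_s
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(abar) * x0 + np.sqrt(1.0 - abar) * eps

# Linear beta schedule from the paper: 1e-4 -> 0.02 over T = 1000 steps
betas = np.linspace(1e-4, 0.02, 1000)
x0 = np.ones((4,))
xT = ddpm_forward(x0, t=999, betas=betas, rng=0)  # near-pure noise at t = T
```

Because ᾱ_T is vanishingly small under this schedule, x_T is essentially Gaussian noise, which is what lets the reverse (denoising) model start generation from N(0, I).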


Paper Notes - Fast Transformers

  • Long range arena: A benchmark for efficient transformers

  • Convolution and Transformer

    • Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows
    • Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding
    • Incorporating Convolution Designs into Visual Transformers
    • On the Relationship Between Self-Attention and Convolutional Layers
  • Local windows

    • Image Transformer
    • Blockwise Self-Attention for Long Document Understanding
  • Axial pattern

    • Axial Attention in Multidimensional Transformers
  • Adaptive span

    • Adaptive Attention Span in Transformers
  • Approximation

    • Linformer: Self-Attention with Linear Complexity
    • Rethinking Attention with Performers
    • Linear Transformer: Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
    • Efficient Attention: Attention with Linear Complexities
    • Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention
    • FNet: Mixing Tokens with Fourier Transforms
    • XCiT: Cross-Covariance Image Transformers
    • Scatterbrain: Unifying Sparse and Low-Rank Attention Approximation
    • Transformer Dissection: A Unified Understanding of Transformer's Attention via the Lens of Kernel
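The approximation papers above all attack the same bottleneck: softmax attention materializes an N x N matrix, so it costs O(N²). The linear-attention line (Transformers are RNNs) replaces exp(qᵀk) with a kernel φ(q)ᵀφ(k) so the key/value summary can be computed once and reused per query. A minimal NumPy sketch of that trick, using the paper's φ(x) = elu(x) + 1 feature map:

```python
import numpy as np

def linear_attention(Q, K, V, eps=1e-6):
    """O(N) attention via kernel feature maps (Katharopoulos et al.).

    Replaces softmax(Q K^T) V with phi(Q) (phi(K)^T V) / (phi(Q) phi(K)^T 1),
    using phi(x) = elu(x) + 1, so the N x N attention matrix is never formed.
    """
    def phi(x):                       # elu(x) + 1: a positive feature map
        return np.where(x > 0, x + 1.0, np.exp(x))
    Qf, Kf = phi(Q), phi(K)           # (N, d) feature-mapped queries/keys
    kv = Kf.T @ V                     # (d, d_v): one summary of keys/values
    z = Qf @ Kf.sum(axis=0)           # (N,): per-query normalizer
    return (Qf @ kv) / (z[:, None] + eps)

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal(s) for s in [(5, 4), (7, 4), (7, 3)])
out = linear_attention(Q, K, V)       # (5, 3), no 5 x 7 matrix built
```

The other approximation families in the list swap in different surrogates for the same matrix: low-rank projections (Linformer), random features (Performers), landmark points (Nyströmformer), or Fourier mixing (FNet).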

Paper Notes - Video Generation

  • GenHFi: Generating High Fidelity Images with Subscale Pixel Networks and Multidimensional Upscaling (ICLR 2019)
  • Scaling Autoregressive Video Models (ICLR 2020)
  • Video Pixel Networks (CoRR 2016)
  • Parallel Multiscale Autoregressive Density Estimation
  • VQGAN: Taming Transformers for High-Resolution Image Synthesis
  • TeCoGAN: Learning Temporal Coherence via Self-Supervision for GAN-based Video Generation
  • ImaGINator: Conditional Spatio-Temporal GAN for Video Generation
  • Temporal Shift GAN for Large Scale Video Generation
  • MoCoGAN: Decomposing Motion and Content for Video Generation
  • Playable Video Generation (CVPR2021)