Paper Notes - Discrete Denoising Diffusion Model

  • Argmax Flows and Multinomial Diffusion: Towards Non-Autoregressive Language Models

  • Structured Denoising Diffusion Models in Discrete State-Spaces

  • Unleashing Transformers: Parallel Token Prediction with Discrete Absorbing Diffusion for Fast High-Resolution Image Generation from Vector-Quantized Codes (arXiv:2111.12701)

  • Vector Quantized Diffusion Model for Text-to-Image Synthesis (CVPR 2022)
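
The common idea across these four papers is to define the diffusion process directly on categorical variables instead of continuous pixels. As a minimal sketch (my own PyTorch illustration, not code from any of the papers), the uniform-noise forward transition of multinomial diffusion, q(x_t | x_{t-1}) = Cat(x_t; (1 − β_t) x_{t-1} + β_t / K), can be written as:

```python
# Minimal sketch (illustration only, not code from the papers above) of the
# uniform-noise forward transition in multinomial diffusion:
#   q(x_t | x_{t-1}) = Cat(x_t; (1 - beta_t) * x_{t-1} + beta_t / K)
import torch
import torch.nn.functional as F

def multinomial_forward_step(x_prev: torch.Tensor, beta_t: float, num_classes: int) -> torch.Tensor:
    """One forward noising step on integer class indices x_prev (any shape, dtype long)."""
    x_onehot = F.one_hot(x_prev, num_classes).float()
    # Keep each token with probability 1 - beta_t; otherwise resample uniformly over K classes.
    probs = (1.0 - beta_t) * x_onehot + beta_t / num_classes
    return torch.distributions.Categorical(probs=probs).sample()
```

Each step keeps a token with probability 1 − β_t and otherwise resamples it uniformly, so repeated steps push the marginal toward the uniform distribution over the K classes (the absorbing-state variant replaces the uniform term with a [MASK] token).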

Read more

Paper Notes - Denoising Diffusion Probabilistic Models

  • Deep Unsupervised Learning Using Nonequilibrium Thermodynamics

  • DDPM: Denoising Diffusion Probabilistic Models

  • Diffusion Models Beat GANs on Image Synthesis

  • Image Super-Resolution via Iterative Refinement

  • Cascaded Diffusion Models for High Fidelity Image Generation

  • Score-Based Generative Modeling in Latent Space

  • Discrete Diffusion Models

    • Structured Denoising Diffusion Models in Discrete State-Spaces
    • Argmax Flows and Multinomial Diffusion: Towards Non-Autoregressive Language Models
    • Unleashing Transformers: Parallel Token Prediction with Discrete Absorbing Diffusion for Fast High-Resolution Image Generation from Vector-Quantized Codes (arXiv:2111.12701)
  • Parallel Generation

    • Generative Modeling by Estimating Gradients of the Data Distribution
    • Score-Based Generative Modeling through Stochastic Differential Equations
    • Denoising Diffusion Implicit Models
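
This line of work shares DDPM's Gaussian forward process, whose marginals are available in closed form: x_t = √ᾱ_t · x_0 + √(1 − ᾱ_t) · ε with ᾱ_t = ∏_{s≤t}(1 − β_s), so x_t can be sampled directly from x_0 without simulating every step. A minimal sketch, assuming the common linear β schedule (my own code, not from the papers):

```python
# Minimal sketch of the DDPM closed-form forward sample
#   x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps,  eps ~ N(0, I).
# The linear beta schedule below is one common choice, not prescribed by every paper here.
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)            # beta_1 .. beta_T
alpha_bars = torch.cumprod(1.0 - betas, dim=0)   # alpha_bar_t = prod_s (1 - beta_s)

def q_sample(x0: torch.Tensor, t: int) -> torch.Tensor:
    """Sample x_t ~ q(x_t | x_0) in one shot (t is 0-indexed)."""
    eps = torch.randn_like(x0)
    ab = alpha_bars[t]
    return ab.sqrt() * x0 + (1.0 - ab).sqrt() * eps
```
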
Read more

Deep Generative Models

Catalog:

Read more

Paper Notes - Autoencoders Are Vision Learners

  • DALL-E: Zero-Shot Text-to-Image Generation
  • BEiT: BERT Pre-Training of Image Transformers
  • Discrete Representations Strengthen Vision Transformer Robustness
  • iBOT: Image BERT Pre-Training with Online Tokenizer
  • Masked Autoencoders Are Scalable Vision Learners
  • VIMPAC: Video Pre-Training via Masked Token Prediction and Contrastive Learning
  • SignBERT: Pre-Training of Hand-Model-Aware Representation for Sign Language Recognition
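
A recurring recipe in these papers is BERT-style masked prediction on image patches. As a minimal sketch, MAE-style random masking keeps only a small random subset of patch tokens for the encoder (the 75% mask ratio follows the MAE paper; the helper itself is my own illustration):

```python
# Minimal sketch of MAE-style random masking: keep a random 25% of patch
# tokens for the encoder and return the indices needed to restore order.
import torch

def random_masking(x: torch.Tensor, mask_ratio: float = 0.75):
    """x: (batch, num_patches, dim). Returns kept tokens, binary mask, restore ids."""
    b, n, d = x.shape
    n_keep = int(n * (1.0 - mask_ratio))
    noise = torch.rand(b, n, device=x.device)      # random score per patch
    ids_shuffle = noise.argsort(dim=1)             # low score = keep
    ids_restore = ids_shuffle.argsort(dim=1)
    ids_keep = ids_shuffle[:, :n_keep]
    x_keep = torch.gather(x, 1, ids_keep.unsqueeze(-1).expand(-1, -1, d))
    mask = torch.ones(b, n, device=x.device)
    mask[:, :n_keep] = 0                           # 0 = kept, 1 = masked
    mask = torch.gather(mask, 1, ids_restore)      # back to original patch order
    return x_keep, mask, ids_restore
```
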
Read more

Paper Notes - Fast Transformer

  • Long Range Arena: A Benchmark for Efficient Transformers

  • Convolution and Transformer

    • Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows
    • Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding
    • Incorporating Convolution Designs into Visual Transformers
    • On the Relationship between Self-Attention and Convolutional Layers
  • Local Windows

    • Image Transformer
    • Blockwise Self-Attention for Long Document Understanding
  • Axial Pattern

    • Axial Attention in Multidimensional Transformers
  • Adaptive Span

    • Adaptive Attention Span in Transformers
  • Approximation

    • Linformer: Self-Attention with Linear Complexity
    • Rethinking Attention with Performers
    • Linear Transformer: Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
    • Efficient Attention: Attention with Linear Complexities
    • Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention
    • FNet: Mixing Tokens with Fourier Transforms
    • XCiT: Cross-Covariance Image Transformers
    • Scatterbrain: Unifying Sparse and Low-Rank Attention Approximation
    • Transformer Dissection: An Unified Understanding for Transformer’s Attention via the Lens of Kernel
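
Several of the approximation papers replace the softmax kernel with a feature map φ so that attention factorizes as φ(Q)(φ(K)ᵀV) and costs linear rather than quadratic time in sequence length; Linear Transformer uses φ(x) = elu(x) + 1. A rough sketch of that factorization (function name and shapes are my own, not an official API):

```python
# Minimal sketch of kernelized (linear) attention in the style of
# "Transformers are RNNs": softmax(Q K^T) V is approximated by
#   phi(Q) (phi(K)^T V) / (phi(Q) phi(K)^T 1),  with phi(x) = elu(x) + 1,
# which costs O(N d^2) instead of O(N^2 d) for sequence length N.
import torch
import torch.nn.functional as F

def linear_attention(q, k, v, eps=1e-6):
    """q, k: (batch, n, d); v: (batch, n, d_v)."""
    q, k = F.elu(q) + 1.0, F.elu(k) + 1.0
    kv = torch.einsum("bnd,bne->bde", k, v)                        # phi(K)^T V, summed over tokens
    z = 1.0 / (torch.einsum("bnd,bd->bn", q, k.sum(dim=1)) + eps)  # per-query normalizer
    return torch.einsum("bnd,bde,bn->bne", q, kv, z)
```

Because the token-sum φ(K)ᵀV can be updated incrementally, the same factorization also yields the RNN-style autoregressive formulation highlighted in the Linear Transformer paper.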
Read more