Deep Generative Models



Paper Notes: Autoencoders Are Vision Learners

  • DALL-E: Zero-Shot Text-to-Image Generation
  • BEiT: BERT Pre-Training of Image Transformers
  • Discrete Representations Strengthen Vision Transformer Robustness
  • iBOT: Image BERT Pre-Training with Online Tokenizer
  • Masked Autoencoders Are Scalable Vision Learners
  • VIMPAC: Video Pre-Training via Masked Token Prediction and Contrastive Learning
  • SignBERT: Pre-Training of Hand-Model-Aware Representation for Sign Language Recognition
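Most of the papers above share the masked-prediction recipe popularized by MAE: hide a large fraction of image patches, encode only the visible ones, and score the reconstruction on the hidden ones. A minimal numpy sketch of that masking and loss computation (the zero reconstruction is a stand-in for a real decoder; all names here are illustrative, not from any of the papers):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_mask(num_patches, mask_ratio):
    """Return a boolean mask: True = patch is hidden from the encoder."""
    num_masked = int(num_patches * mask_ratio)
    perm = rng.permutation(num_patches)
    mask = np.zeros(num_patches, dtype=bool)
    mask[perm[:num_masked]] = True
    return mask

# Toy image: 16 patches, each flattened to 8 values.
patches = rng.normal(size=(16, 8))
mask = random_mask(16, mask_ratio=0.75)

visible = patches[~mask]                 # only these reach the encoder
reconstruction = np.zeros_like(patches)  # stand-in for decoder output

# MAE-style objective: loss is computed only on the masked patches.
loss = np.mean((reconstruction[mask] - patches[mask]) ** 2)
```

The high mask ratio (75% in MAE) is what makes the pretext task hard enough to force semantic features, and skipping masked tokens in the encoder is what makes it cheap.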

Paper Notes: Video Generation

  • GenHFi: Generating High Fidelity Images with Subscale Pixel Networks and Multidimensional Upscaling (ICLR2019)
  • Scaling Autoregressive Video Models (ICLR2020)
  • Video Pixel Networks (CoRR2016)
  • Parallel: Parallel Multiscale Autoregressive Density Estimation
  • VQGAN: Taming Transformers for High-Resolution Image Synthesis
  • TeCoGAN: Learning Temporal Coherence via Self-Supervision for GAN-based Video Generation
  • ImaGINator: Conditional Spatio-Temporal GAN for Video Generation
  • Temporal Shift GAN for Large Scale Video Generation
  • MoCoGAN: Decomposing Motion and Content for Video Generation
  • Playable Video Generation (CVPR2021)
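The autoregressive models in this list (Video Pixel Networks, Scaling Autoregressive Video Models, and the subscale/parallel variants) all factor the video likelihood by the chain rule and sample one step conditioned on the past. A toy numpy sketch of that sampling loop, with a fixed 3-state transition matrix standing in for a learned network (the function and matrix are illustrative assumptions, not from any paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_video(num_frames, transition):
    """Sample a token sequence frame by frame: x_t ~ p(x_t | x_{t-1})."""
    frames = [0]  # arbitrary start token
    for _ in range(num_frames - 1):
        probs = transition[frames[-1]]
        frames.append(int(rng.choice(len(probs), p=probs)))
    return frames

# Toy first-order transition matrix; a real model conditions on all past
# frames (and pixels within a frame) with a deep network instead.
T = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.3, 0.3, 0.4]])
video = sample_video(8, T)
```

The sequential dependence in this loop is exactly the decoding bottleneck that the subscale and parallel-multiscale papers above try to break.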

Paper Notes: Discrete Latent Variable Based Generation

  • VQ-VAE: Neural Discrete Representation Learning (NIPS2017)
  • VQ-VAE2: Generating Diverse High-Resolution Images with VQ-VAE-2
  • DALL-E: Zero-Shot Text-to-Image Generation
  • VideoGPT: Video Generation using VQ-VAE and Transformers
  • LVT: Latent Video Transformer
  • Feature Quantization Improves GAN Training (ICML2020)
  • DVT-NAT: Fast Decoding in Sequence Models Using Discrete Latent Variables (ICML2018)
  • NWT: Towards Natural Audio-to-Video Generation with Representation Learning
  • NUWA: Visual Synthesis Pre-training for Neural visUal World creAtion
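The common bottleneck across this section is VQ-VAE's vector quantization: each continuous encoder output is snapped to its nearest codebook entry, and the resulting discrete token ids are what DALL-E, VideoGPT, LVT, and NUWA model with transformers. A minimal numpy sketch of that quantization step (function name and shapes are illustrative; training additionally needs the straight-through gradient and commitment loss, omitted here):

```python
import numpy as np

def vector_quantize(z, codebook):
    """Map each latent vector to its nearest codebook entry (VQ bottleneck)."""
    # Pairwise squared distances between latents (N, D) and codes (K, D).
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = d.argmin(axis=1)           # discrete token ids, shape (N,)
    return codebook[indices], indices

rng = np.random.default_rng(0)
codebook = rng.normal(size=(512, 64))    # K=512 codes of dimension 64
z = rng.normal(size=(10, 64))            # encoder outputs for 10 positions
z_q, tokens = vector_quantize(z, codebook)
```

Once images or video frames are reduced to such token grids, generation becomes a sequence-modeling problem, which is why the papers above can reuse BERT/GPT-style architectures.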