Paper notes: video generation

  • paper1: Generating high fidelity images with subscale pixel networks and multidimensional upscaling. (ICLR2019)
  • paper2: Scaling autoregressive video models (ICLR2020)
  • Video pixel networks. (CoRR2016)
  • Parallel multiscale autoregressive density estimation

Paper1

title: Generating high fidelity images with subscale pixel networks and multidimensional upscaling

Abstract:

  • Subscale Pixel Network (SPN): a conditional decoder architecture that generates an image as a sequence of sub-images of equal size
  • Multidimensional Upscaling: grow an image in both size and depth via intermediate stages utilising distinct SPNs
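To make the "size and depth" idea concrete, here is a minimal sketch of how one full-resolution image could be decomposed into the targets of the successive stages. The helper names `reduce_depth` and `subsample`, the 3-bit low depth, and the stride of 2 are illustrative assumptions, not values taken from these notes. The sketch only prepares the targets; it does not model how the distinct SPNs generate them.

```python
def reduce_depth(image, bits_kept, bits_total=8):
    """Keep only the most significant bits of every intensity value."""
    shift = bits_total - bits_kept
    return [[value >> shift for value in row] for row in image]


def subsample(image, stride):
    """Shrink the image by keeping every stride-th row and column."""
    return [row[::stride] for row in image[::stride]]


# A toy 4 x 4 single-channel image with 8-bit intensities.
full = [[(37 * r + 11 * c) % 256 for c in range(4)] for r in range(4)]

# Assumed stage targets: 3-bit depth and stride 2 are hypothetical choices.
large_low = reduce_depth(full, bits_kept=3)  # large size, low depth
small_low = subsample(large_low, stride=2)   # small size, low depth

# Generation order: small/low depth, then large/low depth, then large/full depth.
stages = [small_low, large_low, full]
for stage in stages:
    print(len(stage), "x", len(stage[0]))
```

Each later stage would be generated conditioned on the previous one, so the first model only has to handle a small, coarse image.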

Introduction

  • The multi-faceted relationship between MLE scores and the fidelity of samples
    • MLE is a well-defined measure: improvements in held-out scores generally translate into improvements in the visual fidelity of the samples.
    • MLE forces the model to support the entire empirical distribution. This guarantees the model’s ability to generalize at the cost of allotting capacity to parts of the distribution that are irrelevant to fidelity.
  • A 256 × 256 × 3 image has a total of 196,608 positions that need to be architecturally connected in order to learn dependencies among them.
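The position count quoted above is simply height times width times channels. A quick check, using a hypothetical helper `num_positions`:

```python
def num_positions(height, width, channels):
    """Number of scalar positions an autoregressive model must generate."""
    return height * width * channels


total = num_positions(256, 256, 3)
print(total)  # 196608
```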

Contribution

  • Multidimensional Upscaling

    • Small size, lower depth -> large size, lower depth -> large size, high depth
  • Subscale Pixel Network (SPN) architecture

    • divides an image of size $N\times N$ into sub-images of size $\dfrac{N}{S}\times \dfrac{N}{S}$ sliced out at interleaving positions

    • SPN consists of two networks, a conditioning network that embeds previous slices and a decoder proper that predicts a single target slice given the context embedding.
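The interleaved slicing can be sketched with plain strided indexing. This is a minimal illustration, assuming a square single-channel image stored as a list of rows and a row-major ordering over the slice offsets; the function names `subscale_slices` and `merge_slices` are hypothetical, and the ordering is an assumption rather than something stated in these notes.

```python
def subscale_slices(image, s):
    """Split an N x N image into s*s interleaved sub-images of size N/s x N/s."""
    n = len(image)
    if n % s != 0 or any(len(row) != n for row in image):
        raise ValueError("image must be square with side divisible by s")
    slices = []
    for i in range(s):        # row offset of the slice
        for j in range(s):    # column offset of the slice
            slices.append([row[j::s] for row in image[i::s]])
    return slices


def merge_slices(slices, s):
    """Inverse of subscale_slices: interleave the sub-images back together."""
    m = len(slices[0])
    n = m * s
    image = [[None] * n for _ in range(n)]
    for k, sub in enumerate(slices):
        i, j = divmod(k, s)   # recover the (row, column) offset of slice k
        for r, row in enumerate(sub):
            for c, value in enumerate(row):
                image[i + r * s][j + c * s] = value
    return image


image = [[r * 4 + c for c in range(4)] for r in range(4)]
slices = subscale_slices(image, 2)
print(slices[0])  # [[0, 2], [8, 10]]
print(slices[1])  # [[1, 3], [9, 11]]
```

Every slice is a subsampled view of the whole image, so each one covers the full spatial extent at lower resolution. In the SPN setup described above, the conditioning network would embed the slices generated so far and the decoder would predict the next one.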

Architecture

Author

Xie Pan

Published

2021-09-17

Updated

2021-09-17
