Show-1: https://showlab.github.io/Show-1/
Show-1 is a hybrid text-to-video model that combines pixel-based and latent-based video diffusion models (VDMs) to address the limitations of relying on either approach alone. It first uses pixel-based VDMs to generate a low-resolution video with accurate text-video alignment. It then uses latent-based VDMs to upsample that video to high resolution through a novel expert translation method. The result is high-quality video with precise text-video alignment at a lower GPU memory cost than a purely pixel-based pipeline.
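To make the memory argument concrete, the sketch below counts the tensor elements each stage would have to denoise. It is a rough back-of-the-envelope illustration, not Show-1's actual configuration: the frame count, the resolutions, the 8x spatial downsampling factor, and the 4 latent channels are all assumptions chosen for the example, and `VideoShape` and `latent_shape` are hypothetical helper names. Element count is only a coarse proxy for GPU memory, since real usage also depends on the network architecture and activations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoShape:
    """Shape of a video tensor: frames x channels x height x width."""
    frames: int
    channels: int
    height: int
    width: int

    def numel(self) -> int:
        # Total number of scalar elements in the tensor.
        return self.frames * self.channels * self.height * self.width


def latent_shape(pixel: VideoShape, downsample: int = 8,
                 latent_channels: int = 4) -> VideoShape:
    """Shape a latent-based VDM would denoise for a given pixel-space video.

    Assumes an autoencoder that shrinks height and width by `downsample`
    and emits `latent_channels` channels (illustrative values only).
    """
    return VideoShape(pixel.frames, latent_channels,
                      pixel.height // downsample, pixel.width // downsample)


# Illustrative sizes (assumptions, not taken from the source).
low_res = VideoShape(frames=8, channels=3, height=40, width=64)
high_res = VideoShape(frames=8, channels=3, height=320, width=576)

stage1_elems = low_res.numel()                  # pixel-based VDM, low resolution
stage2_elems = latent_shape(high_res).numel()   # latent-based VDM, high resolution
pixel_only_elems = high_res.numel()             # hypothetical pixel-only high-res stage

print(f"stage 1 (pixel, low-res):   {stage1_elems:>9,} elements")
print(f"stage 2 (latent, high-res): {stage2_elems:>9,} elements")
print(f"pixel-only at high-res:     {pixel_only_elems:>9,} elements")
print(f"pixel-only / latent ratio:  {pixel_only_elems // stage2_elems}x")
```

Under these assumed numbers, denoising the high-resolution video directly in pixel space touches 48 times as many elements as denoising it in latent space, which illustrates why a pipeline might confine the pixel-based stage to low resolution and leave upsampling to a latent-based stage.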