Show-1 is a hybrid text-to-video generation model that combines pixel-based and latent-based video diffusion models (VDMs), addressing the limitations of relying solely on either approach. It first uses a pixel-based VDM to generate a low-resolution video with accurate text-video alignment, then employs a latent-based VDM to upsample that video to high resolution via a novel expert translation method. The combination yields high-quality videos with precise text-video alignment while using substantially less GPU memory than purely pixel-based VDMs.
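As a rough illustration of the two-stage design, here is a minimal, self-contained Python sketch. The `PixelVDM` and `LatentVDM` classes are illustrative stubs (real VDMs are diffusion models, stubbed here with noise and nearest-neighbor upsampling), and none of these names are Show-1's actual API:

```python
import numpy as np

# A structural sketch of Show-1's coarse-to-fine pipeline. The classes
# below are illustrative stubs, not the real Show-1 modules.

class PixelVDM:
    """Stand-in for a pixel-space video diffusion model (stage 1)."""
    def sample(self, prompt, num_frames=8, size=(40, 64)):
        # A real pixel VDM denoises directly in RGB space, which is
        # affordable at low resolution and gives strong text alignment.
        h, w = size
        return np.random.rand(num_frames, h, w, 3)

class LatentVDM:
    """Stand-in for a latent-space super-resolution model (stage 2)."""
    def super_resolve(self, video, prompt, scale=8):
        # A real latent VDM would encode frames, denoise conditioned on
        # the low-res video ("expert translation"), and decode; here we
        # just nearest-neighbor upsample to show the data flow.
        return video.repeat(scale, axis=1).repeat(scale, axis=2)

def show1_pipeline(prompt):
    low_res = PixelVDM().sample(prompt)                 # aligned, cheap keyframes
    return LatentVDM().super_resolve(low_res, prompt)   # memory-efficient upscaling

frames = show1_pipeline("a panda playing guitar")
print(frames.shape)  # (8, 320, 512, 3)
```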
TokenFlow is a framework for text-driven video editing using a text-to-image diffusion model. It aims to generate high-quality videos that adhere to a target text while preserving the spatial layout and dynamics of the input video. The method builds on the observation that consistency in the edited video can be achieved by enforcing consistency in the diffusion feature space, which is done by propagating edited features across frames based on inter-frame correspondences. The framework requires no training or fine-tuning and can be paired with any text-to-image editing method; experiments demonstrate state-of-the-art editing results on real-world videos. Concretely, the method inverts the input video frames, extracts diffusion features (tokens), and establishes inter-frame correspondences via nearest-neighbor search; during denoising, generated tokens are replaced with tokens propagated from the original video.
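The propagation step can be sketched in a few lines of Python. The code below is a toy stand-in (random features, a single keyframe, plain L2 nearest neighbors), not the actual TokenFlow implementation, and all function names are hypothetical:

```python
import numpy as np

# Toy TokenFlow-style propagation: edit only a keyframe, then rebuild each
# frame's tokens from the EDITED keyframe tokens, routed through nearest-
# neighbor correspondences computed on the ORIGINAL video's features, so
# temporal consistency comes from the source video.

def nn_correspondence(src_tokens, dst_tokens):
    """For each dst token, the index of the closest src token (L2 distance)."""
    d = np.linalg.norm(dst_tokens[:, None, :] - src_tokens[None, :, :], axis=-1)
    return d.argmin(axis=1)

def propagate(orig_tokens, edited_key_tokens, key_idx):
    out = []
    for frame_tokens in orig_tokens:
        # match this frame's original tokens to the original keyframe tokens
        idx = nn_correspondence(orig_tokens[key_idx], frame_tokens)
        # pull the corresponding edited keyframe tokens
        out.append(edited_key_tokens[idx])
    return out

# toy data: 4 frames, 16 tokens of dimension 8 each
rng = np.random.default_rng(0)
orig = [rng.normal(size=(16, 8)) for _ in range(4)]
edited_key = rng.normal(size=(16, 8))   # keyframe tokens after image editing
edited_video = propagate(orig, edited_key, key_idx=0)
print(len(edited_video), edited_video[0].shape)  # 4 (16, 8)
```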
CoDeF is a new video representation consisting of a canonical content field and a temporal deformation field. The canonical content field captures the static contents of the entire video, while the temporal deformation field records the transformation from the canonical image to each frame. By optimizing these fields jointly, CoDeF can reconstruct a target video and, more importantly, lets image algorithms be applied once to the canonical image and then propagated to the entire video via the deformation field. This enables training-free video-to-video translation with improved cross-frame consistency over existing methods, as well as tasks such as point-based tracking, segmentation-based tracking, and video super-resolution.
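A toy sketch of the idea follows, assuming a simple per-frame horizontal shift as the deformation; CoDeF actually optimizes both fields jointly per video, so everything below is illustrative rather than the paper's code:

```python
import numpy as np

# Toy CoDeF-style representation: a canonical image plus per-frame
# deformations that map each frame's pixel coordinates into canonical
# space. Editing the canonical image once propagates to every frame.

H, W, T = 32, 32, 5
canonical = np.random.rand(H, W, 3)            # canonical content "field"

def deform(t):
    # Hypothetical deformation: a horizontal shift growing with time.
    # CoDeF learns these fields from the video instead.
    ys, xs = np.mgrid[0:H, 0:W]
    return ys, (xs + 2 * t) % W                # frame coords -> canonical coords

def render(content, t):
    ys, xs = deform(t)
    return content[ys, xs]                     # sample canonical at warped coords

video = [render(canonical, t) for t in range(T)]

# apply an "image algorithm" (here: a channel swap) to the canonical image only
edited = canonical[..., ::-1]
edited_video = [render(edited, t) for t in range(T)]  # edit propagates to all frames
print(edited_video[0].shape)  # (32, 32, 3)
```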
This AUTOMATIC1111 webui extension lets users create edited videos with Ebsynth, without requiring Adobe After Effects (AE). The extension is confirmed to work with ControlNet installed. Users must install ffmpeg and Ebsynth before installing the extension from the Extensions tab of the webui. To use it, open the Ebsynth Utility tab and create an empty directory to serve as the project directory; place the original video to be edited in this directory, select stage 1, and click Generate. The stages must be executed in order from 1 to 7, and progress is not shown in the webui, so check the console for status. In the latest webui, an image must be dropped onto the main img2img screen to avoid errors.
Deforum is a community of AI image synthesis developers, enthusiasts, and artists. They have created a notebook using Stable Diffusion and continue to improve its functionality daily. It's free, it's amazing, and you can use it to make cool stuff with AI.