Kosmos-G is a model that leverages Multimodal Large Language Models (MLLMs) to generate images in context from generalized vision-language inputs. It aligns the output space of the MLLM with CLIP, using the textual modality as an anchor. Its distinctive capabilities include zero-shot multi-entity subject-driven generation and seamless integration with various U-Net techniques. The model consists of an MLLM for multimodal perception and an AlignerNet that connects the MLLM to the image decoder. The training pipeline involves pre-training the MLLM, aligning it with the image decoder, and fine-tuning the whole system through instruction tuning. Kosmos-G demonstrates its effectiveness in diverse image generation tasks and offers potential applications with customized image decoder variants.
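A conceptual sketch (not the official Kosmos-G code) of the alignment idea: a small aligner network maps MLLM hidden states into CLIP's text-embedding space, with text-only pairs acting as the anchor. Module names, dimensions, and the MSE loss below are assumptions for illustration.

```python
# Conceptual sketch: aligning an MLLM's output embeddings to the CLIP text
# encoder's space, using the textual modality as the anchor. Not official code.
import torch
import torch.nn as nn

class AlignerNet(nn.Module):
    """Hypothetical aligner mapping MLLM hidden states to CLIP text-embedding space."""
    def __init__(self, mllm_dim: int = 2048, clip_dim: int = 768):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(mllm_dim, clip_dim),
            nn.GELU(),
            nn.Linear(clip_dim, clip_dim),
        )

    def forward(self, mllm_hidden: torch.Tensor) -> torch.Tensor:
        return self.proj(mllm_hidden)

# Training signal: for a text-only input, push the aligned MLLM embedding toward
# the frozen CLIP text embedding of the same caption (MSE as a stand-in loss).
aligner = AlignerNet()
mllm_hidden = torch.randn(4, 77, 2048)    # placeholder MLLM outputs
clip_text_emb = torch.randn(4, 77, 768)   # placeholder frozen CLIP targets
loss = nn.functional.mse_loss(aligner(mllm_hidden), clip_text_emb)
loss.backward()
```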
Fondant is an open-source framework designed to simplify and accelerate large-scale data processing. It allows for the reuse of containerized components across pipelines and execution environments, enabling sharing within the community. Fondant provides plug-and-play pipelines for various tasks such as AI image generation model fine-tuning, large language model fine-tuning, and code generation model fine-tuning. It also offers a library of reusable components for tasks such as data extraction, filtering, removal of unwanted content, data transformation, data tuning, and data enrichment. The framework supports multimodal capabilities, standardized Python/Pandas-based custom component creation, and production-ready scalable deployment. It also integrates with multiple cloud platforms. Fondant's main goal is to give users control over their data and simplify the building of pipelines for large-scale data processing. The text provides information on how to get started with Fondant and showcases example pipelines for tasks like filtering a creative commons image dataset and fine-tuning models like ControlNet and Stable Diffusion.
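As an illustration only, here is the kind of row-level Pandas transform that a custom data-processing component typically wraps; the function and column names are assumptions for this sketch, not Fondant's actual component API.

```python
# Illustrative sketch: a Pandas-based filter of the kind a reusable
# data-processing component would wrap. Column names are placeholders.
import pandas as pd

def filter_by_min_resolution(df: pd.DataFrame, min_side: int = 512) -> pd.DataFrame:
    """Keep only rows whose image width and height meet a minimum size."""
    mask = (df["image_width"] >= min_side) & (df["image_height"] >= min_side)
    return df[mask]

images = pd.DataFrame({
    "url": ["a.jpg", "b.jpg"],
    "image_width": [640, 256],
    "image_height": [512, 256],
})
print(filter_by_min_resolution(images))  # keeps only a.jpg
```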
Stable Diffusion Sketch is an Android app that allows users to create colorful sketches and enhance them using various modes of Stable Diffusion. Users can start new paintings from a blank canvas, a camera capture, Stable Diffusion txt2img output, or images shared from other apps. The app also offers preset modes, custom modes, painting tools, undo/redo functionality, and options to adjust the prompt prefix, postfix, and negative prompt values. Three canvas aspect ratios are available: landscape, portrait, and square. Users can also enlarge images with the upscaler feature or delete sketches from the main screen. Related sketches can be grouped, and the app supports multiple ControlNets.
A comprehensive guide to ControlNet v1.1, a Stable Diffusion model that allows users to control image compositions and human poses based on reference images. The guide covers various aspects of ControlNet, including its installation on different platforms (Windows, Mac, Google Colab), settings, and common use cases.
The SD-CN-Animation project offers automated video stylization and text-to-video generation using StableDiffusion and ControlNet. It provides the ability to stylize videos automatically and generate new videos from text input, using various Stable Diffusion models as backbones. The project incorporates the 'RAFT' optical flow estimation algorithm to maintain animation stability and generate occlusion masks for frame generation. In text-to-video mode, it utilizes the 'FloweR' method for predicting optical flow from previous frames. The ControlNet model is recommended for better results in vid2vid mode.
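A hedged sketch of the underlying mechanism: compute forward and backward RAFT flow (here via torchvision) and derive an occlusion mask from a forward-backward consistency check. The threshold and warping details below are assumptions, not the project's exact code.

```python
# Forward/backward RAFT flow plus a simple forward-backward consistency check
# to build an occlusion mask, similar in spirit to how frame-to-frame
# generation is stabilized. Threshold and warping details are assumptions.
import torch
import torch.nn.functional as F
from torchvision.models.optical_flow import raft_large, Raft_Large_Weights

model = raft_large(weights=Raft_Large_Weights.DEFAULT).eval()

def occlusion_mask(frame0: torch.Tensor, frame1: torch.Tensor, thresh: float = 1.0):
    """frame0/frame1: (1, 3, H, W) in [-1, 1], H and W divisible by 8."""
    with torch.no_grad():
        fwd = model(frame0, frame1)[-1]   # flow frame0 -> frame1, (1, 2, H, W)
        bwd = model(frame1, frame0)[-1]   # flow frame1 -> frame0
    # Warp the backward flow into frame0's coordinates and check cycle consistency.
    _, _, h, w = fwd.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=-1).float()[None]        # (1, H, W, 2)
    coords = grid + fwd.permute(0, 2, 3, 1)                   # where pixels land
    norm = coords / torch.tensor([w - 1, h - 1]) * 2 - 1      # to [-1, 1]
    bwd_warped = F.grid_sample(bwd, norm, align_corners=True)
    err = (fwd + bwd_warped).norm(dim=1, keepdim=True)        # cycle error
    return err > thresh                                       # True = likely occluded

frame0 = torch.rand(1, 3, 256, 256) * 2 - 1
frame1 = torch.rand(1, 3, 256, 256) * 2 - 1
print(occlusion_mask(frame0, frame1).float().mean())
```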
UnpromptedControl is a tool used for guiding StableDiffusion models in image restoration and object removal tasks. By leveraging a simple hack, it allows for the restoration or removal of objects without requiring user prompts, leading to enhanced process efficiency. The tool uses ControlNet and StableDiffusionInpaintPipeline models to guide the inpainting process and restore the image to a more natural-looking state. However, the algorithm currently has limitations in processing images of people's faces and bodies.
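A hedged sketch of the general approach using Hugging Face diffusers (not the repo's exact script): an inpainting ControlNet guides a ControlNet-inpaint pipeline with an empty prompt, so the condition image rather than text drives the restoration. Model IDs are common public checkpoints and file paths are placeholders.

```python
# Prompt-free ControlNet-guided inpainting sketch with diffusers.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetInpaintPipeline
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_inpaint", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

image = load_image("input.png")   # placeholder path
mask = load_image("mask.png")     # white = region to remove/restore

# No descriptive user prompt: the ControlNet condition does the guiding.
result = pipe(
    prompt="",
    image=image,
    mask_image=mask,
    control_image=image,
    num_inference_steps=30,
).images[0]
result.save("output.png")
```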
ComfyUI is a powerful and modular Stable Diffusion GUI and backend that lets users design and execute advanced Stable Diffusion pipelines through a graph/nodes/flowchart-based interface. It supports SD1.x and SD2.x and provides an asynchronous queue system with many optimizations. It can load ckpt, safetensors, and diffusers models/checkpoints, as well as standalone VAEs and CLIP models. Other features include embeddings/textual inversion, area composition, inpainting with both regular and inpainting models, ControlNet and T2I-Adapter, upscale models, unCLIP models, and more. ComfyUI starts up quickly and works fully offline without downloading anything. Users can save and load workflows as JSON files, and the nodes interface can be used to create complex workflows.
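A hedged sketch of queueing a workflow that was exported from ComfyUI in its API JSON format against a locally running server; the default address, port, and the /prompt endpoint are assumptions based on a standard local install, and "workflow_api.json" is a placeholder filename.

```python
# Queue an exported ComfyUI workflow against a local ComfyUI server.
import json
import urllib.request

with open("workflow_api.json", "r", encoding="utf-8") as f:
    workflow = json.load(f)   # node-id -> {"class_type": ..., "inputs": ...}

payload = json.dumps({"prompt": workflow}).encode("utf-8")
req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode("utf-8"))   # response includes the queued prompt id
```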
This dataset is designed to train a ControlNet with human facial expressions. It includes keypoints for pupils to allow gaze direction. Training has been tested on Stable Diffusion v2.1 base (512) and Stable Diffusion v1.5.
ControlNet is a neural network structure that adds extra conditions to diffusion models in order to control them. ControlNet copies the weights of neural network blocks into a "locked" copy and a "trainable" copy, allowing the "trainable" copy to learn the condition while the "locked" copy preserves the production-ready diffusion model. A "zero convolution" prevents distortion early in training, and because no layer is trained from scratch, fine-tuning is safe even on small-scale or personal devices. The text also explains how ControlNet can be used with Stable Diffusion to reuse the SD encoder as a powerful backbone for learning diverse controls, and the efficacy of the SD encoder as a backbone is validated by a range of experimental evidence.
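A minimal PyTorch sketch of this idea (simplified, not the official implementation): a frozen "locked" block, a "trainable" copy, and zero-initialized 1x1 convolutions so the control branch starts as an exact no-op and cannot distort the pretrained model at the beginning of training.

```python
# Simplified ControlNet-style block: locked copy + trainable copy + zero convs.
import copy
import torch
import torch.nn as nn

def zero_conv(channels: int) -> nn.Conv2d:
    conv = nn.Conv2d(channels, channels, kernel_size=1)
    nn.init.zeros_(conv.weight)
    nn.init.zeros_(conv.bias)
    return conv

class ControlledBlock(nn.Module):
    def __init__(self, block: nn.Module, channels: int):
        super().__init__()
        self.locked = block                      # original weights, frozen
        for p in self.locked.parameters():
            p.requires_grad_(False)
        self.trainable = copy.deepcopy(block)    # copy that learns the condition
        self.zero_in = zero_conv(channels)       # injects the condition
        self.zero_out = zero_conv(channels)      # adds the learned control

    def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        control = self.trainable(x + self.zero_in(cond))
        return self.locked(x) + self.zero_out(control)

block = nn.Conv2d(64, 64, kernel_size=3, padding=1)
layer = ControlledBlock(block, channels=64)
x, cond = torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32)
# At initialization the zero convs output zeros, so behavior matches the
# frozen block exactly; training gradually learns the control signal.
assert torch.allclose(layer(x, cond), block(x))
```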
Tips for using ControlNet include adding negative prompts and ignoring the canvas height/width settings. The extension also supports T2I-Adapter and offers experimental features such as CFG-based ControlNet, Guess Mode, and Multi-ControlNet/Joint Conditioning. The weight and the guidance strength/start/end are the factors that determine how strongly ControlNet influences the original SD U-Net.
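A hedged sketch of adjusting weight and guidance start/end through the AUTOMATIC1111 webui txt2img API; the exact payload field names can differ between extension versions, so treat the keys below as assumptions to verify against your installed sd-webui-controlnet version.

```python
# Sketch: passing ControlNet weight and guidance start/end via the webui API.
import base64
import json
import urllib.request

with open("pose.png", "rb") as f:   # placeholder conditioning image
    control_image = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "prompt": "a person dancing on a beach",
    "negative_prompt": "blurry, low quality",
    "steps": 25,
    "alwayson_scripts": {
        "controlnet": {
            "args": [{
                "input_image": control_image,
                "module": "openpose",
                "model": "control_v11p_sd15_openpose",  # model name as listed in your webui
                "weight": 0.8,          # how strongly the control is applied
                "guidance_start": 0.0,  # fraction of steps where control begins
                "guidance_end": 0.8,    # fraction of steps where control stops
            }]
        }
    },
}
req = urllib.request.Request(
    "http://127.0.0.1:7860/sdapi/v1/txt2img",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
result = json.loads(urllib.request.urlopen(req).read())
print(result["images"][0][:64])   # base64-encoded output image (truncated)
```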
This AUTOMATIC1111 UI extension allows users to create edited videos using Ebsynth without requiring After Effects (AE). The extension is confirmed to work properly with ControlNet installed. Users must install ffmpeg and Ebsynth before installing the extension from the Extensions tab of the webui. To use the extension, go to the Ebsynth Utility tab and create an empty directory as the project directory. Place the original video to be edited in this directory, then select stage 1 and generate. The stages should be executed in order from 1 to 7; progress is not reflected in the webui, so check the console screen instead. In the latest webui, the image must be dropped onto the main img2img screen to avoid errors.
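For orientation, a hedged illustration of the kind of preprocessing the first stage performs: splitting the source video into numbered frames with ffmpeg so later stages can run img2img and Ebsynth on them. The paths and frame-name pattern are placeholders, not the extension's exact output layout.

```python
# Split a source video into numbered PNG frames with ffmpeg (illustrative only).
import pathlib
import subprocess

project_dir = pathlib.Path("ebsynth_project")      # the empty project directory
frames_dir = project_dir / "video_frame"
frames_dir.mkdir(parents=True, exist_ok=True)

subprocess.run(
    [
        "ffmpeg",
        "-i", str(project_dir / "original.mp4"),   # the video placed in the project dir
        str(frames_dir / "%05d.png"),               # one PNG per frame
    ],
    check=True,
)
```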