Stability AI has released StableStudio, an open-source version of its DreamStudio text-to-image consumer application. The company intends to work with the broader community to create a world-class user interface for generative AI that users fully control. DreamStudio was first conceived as an animation studio; it shifted its focus to image generation with the arrival of Stable Diffusion in the summer of 2022. StableStudio enables local-first development through WebGPU and a desktop installation of Stable Diffusion, and it is compatible with ControlNet tools and local inference through the AUTOMATIC1111 stable-diffusion-webui.
aiNodes is an open-source, Python-based node engine for generating AI images and motion pictures. The engine is fully modular and can download node packs at runtime. It also features RIFE and FILM interpolation integration, coloured background drop, and node creation with IDE annotations. Installation requires Python 3.10, Git, and an NVIDIA GPU with CUDA and drivers installed. The engine supports Deforum, Stable Diffusion, upscalers, Kandinsky, ControlNet, LoRAs, TI embeddings, hypernetworks, background separation, human matting/masking, and compositing, among other features.
ProFusion is a new framework for customized text-to-image generation that preserves fine-grained image details without using regularization. It consists of PromptNet, an encoder network, and Fusion Sampling, a sampling method that generates customized images from a single user-provided image and text requirements. The paper explains how ProFusion works and presents experiments demonstrating superior performance over existing approaches while still meeting additional user-defined requirements.
We propose the Asymmetric VQGAN to preserve the information in the conditional image input. Compared with the original VQGAN, the Asymmetric VQGAN involves two core designs. First, we introduce a conditional branch into the decoder, which handles the conditional input for image manipulation tasks. Second, we design a larger decoder to better recover the details lost in the quantized codes.
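A minimal sketch of the conditional-branch idea, with illustrative module names and sizes that are assumptions rather than the paper's code:

```python
import torch
import torch.nn as nn

class ConditionalDecoderBlock(nn.Module):
    """Sketch: a decoder block that fuses quantized latents with a conditional image."""

    def __init__(self, latent_ch=256, cond_ch=3, hidden_ch=256):
        super().__init__()
        # Conditional branch: encodes the conditional (e.g. masked) input image
        self.cond_branch = nn.Sequential(
            nn.Conv2d(cond_ch, hidden_ch, 3, padding=1),
            nn.SiLU(),
            nn.Conv2d(hidden_ch, hidden_ch, 3, padding=1),
        )
        self.fuse = nn.Conv2d(latent_ch + hidden_ch, latent_ch, 1)
        # Main (enlarged) decoder path
        self.main = nn.Sequential(
            nn.Conv2d(latent_ch, latent_ch, 3, padding=1),
            nn.SiLU(),
            nn.Conv2d(latent_ch, latent_ch, 3, padding=1),
        )

    def forward(self, z_q, cond_img):
        # Resize the conditional image to the latent resolution, then fuse
        cond = self.cond_branch(
            nn.functional.interpolate(cond_img, size=z_q.shape[-2:])
        )
        return self.main(self.fuse(torch.cat([z_q, cond], dim=1)))

z_q = torch.randn(1, 256, 32, 32)        # quantized latent codes
cond = torch.randn(1, 3, 256, 256)       # e.g. the masked input image
out = ConditionalDecoderBlock()(z_q, cond)   # -> (1, 256, 32, 32)
```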
StyleDrop is a technology that generates images in any desired style using the text-to-image transformer Muse. It captures nuances of user-provided styles, such as design patterns and colour schemes. StyleDrop works by fine-tuning a small number of trainable parameters, and it improves the quality of generated images through iterative training with human or automated feedback; style descriptors are added to the prompts during training and synthesis to improve results. StyleDrop is easy to use, can be trained with brand assets, and can generate alphabets in a consistent style from a single reference image. StyleDrop on Muse outperforms other style-tuning methods for text-to-image models.
The Diffusion Explainer is an interactive webpage that allows users to generate an image from a text prompt. Users control several hyperparameters, including the random seed and the guidance scale, to customize the generated image; prompts that describe the desired image in detail tend to yield higher-quality results. Changing the random seed produces different renditions of the same prompt, while raising the guidance scale makes the image adhere more closely to the prompt at the cost of some creativity. The tool does not expose other hyperparameters such as the total number of timesteps, the image size, or the type of scheduler.
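The same two knobs can be reproduced locally with the Hugging Face diffusers library; a minimal sketch, where the checkpoint name is an assumed stand-in for whatever model backs the webpage:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Fixing the seed makes the run reproducible; change it for a new rendition
generator = torch.Generator("cuda").manual_seed(42)
image = pipe(
    "a watercolor painting of a lighthouse at dawn",
    guidance_scale=7.5,   # higher = closer adherence to the prompt, less variety
    generator=generator,
).images[0]
image.save("lighthouse.png")
```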
Mist is an image preprocessing tool developed to protect images from being mimicked by AI-for-Art applications. It adds watermarks to images that make them unrecognizable and inimitable to state-of-the-art AI models. Mist has been open-sourced on GitHub with the aim of building a collaborative community of developers and users to improve its performance. It is robust against noise-purification methods and against image transformations such as cropping and resizing, and it is effective against various AI-for-Art applications including textual inversion, DreamBooth, scenario.gg, and NovelAI image2image. The watermarking process is also fast: an image is processed within about three minutes using the default parameters.
BLIP-Diffusion is a new model for generating and editing images based on text prompts and subject images. Unlike previous models, it uses a pre-trained multimodal encoder to represent the subject, allowing for efficient fine-tuning and better preservation of subject details. The model can generate new images from a text prompt and a subject image even without prior training on the specific subject, and it supports image manipulation, style transfer, and editing guided by subject images. It is trained in two stages to learn subject representation and can be combined with other techniques for finer control over generation and editing. Overall, BLIP-Diffusion provides a flexible and efficient approach to generating and editing images of specific subjects.
A comprehensive guide to ControlNet v1.1, a Stable Diffusion extension that lets users control image composition and human poses based on reference images. The guide covers various aspects of ControlNet, including its installation on different platforms (Windows, Mac, Google Colab), settings, and common use cases.
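The guide centers on the webui, but the same pose-control workflow can be sketched in diffusers using the publicly released v1.1 OpenPose weights; the input is assumed to be a precomputed OpenPose skeleton image:

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Load the v1.1 OpenPose ControlNet and attach it to a SD 1.5 pipeline
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_openpose", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

pose = load_image("pose_reference.png")   # placeholder path to a skeleton image
image = pipe("a knight in silver armor", image=pose).images[0]
image.save("knight.png")
```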
The tutorial provides a comprehensive guide to creating consistent characters using Stable Diffusion (SD) and a Textual Inversion embedding. It outlines a five-step process: generating input images, filtering them for the desired attributes, tagging them for training, training the embedding, and selecting a validated iteration. It emphasizes generating high-quality input images, filtering out unwanted variations, and fine-tuning the selection to achieve consistency. By following the tutorial, users can create an embedding that reliably recreates the desired character across different poses, hairstyles, body types, and prompts.
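Once trained, using such an embedding can be sketched via diffusers; the file name and trigger token below are hypothetical examples, not from the tutorial:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Load the trained Textual Inversion embedding and bind it to a trigger token
pipe.load_textual_inversion("my_character.pt", token="<my-character>")

# The trigger token now recreates the character across prompts
image = pipe("a photo of <my-character> riding a bicycle").images[0]
image.save("character.png")
```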
The SD-CN-Animation project offers automated video stylization and text-to-video generation using Stable Diffusion and ControlNet, with various Stable Diffusion models available as backbones. The project incorporates the 'RAFT' optical-flow estimation algorithm to maintain animation stability and to generate occlusion masks for frame generation. In text-to-video mode, it utilizes the 'FloweR' method to predict optical flow from previous frames. The ControlNet model is recommended for better results in vid2vid mode.
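The flow-estimation step can be sketched with torchvision's off-the-shelf RAFT rather than the project's own integration; the occlusion heuristic at the end is an illustrative assumption, not the project's exact mask:

```python
import torch
from torchvision.models.optical_flow import raft_large, Raft_Large_Weights

weights = Raft_Large_Weights.DEFAULT
model = raft_large(weights=weights).eval().to("cuda")

# Dummy frames; real frames should be preprocessed with weights.transforms(),
# and height/width must be divisible by 8
frame1 = torch.rand(1, 3, 256, 256, device="cuda")
frame2 = torch.rand(1, 3, 256, 256, device="cuda")

with torch.no_grad():
    flows = model(frame1, frame2)   # list of flow refinements, last is finest
flow = flows[-1]                    # (1, 2, H, W) per-pixel displacements

# Crude occlusion heuristic (assumption): flag pixels with large motion as unreliable
occlusion_mask = flow.norm(dim=1, keepdim=True) > 20.0
```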
UnpromptedControl is a tool for guiding Stable Diffusion models in image restoration and object removal tasks. By leveraging a simple hack, it restores or removes objects without requiring user prompts, making the process more efficient. The tool uses ControlNet and the StableDiffusionInpaintPipeline to guide the inpainting process and restore the image to a more natural-looking state. However, the algorithm currently has limitations when processing images of people's faces and bodies.
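A minimal sketch of the prompt-free idea using only the stock diffusers inpainting pipeline (UnpromptedControl additionally wires in ControlNet guidance; file paths here are placeholders):

```python
import torch
from diffusers import StableDiffusionInpaintPipeline
from diffusers.utils import load_image

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

image = load_image("damaged_photo.png")
mask = load_image("damage_mask.png")   # white where the object/damage is

# Empty prompt: let the surrounding pixels, not text, drive the fill
result = pipe(prompt="", image=image, mask_image=mask).images[0]
result.save("restored.png")
```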
DeepFloyd IF is a text-to-image model that uses the large language model T5-XXL-1.1 as a text encoder, enabling it to render intelligible and coherent text within images. The model achieves a high degree of photorealism, can generate images with non-standard aspect ratios, and can modify style, patterns, and details in images without the need for fine-tuning. DeepFloyd IF is modular and cascaded, and it works in pixel space, using diffusion models that inject random noise into data and then reverse the process to generate new samples from the noise.
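A short sketch of running Stage I of the cascade through diffusers, assuming the gated checkpoint license has been accepted on the Hugging Face Hub (Stages II/III would then upscale the 64x64 output):

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "DeepFloyd/IF-I-XL-v1.0", variant="fp16", torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()   # T5-XXL is large; offload to fit on one GPU

prompt = 'a neon sign that reads "open late"'   # IF can render legible text
image = pipe(prompt, output_type="pil").images[0]
image.save("if_stage1.png")
```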
Inpaint Anything is a tool that inpaints images, videos, and 3D scenes, allowing users to remove, fill, or replace objects with just a few clicks. It leverages advanced vision models such as the Segment Anything Model (SAM), LaMa, and Stable Diffusion (SD), and it supports multiple aspect ratios and resolutions up to 2K through a user-friendly interface. The tool is continuously gaining new features and functionality, making it an accessible and powerful solution for users seeking advanced inpainting capabilities.
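The click-to-mask half of such a pipeline can be sketched with the segment-anything package; the checkpoint path is the published ViT-H file, and the zeroed image is a stand-in for a real photo:

```python
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

# Turn one user click into a segmentation mask with SAM; that mask
# would then be handed to LaMa or SD for removal/filling
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

image = np.zeros((512, 512, 3), dtype=np.uint8)   # stand-in for a loaded RGB image
predictor.set_image(image)

masks, scores, _ = predictor.predict(
    point_coords=np.array([[256, 256]]),   # the clicked pixel (x, y)
    point_labels=np.array([1]),            # 1 = foreground click
)
best_mask = masks[scores.argmax()]         # boolean HxW mask for inpainting
```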
Token Merging (ToMe) is a technique that speeds up transformers by merging redundant tokens, reducing the transformer's workload without compromising quality. Applied to the underlying transformer blocks in Stable Diffusion, it preserves the speed-up and memory benefits while minimizing quality loss. It works without training, can be used with any Stable Diffusion model, and reduces the workload by up to 60%. ToMe for SD is not another efficient reimplementation of transformer modules, but an actual reduction of the total work required to generate an image. In the authors' results, ToMe for SD produces images similar to the originals while being faster and using less memory, making it an efficient tool for image generation.
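The authors ship this as the tomesd package, which patches an existing pipeline in place; a minimal usage sketch:

```python
import torch
import tomesd
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Merge ~50% of tokens in each attention block; higher ratio = faster, slightly lossier
tomesd.apply_patch(pipe, ratio=0.5)

image = pipe("a red fox in the snow").images[0]
image.save("fox.png")
```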
Safe & Stable is a user-friendly tool for converting Stable Diffusion checkpoint files (.ckpt) to the safer, more secure .safetensors tensor-storage format. The new format enhances security by preventing execution of malicious Python code and improves performance during model loading on both CPUs and GPUs. The tool's graphical interface simplifies file selection and monitors conversion progress. Although the initial conversion still requires loading the .ckpt data, future models distributed exclusively in the .safetensors format will eliminate the need for scanning or converting potentially harmful pickle files.
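The essence of such a conversion can be sketched with PyTorch and the safetensors library; this is a generic sketch, not the tool's own code, and loading the .ckpt still executes pickle, so it should only be run on trusted files:

```python
import torch
from safetensors.torch import save_file

# Loading a .ckpt unpickles arbitrary objects -- the unsafe step this format replaces
ckpt = torch.load("model.ckpt", map_location="cpu")
state_dict = ckpt.get("state_dict", ckpt)   # SD checkpoints often nest weights here

# safetensors stores plain tensors only; drop non-tensor entries and
# make tensors contiguous, as save_file requires
tensors = {
    k: v.contiguous()
    for k, v in state_dict.items()
    if isinstance(v, torch.Tensor)
}
save_file(tensors, "model.safetensors")
```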
Preventing diffusion models from generating undesirable or copyrighted content is a serious concern. Previous methods for blocking such content could be easily bypassed, but this new technique fine-tunes the model weights to erase concepts permanently, without retraining the entire model. It works by using the model's own knowledge to guide its output away from the targeted concept. The authors tested the technique on erasing artistic styles and nudity from images and found it more effective than previous methods at removing targeted concepts. For broad concepts, however, there may be a trade-off between complete erasure and interference with other concepts.
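A hedged sketch of that idea as a training objective, written against the diffusers UNet call convention: the fine-tuned model's prediction for the concept is pushed toward the frozen model's unconditional prediction shifted away from the concept. All names and the scale `eta` are assumptions, not the paper's code:

```python
import torch
import torch.nn.functional as F

def erasure_loss(tuned_unet, frozen_unet, x_t, t, c_emb, uncond_emb, eta=1.0):
    """Sketch of a negative-guidance erasure objective.

    x_t: noised latents; t: timestep; c_emb / uncond_emb: text embeddings
    for the target concept and the empty prompt (all hypothetical names).
    """
    with torch.no_grad():
        # Frozen original model supplies the guidance target
        eps_uncond = frozen_unet(x_t, t, encoder_hidden_states=uncond_emb).sample
        eps_cond = frozen_unet(x_t, t, encoder_hidden_states=c_emb).sample
        # Guide *away* from the concept instead of toward it
        target = eps_uncond - eta * (eps_cond - eps_uncond)

    # The tuned model, conditioned on the concept, learns to match the
    # concept-free target, erasing the concept from its weights
    eps_tuned = tuned_unet(x_t, t, encoder_hidden_states=c_emb).sample
    return F.mse_loss(eps_tuned, target)
```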
Paella is an easy-to-use text-to-image model that turns text into pictures. It was inspired by earlier models but has simpler code for training and sampling. During training, it "noises" images by randomly replacing visual tokens with others from a learned codebook and then tries to predict the original tokens. During sampling, the model produces a distribution over each token and samples from it to build up the final image. Paella is designed to make text-to-image models more accessible to non-experts by simplifying the technical components.
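The token-replacement noising step can be sketched generically in PyTorch; the codebook size and shapes below are illustrative, not Paella's actual configuration:

```python
import torch

def noise_tokens(tokens, vocab_size, noise_frac):
    """Replace a random fraction of token indices with random codebook entries.

    tokens: (B, H, W) long tensor of codebook indices for a tokenized image.
    Returns the noised tokens and the mask of replaced positions.
    """
    mask = torch.rand_like(tokens, dtype=torch.float) < noise_frac
    random_tokens = torch.randint_like(tokens, vocab_size)
    return torch.where(mask, random_tokens, tokens), mask

tokens = torch.randint(0, 8192, (1, 32, 32))   # a tokenized image
noised, mask = noise_tokens(tokens, vocab_size=8192, noise_frac=0.3)
# Training would then ask the model to predict `tokens` at the positions in `mask`.
```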
This stable-diffusion-webui extension limits the influence of specified tokens in the prompt by rewriting them as padding tokens. This matters because a token's position vector can affect other tokens in unexpected ways; the goal is to remove that influence while preserving the overall meaning of the sentence. The extension generates a new prompt with the specified token replaced by a padding token, then overwrites the position vectors corresponding to the original token with those generated from the new prompt. The Cutoff option lets the user control the strength of the limitation, with "off" as the default setting.
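A rough sketch of the substitution step using the CLIP tokenizer Stable Diffusion uses; this shows only the token-id replacement, not the extension's full embedding overwrite:

```python
from transformers import CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")

prompt = "a girl with red hair and blue eyes"
ids = tokenizer(
    prompt, padding="max_length", max_length=77, return_tensors="pt"
).input_ids[0]

# Build the "cutoff" variant: the target word's id becomes the pad token,
# so its position no longer colors neighboring tokens ("red" bleeding into "eyes")
target = tokenizer("red", add_special_tokens=False).input_ids[0]
cutoff_ids = ids.clone()
cutoff_ids[ids == target] = tokenizer.pad_token_id

# The extension encodes both sequences and overwrites the embedding rows at
# the replaced positions with those from the cutoff version.
```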