UnpromptedControl is a tool for guiding Stable Diffusion models in image restoration and object removal tasks. By leveraging a simple hack, it restores or removes objects without requiring user prompts, streamlining the process. The tool uses ControlNet and StableDiffusionInpaintPipeline models to guide the inpainting process and restore the image to a natural-looking state. However, the algorithm currently has limitations when processing images of people's faces and bodies.
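The same pattern can be reproduced with the diffusers library. A minimal sketch, not the repository's exact code, assuming the inpainting ControlNet checkpoint `lllyasviel/control_v11p_sd15_inpaint` paired with base SD 1.5 and an empty prompt:

```python
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetInpaintPipeline

def make_inpaint_condition(image, mask):
    # ControlNet conditioning image: the original image with masked
    # pixels flagged as "to be filled" by setting them to -1.
    img = np.array(image.convert("RGB")).astype(np.float32) / 255.0
    m = np.array(mask.convert("L")).astype(np.float32) / 255.0
    img[m > 0.5] = -1.0
    return torch.from_numpy(img[None].transpose(0, 3, 1, 2))

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_inpaint", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet,
    torch_dtype=torch.float16).to("cuda")

image = Image.open("damaged.png").convert("RGB")
mask = Image.open("mask.png").convert("L")  # white = region to restore/remove

# The "hack": an empty prompt, so the ControlNet alone steers the fill.
result = pipe("", image=image, mask_image=mask,
              control_image=make_inpaint_condition(image, mask)).images[0]
result.save("restored.png")
```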
gpt-assistant is an experiment in which an autonomous GPT (Generative Pre-trained Transformer) agent is given access to a browser to perform tasks, for example adding text to a webpage or making restaurant reservations. It requires Node.js, an OpenAI API key, and a Postgres database.
This stable-diffusion-webui extension aims to limit the influence of certain tokens in language models by rewriting them as padding tokens. This is important because the position vector of certain tokens can affect other tokens in unexpected ways, and the goal is to remove this influence while still preserving the overall meaning of the sentence. The extension achieves this by generating a new prompt with the specified token replaced by a padding token, and overwriting the position vectors corresponding to the original token with those generated from the new prompt. The Cutoff option allows the user to control the level of influence limitation, with "off" being the default setting.
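A minimal sketch of that mechanism, an illustration of the idea rather than the extension's actual code, using the CLIP text encoder that SD v1.x conditions on:

```python
import torch
from transformers import CLIPTokenizer, CLIPTextModel

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14").eval()

@torch.no_grad()
def cut_off_token(prompt: str, target: str) -> torch.Tensor:
    ids = tokenizer(prompt, padding="max_length", max_length=77,
                    truncation=True, return_tensors="pt").input_ids
    # Assumes the target word maps to a single token.
    target_id = tokenizer(target, add_special_tokens=False).input_ids[0]
    mask = ids == target_id                    # positions held by the target
    padded_ids = ids.clone()
    padded_ids[mask] = tokenizer.pad_token_id  # rewrite the token as padding
    original = encoder(ids).last_hidden_state
    padded = encoder(padded_ids).last_hidden_state
    # Splice: keep the original embedding, but overwrite the vectors at the
    # target's positions with those computed from the padded prompt.
    original[mask] = padded[mask]
    return original
```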
This dataset is designed to train a ControlNet with human facial expressions. It includes keypoints for pupils so that gaze direction can be captured. Training has been tested on Stable Diffusion v2.1 base (512) and Stable Diffusion v1.5.
A method for editing images from human instructions: given an input image and a written instruction that tells the model what to do, the model follows the instruction to edit the image. To generate training data for this task, the authors combine the knowledge of two large pretrained models, a language model (GPT-3) and a text-to-image model (Stable Diffusion), to produce a large dataset of image-editing examples.
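This describes the InstructPix2Pix approach, whose released model can be driven through the diffusers pipeline. A minimal usage sketch, assuming the published `timbrooks/instruct-pix2pix` checkpoint:

```python
import torch
from PIL import Image
from diffusers import StableDiffusionInstructPix2PixPipeline

pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16).to("cuda")

image = Image.open("input.jpg").convert("RGB")
# The written instruction is the prompt; image_guidance_scale controls
# how closely the edit sticks to the input image.
edited = pipe("make it look like a winter scene", image=image,
              num_inference_steps=20, image_guidance_scale=1.5).images[0]
edited.save("edited.jpg")
```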
This AUTOMATIC1111 UI extension allows users to create edited videos using Ebsynth without requiring AE. The extension is confirmed to work properly with ControlNet installed. Users must install ffmpeg and Ebsynth before installing the extension from the Extensions tab of the webui. To use the extension, go to the Ebsynth Utility tab and create an empty directory to serve as the project directory. Place the original video to be edited in this directory, select stage 1, and generate. The stages should be executed in order from 1 to 7; progress is not reflected in the webui, so check the console screen. In the latest webui, dropping the image onto the main screen of img2img is required to avoid errors.
ControlNet is a neural network structure that adds extra conditions to diffusion models in order to control them. ControlNet copies the weights of neural network blocks into a "locked" copy and a "trainable" copy, allowing the "trainable" copy to learn the condition while preserving the production-ready diffusion model. "Zero convolutions" are used to prevent distortion during training, and no layer is trained from scratch, making it safe to train even on small datasets or personal devices. ControlNet can be used with Stable Diffusion to reuse the SD encoder as a powerful backbone for learning diverse controls, and the efficacy of the SD encoder as a backbone has been validated in a range of experiments.
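A conceptual PyTorch sketch of the locked/trainable split and the zero convolutions, an illustration rather than the official implementation:

```python
import copy
import torch.nn as nn

def zero_conv(channels: int) -> nn.Conv2d:
    # 1x1 convolution initialized to all zeros: at step 0 the trainable
    # branch contributes nothing, so the output equals the locked block's.
    conv = nn.Conv2d(channels, channels, kernel_size=1)
    nn.init.zeros_(conv.weight)
    nn.init.zeros_(conv.bias)
    return conv

class ControlledBlock(nn.Module):
    """Wrap one pretrained block (assumed to preserve its channel count)
    with a frozen copy and a trainable copy joined by zero convolutions."""
    def __init__(self, block: nn.Module, channels: int):
        super().__init__()
        self.locked = block                    # pretrained weights, frozen
        for p in self.locked.parameters():
            p.requires_grad_(False)
        self.trainable = copy.deepcopy(block)  # learns the new condition
        self.zero_in = zero_conv(channels)
        self.zero_out = zero_conv(channels)

    def forward(self, x, condition):
        y = self.locked(x)
        c = self.trainable(x + self.zero_in(condition))
        return y + self.zero_out(c)            # zero at init: y is unchanged
```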
Tips for using ControlNet include adding negative prompts and ignoring canvas height/width. The extension also supports T2I-Adapter and offers experimental features such as CFG-based ControlNet, Guess Mode, and Multi-ControlNet/Joint Conditioning. The weight and the guidance strength/start/end control how strongly ControlNet influences the original SD Unet.
The Stability Photoshop plugin enables users to generate and edit images using both Stable Diffusion and DALL•E 2 directly within Photoshop. The plugin can be obtained in two ways: by installing it from the Adobe Exchange or by downloading the CCX file directly. Users who wish to generate images locally will also need the Stable Diffusion API Server, and those who want to fine-tune their own models can use a fork of the DreamBooth project.
'Prompt translate' is a script for AUTOMATIC1111/stable-diffusion-webui that translates prompts. It allows you to write a prompt in your native language and have it translated into English for better results, without resorting to external translators.
OpenOutpaint is an intuitive and convenient outpainting tool that provides queueable, cancelable dreams, an arbitrary dream reticle size, and an effectively infinite, resizable, scalable canvas. It has a functional and familiar layer system, and users can save, load, import, and export workspaces. It also includes an inpainting/touch-up mask brush, webUI script support, a prompt history panel, and an interrogate tool, among other features. The tool is available as an extension for webUI and has floating control panels and toolboxes with handy keyboard shortcuts. It supports upscalers for final output images, saves preferences and imported images to browser local storage, and has a reset-to-defaults button. An optional generate-ahead function keeps generating dreams while users look through the ones that already exist.
A script for AUTOMATIC1111/stable-diffusion-webui that allows users to quickly add tags from a list to their prompt. The script adds a separate textbox for adding tags, which are automatically added to the end of the prompt when the user generates the image. The script comes with customizable tag files, allowing users to define their own favorite tags and exclude certain tags from the list. The tags are organized under the file name and each line represents a new tag. The script also allows for adding tags with attention brackets and comes with pre-existing tag files.
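For illustration, here is a hypothetical tag file (the name `quality.txt` is an assumption; the file name becomes the group label, each line is one tag, and brackets add webui attention weighting):

```
masterpiece
best quality
(highly detailed)
sharp focus
```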
The EveryDream Tools repository contains data-engineering tools for Stable Diffusion and other image projects. The tools support fine-tuning beyond the initial DreamBooth paper implementations. The repo includes tools for web scraping, auto-captioning, file renaming, image compression, and training. Ground-truth LAION data can be mixed with training data to improve training quality. Captioned training and regularization have enabled multi-subject and multi-style training simultaneously.
The article discusses the challenges faced by large-scale text-to-image generation models in synthesizing high-quality images with novel concepts. Current attempts to teach these models new concepts have the drawback of overfitting to given reference images. To address this, the article proposes DreamArtist, which uses a positive-negative prompt-tuning strategy to train both positive and negative embeddings. The positive embedding captures the reference image's characteristics, while the negative embedding rectifies inadequacies from the positive embedding. The proposed method achieves superior generation performance over existing methods and is effective for more applications, including concept compositions and prompt-guided image editing.
This repo is the official PyTorch implementation of "DreamArtist: Towards Controllable One-Shot Text-to-Image Generation via Contrastive Prompt-Tuning" with Stable-Diffusion-webui.
With just one training image, DreamArtist learns the content and style in it, generating diverse high-quality images with high controllability. DreamArtist embeddings can easily be combined with additional descriptions, as well as with other learned embeddings.
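At sampling time, the two learned embeddings can be combined in a classifier-free-guidance style, with the learned negative embedding taking the place of the usual unconditional prompt. A conceptual sketch, not the official code; `unet` and the tensor shapes follow diffusers conventions, and the guidance scale is arbitrary:

```python
import torch

def dreamartist_guidance(unet, latents, t, pos_emb, neg_emb,
                         guidance_scale: float = 5.0) -> torch.Tensor:
    # The positive embedding captures the reference image; the negative
    # embedding rectifies its inadequacies, steering the prediction away
    # from them exactly as a negative prompt would.
    eps_pos = unet(latents, t, encoder_hidden_states=pos_emb).sample
    eps_neg = unet(latents, t, encoder_hidden_states=neg_emb).sample
    return eps_neg + guidance_scale * (eps_pos - eps_neg)
```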
Large-scale diffusion neural networks represent a substantial milestone in text-to-image generation, but they remain poorly understood, lacking interpretability analyses. In this paper, we perform a text-image attribution analysis on Stable Diffusion, a recently open-sourced model. To produce pixel-level attribution maps, we upscale and aggregate cross-attention word-pixel scores in the denoising subnetwork, naming our method DAAM. We evaluate its correctness by testing its semantic segmentation ability on nouns, as well as its generalized attribution quality on all parts of speech, rated by humans. We then apply DAAM to study the role of syntax in the pixel space, characterizing head-dependent heat map interaction patterns for ten common dependency relations. Finally, we study several semantic phenomena using DAAM, with a focus on feature entanglement, where we find that cohyponyms worsen generation quality and descriptive adjectives attend too broadly. To our knowledge, we are the first to interpret large diffusion models from a visuolinguistic perspective, which enables future lines of research.
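A conceptual sketch of the aggregation step as described, not the authors' code: cross-attention scores for one token are collected from the denoising subnetwork at several resolutions, upscaled to image size, and summed into a single heat map.

```python
import torch
import torch.nn.functional as F

def aggregate_attention(attn_maps, token_index: int, out_size: int = 512):
    """attn_maps: list of cross-attention tensors shaped
    (heads, pixels, tokens), one per layer/resolution of the UNet."""
    heat = torch.zeros(out_size, out_size)
    for a in attn_maps:
        heads, pixels, _ = a.shape
        side = int(pixels ** 0.5)                  # e.g. 64, 32, 16, 8
        # Average over heads, reshape to a spatial map, upscale, accumulate.
        m = a[..., token_index].mean(0).reshape(1, 1, side, side)
        m = F.interpolate(m, size=(out_size, out_size), mode="bicubic")
        heat += m[0, 0]
    return heat / heat.max()                       # normalized attribution map
```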
Deforum is a community of AI image synthesis developers, enthusiasts, and artists. They have created a notebook using Stable Diffusion and continue to improve its functionality daily. It's free, it's amazing, and you can use it to make cool stuff with AI.