The Illustrated Stable Diffusion
https://jalammar.github.io/illustrated-stable-diffusion/
An introduction to Stable Diffusion and its components. Stable Diffusion is a system made up of several components and models, used to generate images from text and to alter existing images. Its text-understanding component, the text encoder, is a Transformer language model that translates the prompt into a numeric representation: a set of token embeddings. The image generator has two stages: the image information creator and the image decoder. The image information creator uses a UNet neural network and a scheduling algorithm to process information in the image information (latent) space. Diffusion is the process that takes place inside this component: it runs step by step, with each step adding more relevant information, and the image decoder then turns the final information array into a high-quality image. To incorporate text inputs, the UNet noise predictor is modified by adding an attention layer between its ResNet blocks.
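The data flow described above can be sketched in a few lines of NumPy. This is a toy illustration under stated assumptions, not the actual Stable Diffusion implementation. All function names (encode_text, predict_noise, decode, generate), the array sizes, the random untrained weights, and the simple fixed-step update standing in for the scheduler are hypothetical. Timestep conditioning and the ResNet blocks themselves are omitted. Only the shape of the pipeline follows the summary: text encoder, then a stepwise loop in the image information space with an attention layer over the token embeddings, then the image decoder.

```python
import numpy as np

# Illustrative sizes only; not the dimensions of the real model.
N_TOKENS, EMB_DIM = 8, 16      # token embeddings produced by the text encoder
LATENT_SHAPE = (4, 8, 8)       # array in the image information (latent) space
N_STEPS = 10                   # number of diffusion steps

def encode_text(prompt):
    """Stand-in text encoder: maps a prompt to pseudo token embeddings."""
    seed = sum(ord(ch) for ch in prompt)  # deterministic per prompt
    return np.random.default_rng(seed).normal(size=(N_TOKENS, EMB_DIM))

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(latent_tokens, text_emb, w_q, w_k, w_v):
    """Attention layer: latent positions (queries) attend to text tokens."""
    q = latent_tokens @ w_q                              # (positions, d)
    k = text_emb @ w_k                                   # (tokens, d)
    v = text_emb @ w_v                                   # (tokens, channels)
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]))    # (positions, tokens)
    return latent_tokens + weights @ v                   # residual connection

def predict_noise(latents, text_emb, params):
    """Stand-in for the UNet noise predictor (attention layer only)."""
    c, h, w = latents.shape
    tokens = latents.reshape(c, h * w).T                 # (positions, channels)
    tokens = cross_attention(tokens, text_emb, *params)
    return tokens.T.reshape(c, h, w)

def decode(latents):
    """Stand-in image decoder: upsample 3 latent channels by 8x per axis."""
    rgb = latents[:3]
    return np.repeat(np.repeat(rgb, 8, axis=1), 8, axis=2)

def generate(prompt, seed=0):
    rng = np.random.default_rng(seed)
    d = 8
    # Random, untrained projection weights (assumption for illustration).
    params = (rng.normal(size=(LATENT_SHAPE[0], d)),
              rng.normal(size=(EMB_DIM, d)),
              rng.normal(size=(EMB_DIM, LATENT_SHAPE[0])))
    text_emb = encode_text(prompt)
    latents = rng.normal(size=LATENT_SHAPE)              # start from pure noise
    for _ in range(N_STEPS):
        noise = predict_noise(latents, text_emb, params)
        latents = latents - noise / N_STEPS              # toy scheduler step
    return decode(latents)

image = generate("a cat on a beach")
print(image.shape)  # (3, 64, 64)
```

Because the weights are random, the output is not a meaningful image. The sketch only shows where the text embeddings enter the loop and that the decoder runs once, after the final step.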