IF by DeepFloyd Lab
https://github.com/deep-floyd/IF
DeepFloyd IF is a text-to-image model that uses the large language model T5-XXL-1.1 as its text encoder to generate intelligible, coherent images alongside text. The model can incorporate text into images, achieve a high degree of photorealism, and generate images with non-standard aspect ratios. It can also modify style, patterns, and details in images without fine-tuning. DeepFloyd IF is modular, cascaded, and works in pixel space. It is built on diffusion models, which inject random noise into data and then reverse that process to generate new data samples from the noise.
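The noising-and-reversal idea can be illustrated with a minimal sketch. This is a generic DDPM-style forward process written in plain Python, not DeepFloyd IF's actual code: the function names, the linear schedule, and its default values are all assumptions chosen for illustration. It shows how a clean sample is mixed with Gaussian noise at step t, and that the clean sample can be recovered exactly when the added noise is known. A trained diffusion model only has to predict that noise.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Evenly spaced per-step noise variances (illustrative defaults, an assumption)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Running product of (1 - beta): the fraction of signal variance left at each step."""
    alpha_bars = []
    product = 1.0
    for beta in betas:
        product *= 1.0 - beta
        alpha_bars.append(product)
    return alpha_bars


def add_noise(x0, t, alpha_bars, rng):
    """Forward process: x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * eps."""
    a_bar = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a_bar) * x + math.sqrt(1.0 - a_bar) * e for x, e in zip(x0, eps)]
    return xt, eps


def recover_clean(xt, eps, t, alpha_bars):
    """Invert the forward step given the noise; a model would supply a predicted eps."""
    a_bar = alpha_bars[t]
    return [(x - math.sqrt(1.0 - a_bar) * e) / math.sqrt(a_bar) for x, e in zip(xt, eps)]


if __name__ == "__main__":
    rng = random.Random(0)
    alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
    pixels = [0.2, -0.5, 0.9, 0.0]  # stand-in for pixel values scaled to [-1, 1]
    noisy, noise = add_noise(pixels, 500, alpha_bars, rng)
    restored = recover_clean(noisy, noise, 500, alpha_bars)
    print("noisy:", noisy)
    print("restored:", restored)
```

Because the example operates directly on pixel-like values rather than on a compressed latent, it mirrors the "pixel space" wording above, though the real model works on full images and uses a learned network to estimate the noise.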