SDXL-Turbo stands out as a high-speed generative text-to-image model capable of producing photorealistic images from a given text prompt in a single network evaluation. It represents a distilled variant of SDXL 1.0, specifically trained for real-time synthesis. The model adopts Adversarial Diffusion Distillation (ADD), a novel training method detailed in the technical report, allowing efficient sampling of foundational image diffusion models in 1 to 4 steps while maintaining high image quality. Score distillation is employed to leverage large-scale off-the-shelf image diffusion models as a teacher signal, coupled with an adversarial loss to ensure superior image fidelity, even in scenarios with one or two sampling steps.
WizardLM is a language model fine-tuned to follow complex instructions using Evol-Instruct, a method that uses language models instead of humans to automatically produce open-domain instructions of various difficulty levels. WizardLM is still in development and is expected to improve through larger-scale training, more training data, and more advanced large-model training methods. The model was fine-tuned on alpaca_evol_instruct_70k.json, a set of 70K instruction-following examples generated with Evol-Instruct. In human evaluation, WizardLM achieved significantly better results than Alpaca and Vicuna-7b on diverse user-oriented instructions covering difficult code generation, debugging, math, reasoning, complex formats, academic writing, and a wide range of disciplines. In the high-difficulty section of the human evaluation test set, WizardLM even outperforms ChatGPT, which indicates significant potential for handling complex instructions.
Enhanced SRGAN (ESRGAN) is an improvement of the Super-Resolution Generative Adversarial Network (SRGAN), which is capable of generating realistic textures during single image super-resolution. ESRGAN improves three key components of SRGAN (network architecture, adversarial loss, and perceptual loss) to enhance the visual quality of images. It introduces the Residual-in-Residual Dense Block (RRDB) without batch normalization as the basic network building unit, borrows the idea from relativistic GAN to let the discriminator predict relative realness, and improves the perceptual loss by using the features before activation for stronger supervision. ESRGAN achieves better visual quality with more realistic and natural textures than SRGAN, and it won first place in the PIRM2018-SR Challenge.
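As a rough illustration of the "relative realness" idea, the Python sketch below computes relativistic average GAN losses from raw discriminator outputs. The function names and the plain list-of-floats interface are assumptions made for this example, not the ESRGAN code; a real implementation would operate on tensors in a deep learning framework.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def mean(values):
    return sum(values) / len(values)


def _relative_probs(real_logits, fake_logits):
    """Return (d_real, d_fake) where
    d_real[i] = sigmoid(C(real_i) - mean(C(fake))) and
    d_fake[i] = sigmoid(C(fake_i) - mean(C(real))).
    C(.) is the raw (pre-sigmoid) discriminator output."""
    mean_real = mean(real_logits)
    mean_fake = mean(fake_logits)
    d_real = [sigmoid(c - mean_fake) for c in real_logits]
    d_fake = [sigmoid(c - mean_real) for c in fake_logits]
    return d_real, d_fake


def relativistic_d_loss(real_logits, fake_logits, eps=1e-12):
    """Discriminator loss: real images should look more realistic than
    the average fake, and fakes less realistic than the average real."""
    d_real, d_fake = _relative_probs(real_logits, fake_logits)
    loss_real = -mean([math.log(p + eps) for p in d_real])
    loss_fake = -mean([math.log(1.0 - p + eps) for p in d_fake])
    return loss_real + loss_fake


def relativistic_g_loss(real_logits, fake_logits, eps=1e-12):
    """Generator loss: the symmetric counterpart, with the roles of
    real and fake swapped."""
    d_real, d_fake = _relative_probs(real_logits, fake_logits)
    loss_real = -mean([math.log(1.0 - p + eps) for p in d_real])
    loss_fake = -mean([math.log(p + eps) for p in d_fake])
    return loss_real + loss_fake
```

Because the generator loss contains both real and fake terms, the generator receives gradients from real data as well as generated data, which is the motivation usually given for the relativistic formulation.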
Civitai is a platform that allows users to share and discover resources for creating AI art. Users can share custom models that they have trained on their own data, or download models created by other users. In this context, a model is a machine learning algorithm that has been trained to generate art or media in a particular style. Users can load these models into AI art software to create unique works of art. The community constantly shares new and interesting models, which makes Civitai a vibrant and supportive place for AI artists.
Stable Diffusion Inpainting is a latent text-to-image diffusion model that can generate photo-realistic images from any text input and has the additional capability of inpainting pictures using a mask. The model was initialized with the weights of Stable-Diffusion-v-1-2 and underwent regular training for 595k steps followed by inpainting training for 440k steps at a resolution of 512x512 on the "laion-aesthetics v2 5+" dataset. During training, the text conditioning was dropped 10% of the time to improve classifier-free guidance sampling. For inpainting, the UNet has 5 additional input channels, and synthetic masks were generated during training, with the whole image masked in 25% of cases.
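To make the 5 additional input channels concrete, here is a small shape-only Python sketch. It assumes the usual Stable Diffusion v1 layout, which is not spelled out above: a 4-channel latent, a VAE that shrinks each image side by a factor of 8, and the extra channels split into 4 for the encoded masked image plus 1 for the mask. The concatenation order and all names are illustrative assumptions.

```python
from dataclasses import dataclass

# Assumptions about Stable Diffusion v1 (not stated in the text above).
VAE_DOWNSAMPLE = 8   # each image side is reduced by this factor
LATENT_CHANNELS = 4  # channels of one encoded image
MASK_CHANNELS = 1    # single-channel binary mask


@dataclass(frozen=True)
class Shape:
    channels: int
    height: int
    width: int


def inpainting_unet_input(image_height: int, image_width: int):
    """Return the shapes of the parts concatenated along the channel
    axis, plus the shape of the combined UNet input."""
    if image_height % VAE_DOWNSAMPLE or image_width % VAE_DOWNSAMPLE:
        raise ValueError("image sides must be multiples of the VAE factor")
    h = image_height // VAE_DOWNSAMPLE
    w = image_width // VAE_DOWNSAMPLE
    parts = {
        "noisy_latents": Shape(LATENT_CHANNELS, h, w),        # standard UNet input
        "mask": Shape(MASK_CHANNELS, h, w),                   # extra: 1 channel
        "masked_image_latents": Shape(LATENT_CHANNELS, h, w), # extra: 4 channels
    }
    total = Shape(sum(p.channels for p in parts.values()), h, w)
    return parts, total


def extra_channels(parts) -> int:
    """Channels added on top of the plain text-to-image UNet input."""
    return sum(p.channels for name, p in parts.items() if name != "noisy_latents")
```

Under these assumptions, a 512x512 image yields a UNet input with 9 channels at 64x64, of which 5 are the inpainting-specific additions.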
To extract a LoRA from a fine-tuned model using Kohya_ss, install the repository from GitHub, open the Utilities tab, and select the Extract LoRA tab. Select the fine-tuned model to extract the LoRA from and the Stable Diffusion base model it was trained from, then choose the path for the output LoRA file and the save precision. The Network Dimension slider can be left at its default value of 8. Finally, click the "Extract LoRA model" button.
Alternatively, users can download two files from kohya_ss\networks, lora.py and extract_lora_from_models.py, and write a .bat file that runs the extraction script. The file names and the dimension in that .bat file must be changed to fit the user's case, and a few requirements may need to be installed if the SD distribution does not already include them. Users can also run qwerty.py to check which models fit the new LoRA best, which is useful for Textual Inversion embeddings as well.
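As a cross-platform alternative to a .bat file, the call can be assembled in Python. The sketch below only builds the command line. The flags --model_org, --model_tuned, --save_to, --dim, and --save_precision are what I recall extract_lora_from_models.py accepting and should be treated as assumptions; confirm them with the script's --help output. The helper name and all file paths are hypothetical.

```python
import sys
from pathlib import Path


def build_extract_command(script, base_model, tuned_model, output,
                          dim=8, precision="fp16"):
    """Build the argument list for a LoRA extraction run.

    The flag names are assumed from the kohya script and should be
    checked against `python extract_lora_from_models.py --help`.
    """
    if dim <= 0:
        raise ValueError("dim must be a positive integer")
    return [
        sys.executable, str(Path(script)),
        "--model_org", str(Path(base_model)),     # Stable Diffusion base model
        "--model_tuned", str(Path(tuned_model)),  # fine-tuned model
        "--save_to", str(Path(output)),           # output LoRA file
        "--dim", str(dim),                        # network dimension (rank)
        "--save_precision", precision,
    ]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    cmd = build_extract_command(
        "networks/extract_lora_from_models.py",
        "models/base.safetensors",
        "models/finetuned.safetensors",
        "output/extracted_lora.safetensors",
    )
    print(" ".join(cmd))
    # To actually run the extraction:
    # import subprocess
    # subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems when model paths contain spaces.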