Image Mixer
https://lambdalabs-image-mixer-demo.hf.space/?__theme=dark
Image Mixer is an extension of the Stable Diffusion Image Variations model that conditions on multiple CLIP image embeddings rather than a single one. During training, up to 5 random crops were taken from the training images, and their CLIP image embeddings were computed and concatenated to form the conditioning for the model. At inference time, the image embeddings of several different images can be combined to mix their concepts, and text concepts can be added using the text encoder.
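The conditioning scheme described above can be illustrated with a minimal NumPy sketch. It is not the model's actual code: `fake_clip_embed` is a hypothetical stand-in for a real CLIP image encoder, the embedding width of 768 is an assumption, and stacking the embeddings along a sequence axis is one plausible reading of "concatenated".

```python
import numpy as np

EMBED_DIM = 768  # assumed embedding width; not stated in the description
MAX_CROPS = 5    # "up to 5 random crops" per the description

# Fixed random projection used by the hypothetical encoder below.
_PROJ = np.random.default_rng(0).standard_normal((3, EMBED_DIM))


def random_crop(image, size, rng):
    """Take one random square crop of side `size` from an HxWx3 array."""
    h, w = image.shape[:2]
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return image[top:top + size, left:left + size]


def fake_clip_embed(crop):
    """Hypothetical stand-in for a CLIP image encoder.

    Projects the crop's mean colour to EMBED_DIM and unit-normalizes it.
    A real pipeline would call an actual CLIP vision model here.
    """
    feats = crop.reshape(-1, crop.shape[-1]).mean(axis=0)  # shape (3,)
    vec = feats @ _PROJ                                    # shape (EMBED_DIM,)
    return vec / np.linalg.norm(vec)


def training_conditioning(image, rng, crop_size=32):
    """Embed 1..MAX_CROPS random crops and stack them into one sequence."""
    n_crops = int(rng.integers(1, MAX_CROPS + 1))
    embeds = [fake_clip_embed(random_crop(image, crop_size, rng))
              for _ in range(n_crops)]
    return np.stack(embeds, axis=0)  # shape (n_crops, EMBED_DIM)


def mix_conditioning(embeddings):
    """Inference-time mixing: one embedding per input image, concatenated."""
    return np.stack(embeddings, axis=0)  # shape (n_images, EMBED_DIM)


rng = np.random.default_rng(42)
image_a = rng.random((64, 64, 3))  # synthetic images, for illustration only
image_b = rng.random((64, 64, 3))

train_cond = training_conditioning(image_a, rng)
mixed_cond = mix_conditioning([fake_clip_embed(image_a),
                               fake_clip_embed(image_b)])
print(train_cond.shape, mixed_cond.shape)
```

In this sketch the training and inference paths produce the same kind of object, a short sequence of image embeddings, which is why embeddings from different images can be swapped in at inference time. Text concepts would enter the same way, as additional entries produced by a text encoder; that step is omitted here.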