Zero123++ is an image-conditioned diffusion model for generating 3D-consistent multi-view images from a single input view. The model minimizes the effort of fine-tuning by leveraging pre-trained 2D generative priors, particularly from Stable Diffusion. Noteworthy improvements include tiling six views into a single image, shifting the noise schedule, implementing scaled reference attention for local conditioning, and introducing FlexDiffuse for global conditioning. With these enhancements, Zero123++ produces high-quality, consistent multi-view images and overcomes issues such as texture degradation and geometric misalignment. The paper demonstrates the model's effectiveness through qualitative and quantitative comparisons with leading models. It also introduces a depth-controlled version of Zero123++ built with ControlNet, highlighting the model's versatility.
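The "tiling six views into a single image" step can be illustrated with a minimal sketch. It assumes a layout of 3 rows by 2 columns and uses nested lists as a toy stand-in for real image tensors; the function name `tile_views` and the toy data are hypothetical, not taken from the Zero123++ code.

```python
from typing import List, Sequence

Image = List[List[int]]  # toy grayscale image: H rows of W pixel values


def tile_views(views: Sequence[Image], rows: int = 3, cols: int = 2) -> Image:
    """Tile equally sized views into one image, filling the grid row by row."""
    if len(views) != rows * cols:
        raise ValueError(f"expected {rows * cols} views, got {len(views)}")
    height, width = len(views[0]), len(views[0][0])
    for view in views:
        if len(view) != height or any(len(line) != width for line in view):
            raise ValueError("all views must share the same height and width")

    tiled: Image = []
    for r in range(rows):
        row_views = views[r * cols:(r + 1) * cols]
        for y in range(height):
            # Concatenate scanline y of every view in this grid row.
            line: List[int] = []
            for view in row_views:
                line.extend(view[y])
            tiled.append(line)
    return tiled


# Six hypothetical 2x2 views; view i is filled with the value i.
views = [[[i] * 2 for _ in range(2)] for i in range(6)]
grid = tile_views(views)
```

Here `grid` is a single 6x4 image whose top row of tiles holds views 0 and 1, the middle row views 2 and 3, and the bottom row views 4 and 5.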
AI company Optic has developed a web tool called AI or Not, which aims to combat misinformation spread through AI-generated images. The tool scans images and quickly determines whether they were generated by artificial intelligence or by humans. Optic claims that its algorithms provide highly accurate results with a precision rate of 95%. However, users may have concerns about privacy when uploading images to the tool. Optic states that uploaded images and URLs are not stored on its servers longer than necessary and that they adhere to data protection regulations. By analyzing images and detecting signs of AI generation, the company aims to improve its algorithms and machine learning techniques.
The article presents a method for creating word-as-image illustrations automatically, which involves creating a visualization of the meaning of a word while preserving its readability. The method relies on large pretrained language-vision models to distill textual concepts visually and optimize the outline of each letter to convey the desired concept, guided by a pretrained Stable Diffusion model. The approach uses a differentiable rasterizer and additional loss terms to ensure the legibility of the text and the preservation of the style of the font. The method can handle a large variety of semantic concepts and use any font while preserving the legibility of the text and the font's style. The article provides a detailed explanation of how the method works, including the optimization of parameters for each letter and the use of additional loss functions to preserve the original letter's shape and ensure legibility.
This grid presents an exploration of Stable Diffusion v1.4 using the Euler ancestral sampler with the same prompt and seed while varying the number of steps and the CFG scale. The results are shown as X/Y plots, one per seed-resizing setting. The prompt used is "stain glass window of goddess warrior, light shiny through, intricate, elegant, highly detailed, digital painting, sharp focus, realistic, hyperrealistic, cinematic, illustration."
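An X/Y plot of this kind amounts to a sweep over two parameters with everything else held fixed. The sketch below only enumerates the settings for each grid cell and does not call any image generator; the function name `build_xy_grid`, the seed, and the step and CFG values are hypothetical, chosen purely for illustration.

```python
from itertools import product
from typing import Dict, List, Sequence, Union

Setting = Dict[str, Union[str, int, float]]


def build_xy_grid(
    prompt: str,
    seed: int,
    steps_values: Sequence[int],
    cfg_values: Sequence[float],
) -> List[Setting]:
    """Return one settings dict per grid cell, one row per CFG value."""
    return [
        {"prompt": prompt, "seed": seed, "steps": steps, "cfg_scale": cfg}
        for cfg, steps in product(cfg_values, steps_values)
    ]


# Hypothetical sweep: 3 step counts (X axis) by 3 CFG scales (Y axis).
cells = build_xy_grid(
    prompt="stain glass window of goddess warrior",
    seed=1234,
    steps_values=[10, 20, 40],
    cfg_values=[4.0, 7.5, 11.0],
)
```

Each entry in `cells` would be passed to the sampler in turn, and the resulting images arranged with steps along one axis and CFG scale along the other.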
DiffusionDB is a large-scale text-to-image prompt dataset containing 14 million images generated by Stable Diffusion using prompts and hyperparameters specified by real users. The dataset primarily consists of English text but also includes other languages such as Spanish, Chinese, and Russian. DiffusionDB provides two subsets, DiffusionDB 2M and DiffusionDB Large, split into 2,000 folders and 14,000 folders, respectively. The dataset includes metadata tables metadata.parquet and metadata-large.parquet, which can be used to access prompts and other attributes of images without downloading all the Zip files. The tables are stored in the Parquet format, making it efficient to query individual columns without reading the entire table.
HaveIBeenTrained is a tool that uses clip retrieval to search the largest public text-to-image datasets, LAION-5B and LAION-400M, so that artists can find their work and opt out of having it used to train generative AI systems. These datasets are typically shared as files that contain links to images on the internet and captions that describe them. Stability and LAION partner to remove links that have been flagged for removal, so that future models are not trained on the opted-out work. HaveIBeenTrained incorporates new datasets as they are released and partners with other organizations to serve as a one-time opt-out tool for every dataset used to train generative AI art tools. The solution builds upon retrieval tools created by the LAION community that enable efficient search through large collections of image-text pairs, based on kNN indices pre-computed using CLIP models pre-trained by OpenAI and LAION.
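The retrieval idea, finding the nearest neighbours of a query embedding, can be sketched as follows. This is a brute-force cosine-similarity search over a tiny hand-made index, not the pre-computed kNN indices the LAION tools rely on; the three-dimensional vectors stand in for real CLIP embeddings, and all names and values are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def knn_search(
    query: Sequence[float],
    index: Dict[str, Sequence[float]],
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Return the k index entries most similar to the query, best first."""
    q = normalize(query)
    scored = [
        (sum(a * b for a, b in zip(q, normalize(vec))), key)
        for key, vec in index.items()
    ]
    scored.sort(reverse=True)
    return [(key, score) for score, key in scored[:k]]


# Toy "embeddings"; real CLIP vectors have hundreds of dimensions.
toy_index = {
    "cat.jpg": [1.0, 0.0, 0.0],
    "dog.jpg": [0.0, 1.0, 0.0],
    "kitten.jpg": [0.9, 0.1, 0.0],
}
results = knn_search([1.0, 0.05, 0.0], toy_index, k=2)
```

With this toy data, `results` lists `cat.jpg` first and `kitten.jpg` second. A production system would replace the linear scan with an approximate nearest-neighbour index so that billions of entries can be searched quickly.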
A collection of GANs. StyleGAN is a generator architecture for generative adversarial networks that enables unsupervised separation of high-level attributes and stochastic variation in generated images. This generator improves the state of the art in distribution quality metrics and disentangles latent factors of variation. The article also introduces two new automated methods to quantify interpolation quality and disentanglement, and a new dataset of human faces.
A book of 1000 paintings and illustrations of robots created by artificial intelligence. The author generated all of the images in this book by writing original prompts for DALL·E 2, OpenAI’s AI system that can create realistic images and art from a description in natural language. Upon generating the images, the author curated and arranged the images to their own liking and takes ultimate responsibility for the content of this publication.
Neural Cellular Automata (NCA) are capable of learning diverse behaviors and can solve complex tasks through massively parallel and inherently degenerate processes. The article focuses on applying NCA to the task of texture synthesis, reproducing the general appearance of a texture template rather than pixel-perfect copies. After training NCA models to reproduce textures, the article investigates their learned behaviors and observes surprising effects, suggesting that the cells learn distributed, local algorithms. The article employs NCA as a differentiable image parameterization to accomplish this.
Humans of AI is an online exhibition that showcases three works based on the COCO image dataset. The exhibition aims to credit and applaud the photographers who made the technical achievement of machine learning algorithms possible. By showing the actual training pictures and giving credit where it's due, Humans of AI exposes the myth of magically intelligent machines and highlights the importance of acknowledging the hard work that goes into creating the datasets used to train these algorithms.
YOLO is a machine learning algorithm that detects objects in images and labels them with a single category. While it may seem like magic, computers recognize pixel formations statistically similar to previously learned data. The first piece in the Humans of AI series, Declassifier, processes images using YOLO and superimposes images from the training dataset COCO, exposing the myth of intelligent machines and highlighting the biases and glitches present in the dataset. Declassifier ultimately helps users understand how machines see by visualizing the data that conditioned a certain prediction.
Generative Adversarial Networks (GANs) are an exciting tool for artists, allowing them to create unique, unpredictable digital art. The GAN process simulates a game in which a generator network tries to produce realistic images and a critic network tries to tell them apart from real ones. The artist works as a curator, selecting the most interesting images produced by the generator. The author recommends using CycleGAN, a neural network architecture that transforms images from one dataset into the style of another, as it allows for high-resolution images and quick training. The author offers practical advice on using CycleGAN, such as fine-tuning models on smaller datasets and experimenting with different batch sizes, and emphasizes the importance of using unique, personal datasets for training. Overall, the author encourages artists to experiment with GANs and to let the unpredictability of the process inspire them to create something special.