PIXART-α (https://pixart-alpha.github.io/)
PIXART-α is a Transformer-based text-to-image diffusion model introduced to address the substantial training costs of advanced models in the AIGC community. It delivers image generation quality competitive with leading generators such as Imagen and Midjourney while remaining efficient, supporting high-resolution synthesis up to 1024px at significantly reduced training expense. Its key features are a decomposed training strategy, an efficient Transformer with cross-attention modules for injecting text conditions, and an emphasis on highly informative training data. PIXART-α trains markedly faster than existing large-scale text-to-image models, requiring only 10.8% of Stable Diffusion v1.5's training time and cutting CO2 emissions by 90%. With strong image quality, artistry, and semantic control, PIXART-α offers startups and the AIGC community useful insights for building high-quality, low-cost generative models.
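As a quick illustration of how the model can be tried in practice, here is a minimal text-to-image sketch. It assumes the released weights are used through Hugging Face's diffusers `PixArtAlphaPipeline` with the `PixArt-alpha/PixArt-XL-2-1024-MS` checkpoint and a CUDA GPU; these are usage assumptions, not details taken from the summary above.

```python
# Minimal PIXART-α text-to-image sketch (assumes the diffusers library
# and the publicly released PixArt-alpha/PixArt-XL-2-1024-MS checkpoint).
import torch
from diffusers import PixArtAlphaPipeline

# Load the 1024px PIXART-α pipeline; fp16 keeps memory usage modest on a single GPU.
pipe = PixArtAlphaPipeline.from_pretrained(
    "PixArt-alpha/PixArt-XL-2-1024-MS",
    torch_dtype=torch.float16,
).to("cuda")

# Generate a 1024x1024 image from a text prompt and save it to disk.
prompt = "an astronaut riding a horse on the moon, detailed, high quality"
image = pipe(prompt=prompt, num_inference_steps=20).images[0]
image.save("pixart_alpha_sample.png")
```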