
MiniGPT-4: Enhancing Vision-language Understanding with Advanced Large Language Models
https://github.com/Vision-CAIR/MiniGPT-4

  • image2text
  • llm

MiniGPT-4 is a vision-language model that aligns a frozen visual encoder with a frozen large language model (LLM) through a single projection layer. The authors trained MiniGPT-4 in two stages: the first stage pretrains the model on roughly 5 million aligned image-text pairs. Because the model produced unreliable text after this stage, they built a small, high-quality dataset of image-text pairs with the help of ChatGPT and, in the second stage, finetuned the model on it using a conversational template, improving generation reliability and overall usability. The results show that MiniGPT-4 possesses capabilities similar to GPT-4, such as generating detailed image descriptions and creating websites from handwritten drafts, along with other emerging abilities like writing stories and poems about images and explaining how to cook a dish from food photos. The method is computationally efficient and highlights the potential of advanced large language models for vision-language understanding.
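The core design is small enough to sketch. The snippet below is a minimal, hypothetical PyTorch-style illustration of the alignment idea described above: the visual encoder and the LLM both stay frozen, and a single linear projection is the only trainable piece. Class names, dimensions, and the `inputs_embeds` interface are illustrative assumptions, not the actual MiniGPT-4 codebase API.

```python
import torch
import torch.nn as nn

class MiniGPT4Sketch(nn.Module):
    """Hypothetical sketch: frozen vision encoder + frozen LLM + one trainable projection."""

    def __init__(self, visual_encoder: nn.Module, llm: nn.Module,
                 vis_dim: int = 1408, llm_dim: int = 4096):
        super().__init__()
        self.visual_encoder = visual_encoder   # assumed ViT-like encoder, kept frozen
        self.llm = llm                         # assumed decoder-only LLM, kept frozen
        for p in self.visual_encoder.parameters():
            p.requires_grad = False
        for p in self.llm.parameters():
            p.requires_grad = False
        # The only trainable component: one linear projection mapping
        # visual features into the LLM's token-embedding space.
        self.proj = nn.Linear(vis_dim, llm_dim)

    def forward(self, image: torch.Tensor, text_embeds: torch.Tensor):
        # Extract visual features; no gradients are needed for the frozen encoder.
        with torch.no_grad():
            vis_feats = self.visual_encoder(image)      # (B, N, vis_dim), assumed shape
        vis_tokens = self.proj(vis_feats)               # (B, N, llm_dim)
        # Prepend projected image tokens to the text embeddings and let the
        # frozen LLM model the combined sequence (interface is illustrative).
        inputs = torch.cat([vis_tokens, text_embeds], dim=1)
        return self.llm(inputs_embeds=inputs)
```

Training under this setup only updates `proj`, which is why the two-stage recipe is comparatively cheap.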

MiniGPT-4

Demo: Link1 Link2 Link3 Link4 Link5 Link6 Link7

Related links:
  • OpenFlamingo-9B Demo : OpenFlamingo is a new tool that helps computers learn how to understand pictures and words together. The OpenFlamingo project aims to develop a multi...
  • Cheetah : Empowering Vision-Language Models to Follow Interleaved Vision-Language Instructions : Cheetor, a Transformer-based multi-modal large language model equipped with controllable knowledge re-injection. Cheetor demonstrates strong capabilit...
  • LLaVA: Large Language and Vision Assistant : The article presents a novel approach to large multimodal language models using machine-generated instruction-following data, which has shown promise ...
  • Instructblip : The paper focuses on a vision-language model called InstructBLIP and explores the process of instruction tuning. The authors collect 26 datasets and c...
  • WizardCoder : WizardCoder is a new model that utilizes complex instruction fine-tuning to enhance Code Large Language Models (Code LLMs) for code-related tasks. Thi...

