The clip-retrieval package makes it easy to compute CLIP embeddings and build a CLIP retrieval system. It can quickly compute image and text embeddings, build efficient indices, filter data, and host those indices behind a simple Flask service. The package also includes a simple querying UI. clip-retrieval has been used by cah-prepro to preprocess 400M image+text pairs for the dataset, and by other projects such as autofaiss and antarctic-captions. ClipClient allows remote querying of a clip-retrieval backend from Python. The package is installable with pip.
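The retrieval core behind such a system can be illustrated with a minimal sketch: embeddings are L2-normalized so that an inner product equals cosine similarity, and a query returns the k nearest neighbors. The random vectors below are toy stand-ins for CLIP outputs, and `knn_query` is a hypothetical helper, not the package's actual API (which uses pre-built faiss indices for efficiency).

```python
import numpy as np

# Toy stand-ins for CLIP image embeddings (in practice produced by a CLIP model).
image_embeddings = np.random.RandomState(0).randn(1000, 8).astype("float32")
# Normalize rows so that inner product equals cosine similarity.
image_embeddings /= np.linalg.norm(image_embeddings, axis=1, keepdims=True)

def knn_query(query_vec, embeddings, k=5):
    """Return indices of the k nearest neighbors by cosine similarity."""
    q = query_vec / np.linalg.norm(query_vec)
    scores = embeddings @ q
    return np.argsort(-scores)[:k]

# A toy "text" query embedding; real systems embed the query text with CLIP.
query = np.random.RandomState(1).randn(8).astype("float32")
top = knn_query(query, image_embeddings)
print(top)
```

A production index replaces the brute-force `argsort` with an approximate nearest-neighbor structure so that billions of vectors can be searched in milliseconds.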
HaveIBeenTrained is a tool that uses clip-retrieval to search the largest public text-to-image datasets, LAION-5B and LAION-400M, so that artists can opt their images out of being used to train generative AI systems. These datasets are typically shared as files containing links to images on the internet along with captions that describe them. Stability AI and LAION partner to remove links that have been flagged for removal, ensuring that future models will not be trained on the opted-out work. HaveIBeenTrained incorporates new datasets as they are released and partners with other organizations, aiming to serve as a single opt-out tool covering every dataset used to train generative AI art tools. The solution builds on retrieval tools created by the LAION community that enable efficient search through large collections of image-text pairs, based on kNN indices pre-computed with CLIP models pre-trained by OpenAI and LAION.
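Since these datasets are distributed as link+caption records rather than images, honoring an opt-out amounts to dropping the flagged URLs before release. A minimal sketch of that filtering step, with hypothetical example URLs and an assumed set of opted-out links:

```python
# Hypothetical (url, caption) records as distributed in a link-based dataset.
dataset = [
    ("https://example.com/a.jpg", "a cat on a sofa"),
    ("https://example.com/b.jpg", "a painting of a harbor"),
    ("https://example.com/c.jpg", "an abstract artwork"),
]

# URLs flagged for removal via the opt-out tool (assumed input).
opted_out = {"https://example.com/b.jpg"}

# Keep only records whose URL has not been opted out.
filtered = [(url, cap) for url, cap in dataset if url not in opted_out]
print(len(filtered))  # → 2
```

In practice the flagged set covers billions of candidate links, so the lookup structure matters, but the principle is the same: the released dataset simply no longer contains the opted-out URLs.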