DOCCI: A 15K-Image Playground for Vision-and-Language Models

What is DOCCI?
DOCCI stands for Descriptions of Connected & Contrasting Images.
It is a dataset of curated images, each paired with a detailed, human-written description.
These descriptions cover the key elements of each image as well as secondary information such as background, lighting, and setting.
The images were captured deliberately so that models must perceive and distinguish fine visual properties.
DOCCI includes many related images that are nearly identical, with subtle but important differences.
Every caption is hand-crafted to highlight those differences, making each image easy to tell apart from its close counterparts.
Explore DOCCI on AIOZ AI now: https://aiozai.network/datasets/8a876fb7-c41d-465e-89e6-2fb68b58dad0

What’s Inside
DOCCI’s roughly 15,000 images (14,847 across the splits listed here) are organised for smooth model development:
- Training set – 9,647 images: for the model to learn from.
- Test set – 5,000 images: held back until training ends to give an unbiased score.
- Quick-check sets – 2 × 100 images: tiny subsets (one for development, one for final scoring) for instant debugging or validation.
Each entry includes the image, a unique ID, and a detailed caption.
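To make the layout concrete, here is a minimal Python sketch of the splits and of a single entry. The split sizes are the ones quoted in this post. The split labels, the DocciEntry class, and its field names are illustrative assumptions, not the dataset's actual schema, so check the dataset page for the real names.

```python
from dataclasses import dataclass

# Split sizes as quoted in this post. The keys are illustrative labels,
# not necessarily the names used in the published dataset.
SPLIT_SIZES = {
    "train": 9_647,
    "test": 5_000,
    "quick_check_dev": 100,
    "quick_check_test": 100,
}


@dataclass(frozen=True)
class DocciEntry:
    """One record. Field names are hypothetical, chosen for this sketch."""
    image_id: str
    image_path: str
    caption: str


def total_images(split_sizes):
    """Add up the number of images across all splits."""
    return sum(split_sizes.values())


def validate_entries(entries):
    """Check that IDs are unique and captions are non-empty.

    Returns the number of valid entries, or raises ValueError.
    """
    seen = set()
    for entry in entries:
        if entry.image_id in seen:
            raise ValueError(f"duplicate image ID: {entry.image_id}")
        if not entry.caption.strip():
            raise ValueError(f"empty caption for: {entry.image_id}")
        seen.add(entry.image_id)
    return len(seen)
```

Running total_images(SPLIT_SIZES) on these figures gives 14,847, which is where the "15K" in the title comes from. A check like validate_entries is a cheap sanity pass to run once after downloading, before any training starts.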
Why you should use it
- Fine-grained captioning & retrieval – Teach your model to pick the one photo that matches a detailed description, or the one description that matches a photo.
- Stress-testing text-to-image (T2I) – Push generators with long, dense prompts packed with spatial cues and object counts.
- Multimodal reasoning research – Explore how models handle subtle scene changes like lighting shifts or extra objects.
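For the retrieval use case, one common way to score a model is recall@k over a caption-to-image similarity matrix. The sketch below is a generic illustration, not an official DOCCI evaluation script. It assumes you already have similarity scores from your own model and that caption i belongs to image i; the function name and the toy scores are made up for the example.

```python
def recall_at_k(similarity, k=1):
    """Fraction of captions whose correct image ranks in the top k.

    similarity[i][j] is the model's score for caption i against image j.
    Assumption for this sketch: the correct image for caption i is image i.
    """
    if not similarity:
        raise ValueError("similarity matrix is empty")
    hits = 0
    for i, row in enumerate(similarity):
        # Image indices sorted from highest to lowest score for this caption.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(similarity)


# Toy scores: images 0 and 1 are near-duplicates, so caption 1 is
# narrowly pulled towards the wrong image.
toy_scores = [
    [0.90, 0.80, 0.10],
    [0.85, 0.80, 0.20],
    [0.10, 0.20, 0.70],
]
print(recall_at_k(toy_scores, k=1))
print(recall_at_k(toy_scores, k=2))
```

In the toy matrix, two of the three captions retrieve their own image first, and all three succeed once the top two candidates are allowed. Near-identical image pairs like the ones in DOCCI are exactly where recall@1 tends to drop, which is what makes the metric informative here.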
License
Creative Commons Attribution 4.0 International (CC BY 4.0). Credit goes to Yasumasa Onoe, Sunayana Rane, and collaborators for creating and maintaining the DOCCI dataset.
Try it now
Ready to test real visual understanding?
Check out DOCCI and other datasets on AIOZ AI and plug them straight into your training pipeline.

About the AIOZ Network
AIOZ Network is a DePIN (decentralized physical infrastructure network) for Web3 AI, storage, and streaming.
Powered by a global community of AIOZ DePINs, the network rewards you for sharing computational resources to store, transcode, and stream digital media content and to power decentralized AI computation.