AIOZ AI Research: AffordMatcher Advances 3D Scene Understanding

We are delighted to announce that the AIOZ AI Research paper, "AffordMatcher: Affordance Learning in 3D Scenes from Visual Signifiers," has been accepted for publication at the Conference on Computer Vision and Pattern Recognition (CVPR 2026).
Understanding 3D scenes requires more than recognizing objects. For embodied AI, the key challenge is identifying where actions can happen, such as handles to pull, buttons to press, or surfaces to interact with.
To address this, the AIOZ AI Research team introduces AffordBridge, a large-scale multimodal dataset, and AffordMatcher, a framework for localizing actionable regions in complex 3D scenes.
The Challenge in 3D Affordance Understanding
For embodied AI, recognizing objects is only the beginning. A robot, AR system, or autonomous assistant also needs to understand how a physical environment can be used.
This is the core problem of affordance learning: identifying the parts of a scene that make specific interactions possible.
The challenge becomes harder in full 3D scenes, where functional regions may be small, occluded, or visually similar to surrounding geometry. Real environments require scene-level reasoning across objects, spatial layout, and interaction context.
AffordBridge as a Multimodal Dataset
AffordBridge focuses on full-scene understanding, allowing models to reason about interactions within realistic, complex human environments.
The dataset includes:
- 291,637 functional interaction annotations
- 685 high-resolution indoor scenes represented as point clouds
- 157 object categories
- 61 actionable affordances, such as push, pull, rotate, and open
- RGB visual signifiers showing humans interacting with objects
- Natural language descriptions aligned with 3D geometry
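
To make the list above concrete, here is one way a single annotation could be represented in code. This is a minimal sketch only: the class names, field names, and file path are hypothetical and are not taken from the released AffordBridge schema. It assumes each annotation ties an affordance label to a set of point indices in a scene's point cloud, plus a signifier image and a text description.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float, float]


@dataclass
class AffordanceAnnotation:
    # All field names are illustrative assumptions, not the dataset's real schema.
    object_category: str      # one of the object categories, e.g. "drawer"
    affordance: str           # one of the affordance labels, e.g. "pull"
    point_indices: List[int]  # indices into the scene point cloud
    signifier_image: str      # path to an RGB image of a human interaction
    description: str          # natural language description of the region


@dataclass
class Scene:
    scene_id: str
    points: List[Point]
    annotations: List[AffordanceAnnotation] = field(default_factory=list)

    def regions_for(self, affordance: str) -> List[List[Point]]:
        """Return the 3D points of every region annotated with `affordance`."""
        return [
            [self.points[i] for i in ann.point_indices]
            for ann in self.annotations
            if ann.affordance == affordance
        ]


# Toy scene with three points and one "pull" annotation on the first two.
scene = Scene(
    scene_id="toy_scene",
    points=[(0.0, 0.0, 1.0), (0.0, 0.1, 1.0), (2.0, 2.0, 0.0)],
    annotations=[
        AffordanceAnnotation(
            object_category="drawer",
            affordance="pull",
            point_indices=[0, 1],
            signifier_image="images/toy_pull.jpg",  # placeholder path
            description="Pull the drawer handle to open it.",
        )
    ],
)
pull_regions = scene.regions_for("pull")
```

Keeping the region as indices into the scene's point cloud, rather than as a copy of the points, is a design choice of this sketch: it keeps the geometry in one place while letting the image and text attach to the same region.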

How AffordMatcher Works
Built on AffordBridge, AffordMatcher is a framework for zero-shot affordance segmentation in 3D scenes. It is designed to localize functional regions even when the model encounters unfamiliar objects or scenes.
The framework uses a dual-branch design:
- Reasoning Extractor: encodes interaction semantics from 2D visual signifiers, including cues such as hand-object contact.
- Affordance Extractor: processes the 3D scene and identifies candidate interaction regions using spatial and geometric features.
The core mechanism is a Match-to-match attention strategy combined with a cross-modal dissimilarity matrix. Together, these align 2D interaction evidence with 3D regions, which matters most when geometry alone is not enough to explain how an object can be used.
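
The idea of a cross-modal dissimilarity matrix can be illustrated with a small sketch. The code below is not the AffordMatcher implementation and does not reproduce its attention strategy. As a simple stand-in, it uses cosine dissimilarity between hypothetical 2D signifier embeddings and hypothetical 3D region embeddings, and assumes both already live in a shared feature space. All function names and feature values are invented for illustration.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_dissimilarity(a: Vector, b: Vector) -> float:
    """Return 1 - cosine similarity; 0 means same direction, 1 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return 1.0  # treat a zero vector as carrying no matching evidence
    return 1.0 - dot / norm


def dissimilarity_matrix(
    signifier_feats: List[Vector], region_feats: List[Vector]
) -> List[List[float]]:
    """Rows index 2D signifiers, columns index candidate 3D regions."""
    return [
        [cosine_dissimilarity(s, r) for r in region_feats]
        for s in signifier_feats
    ]


def best_matches(matrix: List[List[float]]) -> List[int]:
    """For each signifier, pick the region with the lowest dissimilarity."""
    return [min(range(len(row)), key=row.__getitem__) for row in matrix]


# Toy embeddings (assumed values): two signifiers, three candidate regions.
signifier_feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
region_feats = [[0.0, 0.9, 0.1], [0.8, 0.1, 0.0], [0.0, 0.0, 1.0]]

matrix = dissimilarity_matrix(signifier_feats, region_feats)
matches = best_matches(matrix)  # signifier 0 -> region 1, signifier 1 -> region 0
```

In this toy setup, the third region is orthogonal to both signifiers, so it is never selected. A learned model would replace the fixed embeddings and the hard argmin with trained encoders and a soft alignment.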

Experimental Validation and Performance
AffordMatcher reports the following results on the AffordBridge benchmark:
- 53.4 overall mAP
- 7.8-point improvement over the previous best baseline
- 112.5 ms inference time per sample
- Clearer affordance separation in t-SNE visualizations from the reasoning module
These results show gains in both localization accuracy and practical efficiency. At 112.5 ms per sample, or roughly 9 samples per second, inference time points toward near real-time use in robotics, AR, and other interactive systems.
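
The reported figures can be put in context with some quick arithmetic. The sketch below uses only the numbers stated above. The baseline mAP and relative gain are derived values, not figures quoted from the paper, and they assume the 7.8-point improvement is measured on the same overall mAP metric.

```python
# Figures as reported in the results list above.
LATENCY_MS = 112.5   # inference time per sample
OVERALL_MAP = 53.4   # overall mAP
GAIN_POINTS = 7.8    # improvement over the previous best baseline

# Throughput implied by the per-sample latency (single sample at a time).
samples_per_second = 1000.0 / LATENCY_MS

# Derived, not reported: the baseline mAP implied by the stated gain.
baseline_map = round(OVERALL_MAP - GAIN_POINTS, 1)

# Derived, not reported: the gain relative to that implied baseline.
relative_gain_pct = round(100.0 * GAIN_POINTS / baseline_map, 1)

print(f"{samples_per_second:.2f} samples/s")
print(f"implied baseline: {baseline_map} mAP")
print(f"relative gain: {relative_gain_pct}%")
```

Under these assumptions the throughput is just under 9 samples per second, which is why the result is described as approaching real-time rather than reaching typical video frame rates.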
Why It Matters for Embodied AI
AffordMatcher supports a broader shift from perception-only AI toward interaction-aware AI.
Practical applications include:
- Robotic manipulation: helping robots identify where to grasp, press, pull, or rotate objects
- Augmented reality: supporting context-aware interaction guidance in 3D environments
- Visual navigation: improving how autonomous systems understand actionable regions in human spaces
For AIOZ AI Research, this work highlights a central direction in spatial intelligence: helping AI move from seeing the world to understanding how to act within it.
Conclusion
AffordBridge and AffordMatcher advance 3D scene understanding by giving models a richer way to learn from human interaction, language, and spatial geometry together.
By combining large-scale multimodal annotations with cross-modal reasoning, AffordMatcher provides a practical step toward AI systems that understand not only what objects are, but how they can be used.
For more technical details:
Paper: https://arxiv.org/abs/2603.27970
Project site: https://aioz-ai.github.io/AffordMatcher/