UI-Venus-1.5 2B: A GUI Agent Model for Interface Grounding and Navigation

UI-Venus-1.5 2B: A GUI Agent Model for Interface Grounding and Navigation

TL;DR

UI-Venus-1.5 2B is a GUI agent model for screen-based workflows where interface grounding, instruction-following, and navigation need to work together. The model family is built from the Qwen3-VL Series and trained through mid-training, offline RL, online RL, and model merging. On AIOZ AI, builders can review and download this model for GUI-agent evaluation.

What UI-Venus-1.5 2B Is

UI-Venus-1.5 2B, now listed on AIOZ AI, is a visual GUI agent model designed for screen-based workflows. It focuses on connecting interface perception with intended action: reading what appears on screen, identifying UI-specific objects, and helping map user instructions to the right interface targets.

A GUI agent model needs to interpret interface objects such as buttons, fields, menus, icons, and screen regions. Its value comes from turning visual UI context into action-relevant understanding.

How the Training Pipeline Works

The UI-Venus-1.5 family follows a progressive training pipeline that starts from the Qwen3-VL Series. The process adds GUI-specific knowledge, then improves task execution through reinforcement learning and model merging. Each stage supports a different part of GUI-agent development, from domain learning to dynamic navigation.

The training flow includes:

  1. Mid-training with large-scale GUI data for domain knowledge injection
  2. Offline reinforcement learning across grounding, mobile, and web objectives
  3. Online reinforcement learning for navigation in dynamic scenarios
  4. Model merging to combine specialized grounding, web, and mobile capabilities

Core Capabilities

UI-Venus-1.5 2B is most relevant when the interface itself is the input. Core capabilities include:

  • GUI grounding for locating interface elements
  • Screen understanding from visual UI context
  • Instruction-following for interface tasks
  • Web navigation support
  • Visual-only reasoning from screenshots

Technical Profile

UI-Venus-1.5 2B is part of the UI-Venus-1.5 model family, which includes dense 2B and 8B variants, plus a 30B-A3B MoE variant. The AIOZ AI listing focuses on the 2B model, while the broader family provides context for training, deployment, and benchmark coverage.

Key technical details include:

  • Model: UI-Venus-1.5 2B
  • Model family: UI-Venus-1.5
  • Model type: GUI Agent
  • Foundation: Qwen3-VL Series
  • Training process: mid-training, offline RL, online RL, model merging
  • Mid-training scale: 10B tokens across 30+ datasets
  • Deployment path: vLLM API serving
  • Requirements: vllm>=0.11.0 and transformers>=4.57.0
  • Model path: inclusionAI/UI-Venus-1.5-2B

Where It Fits Best

UI-Venus-1.5 2B is especially useful when a workflow depends on screenshots or visual UI context rather than plain text input alone.

Practical use cases include:

  • GUI element localization
  • Mobile app navigation experiments
  • Browser workflow evaluation
  • Screenshot-based interface understanding
  • Agent workflow prototyping.

Download It on AIOZ AI

Start with a focused interface task: choose a screenshot, write a clear instruction, and evaluate whether the model can identify the correct target or next step in your local workflow.

Download UI-Venus-1.5 2B and explore how it fits your GUI-agent experiments.

FAQ

Q1: What is UI-Venus-1.5 2B used for?

It is used for GUI agent tasks such as interface grounding, screen understanding, element localization, and navigation support.

Q2: What deployment setup is mentioned for the model?

The model can be served with vLLM using vllm>=0.11.0 and transformers>=4.57.0.

Q3: What should builders test first?

Start with one screenshot and one clear instruction. After the model can identify the correct target or next step, expand into longer mobile or web navigation workflows.