ScreenAgent

Overview

Open‑source VLM agent that controls computer GUIs by planning and executing mouse and keyboard actions.

ScreenAgent is an open‑source Vision Language Model (VLM) agent that interacts with real computer screens by observing screenshots and issuing mouse and keyboard actions, following a planning‑execution‑reflection loop. It supports multi‑step GUI tasks and dataset collection, and achieves positioning accuracy comparable to GPT‑4V.
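
The planning‑execution‑reflection loop described above can be illustrated with a minimal sketch. This is not ScreenAgent's actual code: `FakeScreen`, `Action`, `plan`, `reflect`, and `run_agent` are hypothetical names, the "screen" is an in-memory stand-in rather than a real desktop, and the planner is hard-coded where a real agent would query a VLM with a screenshot.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Action:
    kind: str     # "click" or "type" (hypothetical action vocabulary)
    target: str   # element name for a click, text for typing


class FakeScreen:
    """In-memory stand-in for a real desktop (assumption for illustration)."""

    def __init__(self) -> None:
        self.focused = None
        self.text = ""

    def screenshot(self) -> dict:
        # A real agent would capture pixels; here the state is returned directly.
        return {"focused": self.focused, "text": self.text}

    def apply(self, action: Action) -> None:
        if action.kind == "click":
            self.focused = action.target
        elif action.kind == "type" and self.focused is not None:
            self.text += action.target


def plan(task_text: str) -> List[Action]:
    # Planning: split the task into sub-actions (hard-coded in this sketch).
    return [Action("click", "search_box"), Action("type", task_text)]


def reflect(observation: dict, action: Action) -> str:
    # Reflection: compare the new observation with the action's intended effect.
    if action.kind == "click":
        return "next" if observation["focused"] == action.target else "retry"
    return "next" if observation["text"].endswith(action.target) else "retry"


def run_agent(screen: FakeScreen, task_text: str, max_steps: int = 10) -> List[Tuple[str, str]]:
    actions = plan(task_text)
    log: List[Tuple[str, str]] = []
    index = 0
    steps = 0
    while index < len(actions) and steps < max_steps:
        action = actions[index]
        screen.apply(action)                            # execution
        verdict = reflect(screen.screenshot(), action)  # reflection
        log.append((action.kind, verdict))
        if verdict == "next":
            index += 1                                  # advance only on success
        steps += 1
    return log


if __name__ == "__main__":
    screen = FakeScreen()
    print(run_agent(screen, "hello"))
    print(screen.screenshot())
```

The `max_steps` bound is a design choice in this sketch so that a sub-action that keeps failing reflection cannot loop forever.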

Autonomy level

81%

Reasoning: ScreenAgent demonstrates high autonomy through its three-phase operational framework: planning, acting, and reflecting. It interacts continuously with computer environments without human intervention, autonomously assessing execution status and adjusting its actions in real time. The reflection phase allows self-evaluation of action outcomes,...

Some of the use cases of ScreenAgent:

  • Automating desktop tasks with VLM-based GUI control.
  • Building VLM agents that plan, act and reflect over screen state.
  • Collecting and leveraging GUI interaction datasets.
  • Researching multi-step visual task execution.

Popularity level: 44%