ScreenAgent

Overview

Open‑source VLM agent that controls computer GUIs by planning and executing mouse and keyboard actions.

ScreenAgent is an open‑source Vision Language Model (VLM) agent that interacts with real computer screens by observing screenshots and issuing mouse and keyboard actions, following a planning‑execution‑reflection loop. It supports multi‑step GUI tasks and dataset collection, and it achieves positioning accuracy comparable to GPT‑4V.
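The planning‑execution‑reflection loop described above can be sketched roughly as follows. This is a minimal illustration, not ScreenAgent's actual code: the names `run_task`, `Action`, and `LoopResult`, the four callables, and the verdict strings "done", "retry", and "replan" are all hypothetical, and the step budget is an assumption added to keep the loop bounded.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class Action:
    """One mouse or keyboard action (hypothetical structure)."""
    kind: str                      # e.g. "click" or "type"
    payload: dict = field(default_factory=dict)


@dataclass
class LoopResult:
    completed: bool                # True if every subtask was marked done
    history: List[Action] = field(default_factory=list)


def run_task(
    task: str,
    observe: Callable[[], Any],                 # returns the current screenshot
    plan: Callable[[str, Any], List[str]],      # task + screenshot -> subtasks
    act: Callable[[str, Any], Action],          # performs one action, returns it
    reflect: Callable[[str, Any], str],         # "done", "retry", or "replan"
    max_steps: int = 10,                        # assumed safety budget
) -> LoopResult:
    history: List[Action] = []
    subtasks = plan(task, observe())            # planning phase
    index = 0
    steps = 0
    while index < len(subtasks) and steps < max_steps:
        current = subtasks[index]
        history.append(act(current, observe())) # execution phase
        steps += 1
        verdict = reflect(current, observe())   # reflection phase
        if verdict == "done":
            index += 1                          # move on to the next subtask
        elif verdict == "replan":
            subtasks = plan(task, observe())    # rebuild the plan from scratch
            index = 0
        # any other verdict (e.g. "retry") repeats the same subtask
    return LoopResult(completed=index >= len(subtasks), history=history)
```

In a real agent, `plan`, `act`, and `reflect` would each be backed by a VLM call on the latest screenshot, and `act` would also send the chosen action to the controlled desktop. Passing them in as callables keeps the control flow testable with simple stubs.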

Autonomy level

81%

Reasoning: ScreenAgent demonstrates high autonomy through its three-phase operational framework: planning, acting, and reflecting. It interacts with computer environments continuously and without human intervention, autonomously assessing execution status and adjusting its actions in real time. The reflection phase allows self-evaluation of action outcomes,...

Some of the use cases of ScreenAgent:

  • Automating desktop tasks with LLM-based GUI control.
  • Building VLM agents that plan, act and reflect over screen state.
  • Collecting and leveraging GUI interaction datasets.
  • Researching multi-step visual task execution.

Popularity level: 44%
