Learn the vocabulary of AI agents that read the screen and control the mouse and keyboard directly.
1 / 5
At standup, a dev mentions an AI agent that can view a screenshot of a desktop application and control the mouse and keyboard to complete a task, rather than only calling a defined API. What is this capability called?
A computer-use agent views a screenshot of a desktop application and controls the mouse and keyboard directly to complete a task, rather than relying only on a defined API the application happens to expose. A traditional API integration can't operate an application that has no API for the needed action. This visual, direct-control approach lets an agent work with legacy or GUI-only software it otherwise couldn't automate.
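The observe-then-act cycle described above can be sketched as a small loop. This is only an illustration under stated assumptions: `FakeDesktop`, `Action`, `run_agent`, and `scripted_policy` are hypothetical names, the "screenshot" is a text description rather than pixels, and a scripted function stands in for the model that would normally choose each action.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    kind: str       # "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


class FakeDesktop:
    """Hypothetical stand-in for a real screen; records input instead of sending it."""

    def __init__(self) -> None:
        self.events: List[str] = []

    def screenshot(self) -> str:
        # A real agent would receive pixels; a text description is enough here.
        return f"screen after {len(self.events)} events"

    def apply(self, action: Action) -> None:
        if action.kind == "click":
            self.events.append(f"click({action.x},{action.y})")
        elif action.kind == "type":
            self.events.append(f"type({action.text!r})")


def run_agent(desktop: FakeDesktop,
              policy: Callable[[str], Action],
              max_steps: int = 10) -> List[str]:
    """Observe the screen, ask the policy for one action, apply it, repeat."""
    for _ in range(max_steps):
        action = policy(desktop.screenshot())
        if action.kind == "done":
            break
        desktop.apply(action)
    return desktop.events


def scripted_policy(observation: str) -> Action:
    # Assumption: a model would pick the action from the screenshot instead.
    if observation == "screen after 0 events":
        return Action("click", x=120, y=48)
    if observation == "screen after 1 events":
        return Action("type", text="hello")
    return Action("done")


events = run_agent(FakeDesktop(), scripted_policy)
print(events)
```

Note that the loop never calls an application API: everything goes through `screenshot()` and `apply()`, which is what lets this style of agent reach GUI-only software.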
2 / 5
During a design review, the team wants the agent's screen to run inside an isolated virtual machine rather than directly on a developer's own workstation. Which capability supports this?
Sandboxed virtual display isolation runs the agent's screen inside an isolated virtual machine, so an unintended or mistaken click can't affect a developer's own live workstation or its real files. Running the agent directly on a developer's own machine risks a mistaken action causing real, unintended damage. This isolation is a core safety pattern for a computer-use agent, since it can control input as broadly as a real user would.
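The disposable-environment idea can be illustrated with a much smaller stand-in. The sketch below assumes a temporary directory in place of a virtual machine, which is not real isolation; it only shows the pattern of giving the agent a throwaway copy that is discarded afterwards. `disposable_workspace` and `mistaken_agent_action` are hypothetical names.

```python
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Dict, Iterator


@contextmanager
def disposable_workspace(seed_files: Dict[str, str]) -> Iterator[Path]:
    """Yield a throwaway directory seeded with copies of the given files."""
    with tempfile.TemporaryDirectory(prefix="agent-sandbox-") as tmp:
        root = Path(tmp)
        for name, content in seed_files.items():
            (root / name).write_text(content)
        yield root
    # The directory and everything in it is deleted when the block exits.


def mistaken_agent_action(root: Path) -> None:
    # Simulates a wrong click that deletes a file.
    (root / "report.txt").unlink()


# Assumption: this dict stands in for the developer's real files.
real_files = {"report.txt": "quarterly numbers"}

with disposable_workspace(real_files) as sandbox:
    mistaken_agent_action(sandbox)
    deleted_in_sandbox = not (sandbox / "report.txt").exists()

sandbox_gone = not sandbox.exists()
original_intact = real_files["report.txt"] == "quarterly numbers"
print(deleted_in_sandbox, sandbox_gone, original_intact)
```

The mistaken deletion happens only to the copy, and the copy itself disappears once the session ends. A real deployment would use a VM or container with its own virtual display to get the same property for the whole desktop.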
3 / 5
In a code review, a dev notices the agent pauses and requests explicit confirmation before clicking a button that would submit a payment or delete a file. What does this represent?
A confirmation checkpoint pauses the agent before it clicks a button leading to an irreversible action, like a payment or file deletion, and asks for explicit approval first. Letting the agent click any button autonomously risks an irreversible action happening without genuine user intent. This checkpoint is essential given how broadly a computer-use agent's input control can reach.
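A confirmation checkpoint can be sketched as a gate in front of the click. This is a minimal illustration, not a definitive design: the label set `IRREVERSIBLE_LABELS`, the function `click_with_checkpoint`, and the `confirm` callback are all hypothetical, and a real agent would need a more robust way to recognize irreversible actions than matching button text.

```python
from typing import Callable, List

# Assumption: buttons are identified by their visible label.
IRREVERSIBLE_LABELS = {"Submit payment", "Delete file"}


def click_with_checkpoint(label: str,
                          confirm: Callable[[str], bool],
                          log: List[str]) -> bool:
    """Click the button, but ask for approval first if it is irreversible."""
    if label in IRREVERSIBLE_LABELS and not confirm(label):
        log.append(f"blocked: {label}")
        return False
    log.append(f"clicked: {label}")
    return True


log: List[str] = []
deny_all = lambda label: False      # the user refuses every request
approve_all = lambda label: True    # the user approves every request

click_with_checkpoint("Next page", deny_all, log)          # reversible, no prompt
click_with_checkpoint("Delete file", deny_all, log)        # prompt, refused
click_with_checkpoint("Submit payment", approve_all, log)  # prompt, approved
print(log)
```

Reversible clicks pass straight through, so the checkpoint only interrupts the user where a mistake could not be undone.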
4 / 5
An incident report shows a computer-use agent misread a small, ambiguous icon on screen and clicked the wrong button, triggering an unintended action in a live application. What practice would prevent this?
Two safeguards together limit the real-world damage when a misread icon triggers the wrong click: running the agent in a sandboxed environment, and requiring confirmation before any action that isn't easily reversible. Neither stops the misread itself; they contain its consequences. Running directly against a live application with no safeguard risks exactly the kind of unintended action this incident describes. This combination of sandboxing and confirmation is standard practice given a computer-use agent's broad input control.
5 / 5
During a PR review, a teammate asks why the team sandboxes a computer-use agent's virtual display instead of letting it operate directly on a real production desktop. What is the reasoning?
A misread icon or a mistaken click can trigger a real, unintended action, and a computer-use agent's input control reaches just as broadly as a real user's would. Sandboxing isolates that risk into a disposable virtual environment instead of a real production desktop and its actual files. The tradeoff is the added infrastructure of maintaining an isolated, disposable virtual display for the agent to operate within.