/* desktop-control layer for coding agents */
$ guiport
Playwright for desktop apps,
built for coding agents.
# inspect, control, and replay any macOS app
guiport observe --app Calculator
guiport click --app Calculator 'AXButton[identifier="Five"]'
guiport run smoke.yaml
guiport serve --mcp   # expose to Claude / Codex / opencode
macOS: shipped · Linux: soon · Windows: soon · Swift 5.9+ · MIT
Why guiport.
Agents shouldn't drive desktop apps by guessing pixels. guiport exposes the desktop as structured data — accessibility tree first, screenshots and OCR only as fallback — so flows are deterministic, replayable, and fast enough to use as test infra.
Stable selectors.
Click and inspect by name, role, or identifier — never coordinates. Sub-second to read any window.
Visual fallback.
When an app doesn't expose its structure, guiport falls back to reading what's on screen. Works on canvas and Electron apps too.
Record once, replay forever.
Capture a flow, save it as YAML, replay deterministically without an LLM. Failures auto-save artifacts so you know exactly what broke.
Install.
Three options. macOS 13+ required. Linux / Windows on the roadmap.
Homebrew (recommended once tap is published)
brew tap edihasaj/guiport
brew install guiport
guiport doctor
Install script
curl -fsSL https://raw.githubusercontent.com/edihasaj/guiport/main/scripts/install.sh | sh
From source
git clone https://github.com/edihasaj/guiport.git
cd guiport
swift build -c release
sudo cp .build/release/guiport /usr/local/bin/guiport
After installing, run guiport doctor --fix to trigger the Accessibility and Screen Recording permission prompts and open the right System Settings panes.
Quick start.
List running apps.
$ guiport apps --with-windows
Calculator  com.apple.calculator  pid=12345  windows=1
Finder      com.apple.finder      pid=614    windows=2
Code        com.microsoft.VSCode  pid=48047  windows=1
Inspect an app's accessibility tree.
$ guiport tree --app Calculator --pretty | head -20
{
"id": "/AXWindow[0]",
"role": "AXWindow",
"name": "Calculator",
"children": [
{
"role": "AXButton",
"identifier": "Five",
"bounds": {"x": 605, "y": 442, "width": 48, "height": 48},
"actions": ["AXPress"]
}, ...
]
}
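Because the tree is plain JSON, an agent or test script can walk it directly instead of scraping the screen. The sketch below is illustrative only: it assumes nodes carry the `role`, `identifier`, `actions`, and `children` keys visible in the excerpt above, and the helper names `walk` and `pressable_identifiers` are made up for this example.

```python
import json

# Sample shaped like the excerpt above; a real tree would come from
# `guiport tree --app Calculator`.
SAMPLE_TREE = json.loads("""
{
  "id": "/AXWindow[0]",
  "role": "AXWindow",
  "name": "Calculator",
  "children": [
    {
      "role": "AXButton",
      "identifier": "Five",
      "bounds": {"x": 605, "y": 442, "width": 48, "height": 48},
      "actions": ["AXPress"]
    }
  ]
}
""")


def walk(node):
    """Yield the node and every descendant, depth first."""
    yield node
    for child in node.get("children", []):
        yield from walk(child)


def pressable_identifiers(tree):
    """Collect identifiers of nodes that advertise the AXPress action."""
    return [
        node["identifier"]
        for node in walk(tree)
        if "AXPress" in node.get("actions", []) and "identifier" in node
    ]


print(pressable_identifiers(SAMPLE_TREE))  # ['Five']
```

Identifiers gathered this way can be pasted straight into selectors such as `AXButton[identifier="Five"]`.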
Click by stable selector — not coordinates.
$ guiport click --app Calculator 'AXButton[identifier="Five"]'
{"path":"ax","selector":"AXButton[identifier=\"Five\"]","detail":"5"}
App doesn't expose structure? Visual fallback is automatic.
$ guiport click --app Figma 'AXButton[name="Save"]'
{"path":"ocr","detail":"Save @ 980,124"}
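A script that wraps `guiport click` may want to know whether the accessibility path or the visual fallback handled a click, for example to flag flows that depend on OCR. The sketch below only inspects the `path` field shown in the two outputs above; the function name `clicked_via_ocr` is hypothetical, and `"ax"` and `"ocr"` are the only values the examples confirm.

```python
import json


def clicked_via_ocr(result_line: str) -> bool:
    """Return True when a click result reports the OCR path."""
    result = json.loads(result_line)
    return result.get("path") == "ocr"


# The two result lines from the examples above, as raw strings so the
# JSON escapes (\") survive unchanged.
ax_result = r'{"path":"ax","selector":"AXButton[identifier=\"Five\"]","detail":"5"}'
ocr_result = r'{"path":"ocr","detail":"Save @ 980,124"}'

print(clicked_via_ocr(ax_result))   # False
print(clicked_via_ocr(ocr_result))  # True
```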
Record → Replay.
$ guiport record smoke.yaml --app Calculator
recording for Calculator — interact with the app, Ctrl+C to stop
recording saved → smoke.yaml
$ guiport run smoke.yaml
{"passed":true, "steps":[ ... ]}
CLI reference.
| command | does |
|---|---|
| doctor | Check (or --fix) Accessibility + Screen Recording permissions. |
| apps | List running apps with windows. |
| observe | Summarize the focused window of an app. |
| tree | Dump the accessibility tree as JSON. |
| find | Find elements by selector. Visual fallback automatic; --strict to disable. |
| click | Click an element by selector. Visual fallback automatic; --strict to disable. |
| click-at X Y | Click at raw coordinates. |
| find-text "..." | OCR-find on-screen text via Apple Vision. |
| click-text "..." | OCR-find then click center. |
| type "..." | Type Unicode text. |
| hotkey cmd+s | Send a hotkey combo. |
| screenshot | Capture a window or full screen to PNG. |
| record file.yaml | Live recorder via CGEventTap → YAML test. |
| run file.yaml | Replay a YAML test. |
| serve --mcp | Start a JSON-RPC MCP server over stdio. |
| bench | Measure observe / tree / find latency. |
Full help: guiport <command> --help.
Selector grammar.
role[attr="value"][attr~="substring"][index]
Examples:
# Exact match by AX role + name
button[name="Save"]
# Substring match
AXButton[name~="Open"]
# Stable identifier (best: survives layout changes)
*[identifier="save_btn"]
# Pick the third match
AXMenuItem[name~="File"][2]
Attributes: role, name (title), value, identifier, description, text (any of name/value/description), index.
Role aliases: button → AXButton, checkbox → AXCheckBox, textfield → AXTextField, etc.
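To make the grammar concrete, here is a minimal sketch of how a selector string could be split into a role, attribute filters, and an optional index. It is not guiport's parser: the regexes, the `Selector` class, and `parse_selector` are assumptions for illustration, only the three aliases listed above are included, and values containing escaped quotes are not handled.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Only the aliases named above; the source elides the rest with "etc.".
ROLE_ALIASES = {"button": "AXButton", "checkbox": "AXCheckBox", "textfield": "AXTextField"}

_ROLE = re.compile(r"\*|[A-Za-z]+")
# Either [attr="value"], [attr~="substring"], or a bare [index].
_PREDICATE = re.compile(r'\[(?:(\w+)(~?=)"([^"]*)"|(\d+))\]')


@dataclass
class Selector:
    role: str                                    # "*" matches any role
    filters: list = field(default_factory=list)  # (attr, op, value) tuples
    index: Optional[int] = None                  # zero-based pick among matches


def parse_selector(text: str) -> Selector:
    role_match = _ROLE.match(text)
    if not role_match:
        raise ValueError(f"missing role in selector: {text!r}")
    role = ROLE_ALIASES.get(role_match.group(), role_match.group())
    selector = Selector(role=role)
    pos = role_match.end()
    while pos < len(text):
        m = _PREDICATE.match(text, pos)
        if not m:
            raise ValueError(f"unexpected input at offset {pos}: {text[pos:]!r}")
        attr, op, value, index = m.groups()
        if index is not None:
            selector.index = int(index)
        else:
            selector.filters.append((attr, op, value))
        pos = m.end()
    return selector


print(parse_selector('AXMenuItem[name~="File"][2]'))
```

Keeping the index separate from the attribute filters mirrors the grammar, where `[2]` selects among elements that already matched every other predicate.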
YAML test format.
name: calculator 5+3=8
app: Calculator
timeout_ms: 4000
steps:
  - find: 'AXButton[identifier="AllClear"]'
  - click: 'AXButton[identifier="AllClear"]'
  - wait: 80
  - click: 'AXButton[identifier="Five"]'
  - click: 'AXButton[identifier="Add"]'
  - click: 'AXButton[identifier="Three"]'
  - click: 'AXButton[identifier="Equals"]'
  - assert:
      find: 'AXStaticText[value~="8"]'
      exists: true
Step types: wait · find · click · press · type · hotkey · screenshot · find_text · click_text · click_at · assert.
Failures snapshot the AX tree (JSON) + screenshot + action log into artifacts/fail-* for fast diagnosis.
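When tests are generated by an agent, a quick lint pass can catch misspelled step types before a replay is attempted. The sketch below works on an already-parsed test (a plain dict) and assumes, as in the calculator example, that each step is a one-key mapping whose key is one of the listed step types; `unknown_steps` is a hypothetical helper, not part of guiport.

```python
# Step types exactly as listed above.
STEP_TYPES = {
    "wait", "find", "click", "press", "type", "hotkey",
    "screenshot", "find_text", "click_text", "click_at", "assert",
}


def unknown_steps(test: dict) -> list:
    """Return (position, keys) for steps that are not one-key maps of a listed type."""
    problems = []
    for position, step in enumerate(test.get("steps", [])):
        keys = list(step) if isinstance(step, dict) else []
        if len(keys) != 1 or keys[0] not in STEP_TYPES:
            problems.append((position, keys))
    return problems


calculator_test = {
    "name": "calculator 5+3=8",
    "app": "Calculator",
    "timeout_ms": 4000,
    "steps": [
        {"click": 'AXButton[identifier="Five"]'},
        {"wait": 80},
        {"assert": {"find": 'AXStaticText[value~="8"]', "exists": True}},
        {"clik": 'AXButton[identifier="Add"]'},  # deliberate typo
    ],
}

print(unknown_steps(calculator_test))  # [(3, ['clik'])]
```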
MCP server.
Plug guiport into an MCP-aware agent (Claude Code, opencode, others). One JSON object per line over stdio:
$ guiport serve --mcp
{"jsonrpc":"2.0","id":1,"method":"initialize"}
<= {"jsonrpc":"2.0","result":{"protocolVersion":"2024-11-05",...}}
{"jsonrpc":"2.0","id":2,"method":"tools/call","params":{
"name":"click","arguments":{
"app":"Calculator",
"selector":"AXButton[identifier=\"Five\"]",
"fallback":"ocr"
}
}}
Tools exposed: doctor, apps, observe, tree, find, click (+ fallback), click_at, find_text, click_text, type, hotkey, screenshot, run.
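For clients that talk to the server without an MCP SDK, each request is one JSON object followed by a newline. The sketch below builds a `tools/call` line matching the example above; `tool_call` is a hypothetical helper, and writing the line to the stdin of a `guiport serve --mcp` process (after the `initialize` exchange) is left to the caller.

```python
import json


def tool_call(request_id: int, name: str, **arguments) -> str:
    """Build one newline-terminated JSON-RPC 2.0 `tools/call` request."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    # json.dumps without `indent` never emits a newline, so the object
    # stays on a single line; string escaping is handled for us.
    return json.dumps(message) + "\n"


line = tool_call(
    2,
    "click",
    app="Calculator",
    selector='AXButton[identifier="Five"]',
    fallback="ocr",
)
print(line, end="")
```

Letting `json.dumps` do the escaping avoids hand-writing the `\"` sequences that the selector needs inside a JSON string.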
Platforms.
One CLI, one YAML schema, one MCP tool list across every platform. macOS ships today. Linux and Windows are next on the roadmap — same selectors, same replay tests, different OS plumbing underneath.
macOS (shipped)
- Inspect any app's UI as structured data
- Click, type, hotkey by selector
- Record once, replay deterministically
- Visual fallback for canvas / Electron apps
Linux (soon)
- Inspect any app's UI as structured data
- Click, type, hotkey by selector
- Record once, replay deterministically
- Visual fallback for sparse-UI apps
Windows (soon)
- Inspect any app's UI as structured data
- Click, type, hotkey by selector
- Record once, replay deterministically
- Visual fallback for sparse-UI apps
Every platform slots in behind the DesktopAdapter protocol, so the schema never branches.
One contract, every platform.
        guiport CLI · MCP · runner
                   │
        ┌──────────┼──────────┐
        ▼          ▼          ▼
      macOS      Linux     Windows
     shipped      soon       soon
Same selectors, same YAML, same MCP tools — across every platform. New platforms slot in behind one shared contract; your replay tests never know which OS is underneath.
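The shared-contract idea can be illustrated with a small sketch. Everything here is hypothetical: the source names a DesktopAdapter protocol but does not show its methods, so the `tree` and `click` signatures, the `FakeAdapter` stand-in, and `run_clicks` are assumptions chosen only to show how a runner can stay unaware of the OS underneath.

```python
from typing import Protocol


class DesktopAdapter(Protocol):
    """Hypothetical per-platform contract; method names are assumptions."""

    def tree(self, app: str) -> dict: ...
    def click(self, app: str, selector: str) -> dict: ...


class FakeAdapter:
    """In-memory stand-in so the sketch runs without any OS plumbing."""

    def __init__(self) -> None:
        self.clicks = []

    def tree(self, app: str) -> dict:
        return {"role": "AXWindow", "name": app, "children": []}

    def click(self, app: str, selector: str) -> dict:
        self.clicks.append((app, selector))
        return {"path": "ax", "selector": selector}


def run_clicks(adapter: DesktopAdapter, app: str, steps: list) -> list:
    """Replay the click steps through whichever adapter is supplied."""
    return [adapter.click(app, step["click"]) for step in steps if "click" in step]


adapter = FakeAdapter()
results = run_clicks(
    adapter,
    "Calculator",
    [{"click": 'AXButton[identifier="Five"]'}, {"wait": 80}],
)
print(results)
```

The runner depends only on the protocol, so swapping `FakeAdapter` for a real platform adapter would not change the test steps.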