/* desktop-control layer for coding agents */

$ guiport

Playwright for desktop apps,
built for coding agents.

# inspect, control, and replay any macOS app
guiport observe --app Calculator
guiport click   --app Calculator 'AXButton[identifier="Five"]'
guiport run     smoke.yaml
guiport serve   --mcp   # expose to Claude / Codex / opencode

macOS (shipped) · Linux (soon) · Windows (soon) · Swift 5.9+ · MIT

01

Why guiport.

Agents shouldn't drive desktop apps by guessing pixels. guiport exposes the desktop as structured data — accessibility tree first, screenshots and OCR only as fallback — so flows are deterministic, replayable, and fast enough to use as test infra.

a.

Stable selectors.

Click and inspect by name, role, or identifier, never by coordinates. Reading any window takes under a second.

b.

Visual fallback.

When an app doesn't expose its structure, guiport falls back to reading what's on screen. Works on canvas and Electron apps too.

c.

Record once, replay forever.

Capture a flow, save it as YAML, replay deterministically without an LLM. Failures auto-save artifacts so you know exactly what broke.

02

Install.

Three options. macOS 13+ required. Linux / Windows on the roadmap.

Homebrew (recommended once tap is published)

$ brew tap edihasaj/guiport
$ brew install guiport
$ guiport doctor

Install script

$ curl -fsSL https://raw.githubusercontent.com/edihasaj/guiport/main/scripts/install.sh | sh

From source

$ git clone https://github.com/edihasaj/guiport.git
$ cd guiport
$ swift build -c release
$ sudo cp .build/release/guiport /usr/local/bin/guiport

Permissions. guiport needs Accessibility + Screen Recording granted to your terminal. Run guiport doctor --fix to fire the prompts and open the right System Settings panes.

03

Quick start.

List running apps.

$ guiport apps --with-windows
Calculator      com.apple.calculator   pid=12345  windows=1
Finder          com.apple.finder       pid=614    windows=2
Code            com.microsoft.VSCode   pid=48047  windows=1

Inspect an app's accessibility tree.

$ guiport tree --app Calculator --pretty | head -20
{
  "id": "/AXWindow[0]",
  "role": "AXWindow",
  "name": "Calculator",
  "children": [
    {
      "role": "AXButton",
      "identifier": "Five",
      "bounds": {"x": 605, "y": 442, "width": 48, "height": 48},
      "actions": ["AXPress"]
    }, ...
  ]
}
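Because the tree is plain JSON, an agent or test helper can walk it without any guiport-specific library. A minimal Python sketch, assuming only the keys visible in the sample above (role, identifier, actions, children); the helper names are hypothetical and not part of guiport:

```python
from typing import Iterator


def iter_nodes(node: dict) -> Iterator[dict]:
    """Yield a node and all of its descendants, depth first."""
    yield node
    for child in node.get("children", []):
        yield from iter_nodes(child)


def pressable_identifiers(tree: dict) -> list[str]:
    """Collect identifiers of nodes that advertise the AXPress action."""
    return [
        n["identifier"]
        for n in iter_nodes(tree)
        if "AXPress" in n.get("actions", []) and "identifier" in n
    ]


# Sample shaped like the output above, truncated to a single child.
sample = {
    "id": "/AXWindow[0]",
    "role": "AXWindow",
    "name": "Calculator",
    "children": [
        {
            "role": "AXButton",
            "identifier": "Five",
            "bounds": {"x": 605, "y": 442, "width": 48, "height": 48},
            "actions": ["AXPress"],
        }
    ],
}

print(pressable_identifiers(sample))  # ['Five']
```

In practice the dict would come from json.loads on the output of guiport tree; the list it returns is a quick way to discover which identifiers are safe to use in selectors.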

Click by stable selector — not coordinates.

$ guiport click --app Calculator 'AXButton[identifier="Five"]'
{"path":"ax","selector":"AXButton[identifier=\"Five\"]","detail":"5"}

App doesn't expose structure? Visual fallback is automatic.

$ guiport click --app Figma 'AXButton[name="Save"]'
{"path":"ocr","detail":"Save @ 980,124"}
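The two results above differ in their path field, so a caller can tell whether a click resolved through the accessibility tree or through the visual fallback. A small Python sketch, assuming the path values are the ones shown in the samples ("ax" and "ocr"); the function name is hypothetical:

```python
import json


def used_fallback(result_line: str) -> bool:
    """Return True when a click result did not resolve via the AX tree."""
    result = json.loads(result_line)
    # Assumption: any path other than "ax" means a fallback was used.
    return result.get("path") != "ax"


# Result lines copied from the examples above.
ax_line = r'{"path":"ax","selector":"AXButton[identifier=\"Five\"]","detail":"5"}'
ocr_line = '{"path":"ocr","detail":"Save @ 980,124"}'

print(used_fallback(ax_line))   # False
print(used_fallback(ocr_line))  # True
```

A test suite could log or flag every step where this returns True, since OCR-resolved clicks are the ones most likely to drift when the UI changes.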

Record → Replay.

$ guiport record smoke.yaml --app Calculator
recording for Calculator — interact with the app, Ctrl+C to stop
recording saved → smoke.yaml

$ guiport run smoke.yaml
{"passed":true, "steps":[ ... ]}

04

CLI reference.

command           does
doctor            Check (or --fix) Accessibility + Screen Recording permissions.
apps              List running apps with windows.
observe           Summarize the focused window of an app.
tree              Dump the accessibility tree as JSON.
find              Find elements by selector. Visual fallback automatic; --strict to disable.
click             Click an element by selector. Visual fallback automatic; --strict to disable.
click-at X Y      Click at raw coordinates.
find-text "..."   OCR-find on-screen text via Apple Vision.
click-text "..."  OCR-find then click center.
type "..."        Type Unicode text.
hotkey cmd+s      Send a hotkey combo.
screenshot        Capture a window or full screen to PNG.
record file.yaml  Live recorder via CGEventTap → YAML test.
run file.yaml     Replay a YAML test.
serve --mcp       Start a JSON-RPC MCP server over stdio.
bench             Measure observe / tree / find latency.

Full help: guiport <command> --help.
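When driving the CLI from a script, building the argument list explicitly avoids shell-quoting problems with selectors that contain brackets and quotes. A hedged Python sketch: the argument order mirrors the click example earlier in this page, the position of --strict after the selector is an assumption, and guiport_argv is a hypothetical helper name.

```python
import shlex


def guiport_argv(command: str, app: str, selector: str, strict: bool = False) -> list[str]:
    """Build an argv list for selector-based commands such as find or click."""
    argv = ["guiport", command, "--app", app, selector]
    if strict:
        # Assumption: --strict is accepted after the selector.
        argv.append("--strict")
    return argv


argv = guiport_argv("find", "Calculator", 'AXButton[identifier="Five"]', strict=True)

# shlex.join renders the list as a copy-pastable shell command for logs.
print(shlex.join(argv))
```

The list can be passed directly to subprocess.run(argv, capture_output=True, text=True), which bypasses the shell entirely.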

05

Selector grammar.

role[attr="value"][attr~="substring"][index]

Examples:

# Exact match by AX role + name
button[name="Save"]

# Substring match
AXButton[name~="Open"]

# Stable identifier (best — survives layout changes)
*[identifier="save_btn"]

# Pick the third match (the index is zero-based)
AXMenuItem[name~="File"][2]

Attributes: role, name (title), value, identifier, description, text (any of name/value/description), index.

Role aliases: button → AXButton, checkbox → AXCheckBox, textfield → AXTextField, etc.
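To make the grammar concrete, here is a small Python sketch of a parser for it. It is an illustration, not guiport's own parser: it knows only the three aliases listed above, does not handle escaped quotes inside values, and assumes the index may appear only at the end.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Only the aliases named in this document; the real list is longer ("etc.").
ROLE_ALIASES = {"button": "AXButton", "checkbox": "AXCheckBox", "textfield": "AXTextField"}

_ROLE = re.compile(r"\*|[A-Za-z]+")
_FILTER = re.compile(r'\[(\w+)(~?=)"([^"]*)"\]')
_INDEX = re.compile(r"\[(\d+)\]")


@dataclass
class Selector:
    role: str
    filters: list = field(default_factory=list)  # (attr, op, value) tuples
    index: Optional[int] = None


def parse_selector(text: str) -> Selector:
    """Parse role[attr="value"][attr~="substring"][index] into a Selector."""
    m = _ROLE.match(text)
    if not m:
        raise ValueError(f"selector must start with a role or '*': {text!r}")
    sel = Selector(ROLE_ALIASES.get(m.group(), m.group()))
    pos = m.end()
    while pos < len(text):
        if (f := _FILTER.match(text, pos)):
            sel.filters.append((f.group(1), f.group(2), f.group(3)))
            pos = f.end()
        elif (i := _INDEX.match(text, pos)) and i.end() == len(text):
            sel.index = int(i.group(1))
            pos = i.end()
        else:
            raise ValueError(f"unexpected input at offset {pos}: {text[pos:]!r}")
    return sel


print(parse_selector('button[name="Save"]'))
print(parse_selector('AXMenuItem[name~="File"][2]'))
```

The "=" operator stands for exact match and "~=" for substring match, as in the examples above.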

06

YAML test format.

name: calculator 5+3=8
app: Calculator
timeout_ms: 4000
steps:
  - find:   'AXButton[identifier="AllClear"]'
  - click:  'AXButton[identifier="AllClear"]'
  - wait:   80
  - click:  'AXButton[identifier="Five"]'
  - click:  'AXButton[identifier="Add"]'
  - click:  'AXButton[identifier="Three"]'
  - click:  'AXButton[identifier="Equals"]'
  - assert:
      find:   'AXStaticText[value~="8"]'
      exists: true

Step types: wait · find · click · press · type · hotkey · screenshot · find_text · click_text · click_at · assert.
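A generated or hand-edited test can be sanity-checked against this list before it is run. The Python sketch below works on steps that have already been loaded into Python dicts (for example by a YAML library of your choice). It assumes each step is a single-key mapping, as in the example above; invalid_steps is a hypothetical helper, not a guiport command.

```python
# Step types as listed in this document.
STEP_TYPES = {
    "wait", "find", "click", "press", "type", "hotkey",
    "screenshot", "find_text", "click_text", "click_at", "assert",
}


def invalid_steps(steps: list[dict]) -> list[str]:
    """Return a human-readable problem for every malformed step."""
    problems = []
    for i, step in enumerate(steps):
        if len(step) != 1:
            problems.append(f"step {i}: expected exactly one key, got {sorted(step)}")
            continue
        kind = next(iter(step))
        if kind not in STEP_TYPES:
            problems.append(f"step {i}: unknown step type {kind!r}")
    return problems


steps = [
    {"click": 'AXButton[identifier="Five"]'},
    {"wait": 80},
    {"assert": {"find": 'AXStaticText[value~="8"]', "exists": True}},
    {"tap": 'AXButton[identifier="Add"]'},  # not a listed step type
]

print(invalid_steps(steps))  # ["step 3: unknown step type 'tap'"]
```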

Failures snapshot the AX tree (JSON) + screenshot + action log into artifacts/fail-* for fast diagnosis.
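In CI, the run result and the artifact folders can be tied together with a few lines of Python. This sketch assumes the result JSON has the passed key shown in the quick start and that artifacts/fail-* is resolved relative to the working directory; both the function name and the root parameter are hypothetical.

```python
import json
from pathlib import Path


def failure_artifacts(result_line: str, root: Path = Path(".")) -> list[Path]:
    """Return artifact paths for a failed run, or an empty list on success."""
    result = json.loads(result_line)
    if result.get("passed"):
        return []
    # Assumption: artifacts live under <root>/artifacts/fail-*.
    return sorted(root.glob("artifacts/fail-*"))


print(failure_artifacts('{"passed": true, "steps": []}'))  # []
```

A CI job could upload whatever this returns as build artifacts, so the AX tree snapshot and screenshot are available without rerunning the test.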

07

MCP server.

Plug guiport into an MCP-aware agent (Claude Code, opencode, others). One JSON object per line over stdio:

$ guiport serve --mcp
{"jsonrpc":"2.0","id":1,"method":"initialize"}
<= {"jsonrpc":"2.0","id":1,"result":{"protocolVersion":"2024-11-05",...}}

{"jsonrpc":"2.0","id":2,"method":"tools/call","params":{
   "name":"click","arguments":{
     "app":"Calculator",
     "selector":"AXButton[identifier=\"Five\"]",
     "fallback":"ocr"
   }
}}

Tools exposed: doctor, apps, observe, tree, find, click (+ fallback), click_at, find_text, click_text, type, hotkey, screenshot, run.
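The request above is wrapped across lines for readability, but the transport expects one JSON object per line. A minimal Python sketch of a request builder that guarantees this; tool_call is a hypothetical helper, and the tool name and arguments are taken from the example above:

```python
import json


def tool_call(request_id: int, name: str, arguments: dict) -> str:
    """Serialize a tools/call request as a single line of JSON."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    # json.dumps emits no raw newlines, so the result is always one line.
    return json.dumps(message, separators=(",", ":"))


line = tool_call(
    2,
    "click",
    {"app": "Calculator", "selector": 'AXButton[identifier="Five"]', "fallback": "ocr"},
)
print(line)
```

A client would write line plus a trailing newline to the stdin of the guiport serve --mcp process and read one line back from its stdout for the response.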

08

Platforms.

One CLI, one YAML schema, one MCP tool list across every platform. macOS ships today. Linux and Windows are next on the roadmap — same selectors, same replay tests, different OS plumbing underneath.

macOS (shipped)
macOS 13+ · arm64 + x86_64
  • Inspect any app's UI as structured data
  • Click, type, hotkey by selector
  • Record once, replay deterministically
  • Visual fallback for canvas / Electron apps

Linux (coming soon)
Same selectors, same YAML, same MCP.
  • Inspect any app's UI as structured data
  • Click, type, hotkey by selector
  • Record once, replay deterministically
  • Visual fallback for sparse-UI apps

Windows (coming soon)
Same selectors, same YAML, same MCP.
  • Inspect any app's UI as structured data
  • Click, type, hotkey by selector
  • Record once, replay deterministically
  • Visual fallback for sparse-UI apps

Contract first. The selector grammar, YAML test format, and MCP tool list are part of the cross-platform contract. Adding Linux or Windows means implementing one DesktopAdapter protocol, never branching the schema.

09

One contract, every platform.

            guiport
       CLI · MCP · runner
              │
   ┌──────────┼──────────┐
   ▼          ▼          ▼
 macOS      Linux     Windows
 shipped     soon      soon

Same selectors, same YAML, same MCP tools — across every platform. New platforms slot in behind one shared contract; your replay tests never know which OS is underneath.
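
The idea of one shared contract with per-OS plumbing can be illustrated with a small Python sketch. guiport's DesktopAdapter is not shown in this document, so the two methods below (tree and press) are hypothetical stand-ins, and FakeAdapter exists only to make the example runnable.

```python
from typing import Protocol


class DesktopAdapter(Protocol):
    """Hypothetical per-platform surface; the method names are assumptions."""

    def tree(self, app: str) -> dict: ...
    def press(self, app: str, node: dict) -> None: ...


class FakeAdapter:
    """In-memory adapter standing in for a macOS, Linux, or Windows backend."""

    def __init__(self, tree: dict) -> None:
        self._tree = tree
        self.pressed: list[str] = []

    def tree(self, app: str) -> dict:
        return self._tree

    def press(self, app: str, node: dict) -> None:
        self.pressed.append(node["identifier"])


def click_by_identifier(adapter: DesktopAdapter, app: str, identifier: str) -> bool:
    """Platform-neutral logic: it talks only to the adapter contract."""
    stack = [adapter.tree(app)]
    while stack:
        node = stack.pop()
        if node.get("identifier") == identifier:
            adapter.press(app, node)
            return True
        stack.extend(node.get("children", []))
    return False


fake = FakeAdapter({"role": "AXWindow", "children": [{"role": "AXButton", "identifier": "Five"}]})
print(click_by_identifier(fake, "Calculator", "Five"))  # True
print(fake.pressed)  # ['Five']
```

Everything above the adapter (selectors, the YAML runner, the MCP tools) would call functions like click_by_identifier, which is why a new platform only has to supply its own adapter.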