/* desktop-control layer for coding agents */

$ guiport

Playwright for desktop apps,
built for coding agents.

# inspect, control, and replay any macOS app
guiport observe --app Calculator
guiport click   --app Calculator 'AXButton[identifier="Five"]'
guiport run     smoke.yaml
guiport serve   --mcp   # expose to Claude / Codex / opencode

[ install ] [ source ↗ ]

macOS shipped · Linux soon · Windows soon · Swift 5.9+ · MIT

Why guiport.

Agents shouldn't drive desktop apps by guessing pixels. guiport exposes the desktop as structured data — accessibility tree first, screenshots and OCR only as fallback — so flows are deterministic, replayable, and fast enough to use as test infra.

Stable selectors.

Click and inspect by name, role, or identifier — never coordinates. Sub-second to read any window.

Visual fallback.

When an app doesn't expose its structure, guiport falls back to reading what's on screen. Works on canvas and Electron apps too.

Record once, replay forever.

Capture a flow, save it as YAML, replay deterministically without an LLM. Failures auto-save artifacts so you know exactly what broke.

Install.

Three options. macOS 13+ required. Linux / Windows on the roadmap.

Homebrew (recommended once the tap is published)

brew tap edihasaj/guiport
brew install guiport
guiport doctor

Install script

curl -fsSL https://raw.githubusercontent.com/edihasaj/guiport/main/scripts/install.sh | sh

From source

git clone https://github.com/edihasaj/guiport.git
cd guiport
swift build -c release
sudo cp .build/release/guiport /usr/local/bin/guiport

permissions. guiport needs Accessibility and Screen Recording access granted to your terminal. Run `guiport doctor --fix` to trigger the prompts and open the right System Settings panes.

Quick start.

List running apps.

$ guiport apps --with-windows
Calculator      com.apple.calculator   pid=12345  windows=1
Finder          com.apple.finder       pid=614    windows=2
Code            com.microsoft.VSCode   pid=48047  windows=1
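If a script needs this listing as data, the columns can be split from the right so app names containing spaces survive. This is a minimal sketch that assumes the plain-text layout shown above (name, bundle id, `pid=`, `windows=`); `parse_apps` is a hypothetical helper, not part of guiport.

```python
def parse_apps(listing: str) -> list[dict]:
    """Parse the text printed by `guiport apps --with-windows` (layout assumed)."""
    apps = []
    for line in listing.splitlines():
        if not line.strip():
            continue
        # Split from the right: the last three fields never contain spaces,
        # while the app name on the left might.
        name, bundle_id, pid_field, win_field = line.rsplit(None, 3)
        apps.append({
            "name": name.strip(),
            "bundle_id": bundle_id,
            "pid": int(pid_field.removeprefix("pid=")),
            "windows": int(win_field.removeprefix("windows=")),
        })
    return apps


SAMPLE = """\
Calculator      com.apple.calculator   pid=12345  windows=1
Finder          com.apple.finder       pid=614    windows=2
Code            com.microsoft.VSCode   pid=48047  windows=1
"""

for app in parse_apps(SAMPLE):
    print(app["name"], app["bundle_id"], app["pid"], app["windows"])
```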

Inspect an app's accessibility tree.

$ guiport tree --app Calculator --pretty | head -20
{
  "id": "/AXWindow[0]",
  "role": "AXWindow",
  "name": "Calculator",
  "children": [
    {
      "role": "AXButton",
      "identifier": "Five",
      "bounds": {"x": 605, "y": 442, "width": 48, "height": 48},
      "actions": ["AXPress"]
    }, ...
  ]
}
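Because the tree is plain JSON, an agent or test helper can walk it with a few lines of code. The sketch below is illustrative only: it uses the field names visible in the sample above (`role`, `identifier`, `actions`, `children`) and assumes nodes without children simply omit that key. `iter_nodes` and `pressable` are hypothetical helper names.

```python
import json

# The sample tree from above, with the elided siblings left out.
TREE_JSON = """
{
  "id": "/AXWindow[0]",
  "role": "AXWindow",
  "name": "Calculator",
  "children": [
    {
      "role": "AXButton",
      "identifier": "Five",
      "bounds": {"x": 605, "y": 442, "width": 48, "height": 48},
      "actions": ["AXPress"]
    }
  ]
}
"""


def iter_nodes(node):
    """Yield a node and all of its descendants, depth first."""
    yield node
    for child in node.get("children", []):
        yield from iter_nodes(child)


def pressable(tree):
    """Return (role, identifier) for every node that lists the AXPress action."""
    return [
        (n.get("role"), n.get("identifier"))
        for n in iter_nodes(tree)
        if "AXPress" in n.get("actions", [])
    ]


tree = json.loads(TREE_JSON)
print(pressable(tree))  # [('AXButton', 'Five')]
```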

Click by stable selector — not coordinates.

$ guiport click --app Calculator 'AXButton[identifier="Five"]'
{"path":"ax","selector":"AXButton[identifier=\"Five\"]","detail":"5"}

App doesn't expose structure? Visual fallback is automatic.

$ guiport click --app Figma 'AXButton[name="Save"]'
{"path":"ocr","detail":"Save @ 980,124"}
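The `path` field in these results tells a caller which route the click took. A test harness might use it to flag steps that silently dropped to the visual fallback. This is a small sketch under the assumption that `"ax"` is the only value meaning "resolved through the accessibility tree"; `used_fallback` is a hypothetical helper.

```python
import json


def used_fallback(result_line: str) -> bool:
    """True when a click result did not resolve through the accessibility tree."""
    result = json.loads(result_line)
    return result.get("path") != "ax"


# The two result lines shown above, kept as raw strings so the JSON escapes survive.
AX_RESULT = r'{"path":"ax","selector":"AXButton[identifier=\"Five\"]","detail":"5"}'
OCR_RESULT = r'{"path":"ocr","detail":"Save @ 980,124"}'

for line in (AX_RESULT, OCR_RESULT):
    print(json.loads(line)["path"], "fallback" if used_fallback(line) else "direct")
```

The CLI reference lists a `--strict` flag on `find` and `click` for disabling the fallback, which is the stricter alternative when a run should fail instead.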

Record → Replay.

$ guiport record smoke.yaml --app Calculator
recording for Calculator — interact with the app, Ctrl+C to stop
recording saved → smoke.yaml

$ guiport run smoke.yaml
{"passed":true, "steps":[ ... ]}

CLI reference.

command	does
`doctor`	Check (or `--fix`) Accessibility + Screen Recording permissions.
`apps`	List running apps with windows.
`observe`	Summarize the focused window of an app.
`tree`	Dump the accessibility tree as JSON.
`find`	Find elements by selector. Visual fallback automatic; `--strict` to disable.
`click`	Click an element by selector. Visual fallback automatic; `--strict` to disable.
`click-at X Y`	Click at raw coordinates.
`find-text "..."`	OCR-find on-screen text via Apple Vision.
`click-text "..."`	OCR-find then click center.
`type "..."`	Type Unicode text.
`hotkey cmd+s`	Send a hotkey combo.
`screenshot`	Capture a window or full screen to PNG.
`record file.yaml`	Live recorder via CGEventTap → YAML test.
`run file.yaml`	Replay a YAML test.
`serve --mcp`	Start a JSON-RPC MCP server over stdio.
`bench`	Measure observe / tree / find latency.

Full help: `guiport <command> --help`.

Selector grammar.

role[attr="value"][attr~="substring"][index]

Examples:

# Exact match by role + name (button is an alias for AXButton)
button[name="Save"]

# Substring match
AXButton[name~="Open"]

# Stable identifier (best — survives layout changes)
*[identifier="save_btn"]

# Pick the third match (indices are zero-based)
AXMenuItem[name~="File"][2]

Attributes: role, name (title), value, identifier, description, text (any of name/value/description), index.

Role aliases: button → AXButton, checkbox → AXCheckBox, textfield → AXTextField, etc.
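To make the grammar concrete, here is a minimal parser sketch. It is not guiport's implementation, which this page does not show. It assumes values are double-quoted, that an index, if present, comes last, and it knows only the three aliases named above. `Selector` and `parse_selector` are hypothetical names.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Only the aliases spelled out above; the real list is longer ("etc.").
ROLE_ALIASES = {"button": "AXButton", "checkbox": "AXCheckBox", "textfield": "AXTextField"}

_ROLE = re.compile(r'\*|[A-Za-z]+')
_PRED = re.compile(r'\[(\w+)(~?=)"((?:[^"\\]|\\.)*)"\]')
_INDEX = re.compile(r'\[(\d+)\]')


@dataclass
class Selector:
    role: str
    predicates: list = field(default_factory=list)  # (attr, op, value) tuples
    index: Optional[int] = None


def parse_selector(text: str) -> Selector:
    """Parse role[attr="value"][attr~="substring"][index] into a Selector."""
    m = _ROLE.match(text)
    if not m:
        raise ValueError(f"missing role in {text!r}")
    sel = Selector(ROLE_ALIASES.get(m.group(), m.group()))
    pos = m.end()
    while pos < len(text):
        if (p := _PRED.match(text, pos)):
            sel.predicates.append((p.group(1), p.group(2), p.group(3)))
            pos = p.end()
        elif (i := _INDEX.match(text, pos)) and i.end() == len(text):
            sel.index = int(i.group(1))
            pos = i.end()
        else:
            raise ValueError(f"unexpected input at offset {pos} in {text!r}")
    return sel


print(parse_selector('button[name="Save"]'))
print(parse_selector('AXMenuItem[name~="File"][2]'))
```

Here `=` is kept as an exact-match operator and `~=` as a substring operator. How they are evaluated against a node is left to the caller.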

YAML test format.

name: calculator 5+3=8
app: Calculator
timeout_ms: 4000
steps:
  - find:   'AXButton[identifier="AllClear"]'
  - click:  'AXButton[identifier="AllClear"]'
  - wait:   80
  - click:  'AXButton[identifier="Five"]'
  - click:  'AXButton[identifier="Add"]'
  - click:  'AXButton[identifier="Three"]'
  - click:  'AXButton[identifier="Equals"]'
  - assert:
      find:   'AXStaticText[value~="8"]'
      exists: true

Step types: wait · find · click · press · type · hotkey · screenshot · find_text · click_text · click_at · assert.
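Tests in this format can also be generated instead of recorded. The sketch below emits a file in the shape shown above from a list of steps and rejects step types that are not in the list. It is an illustration only: `build_test` and `yaml_scalar` are hypothetical helpers, and the sketch assumes guiport accepts single-quoted scalars for `name` and `app`, which is valid YAML but not confirmed here.

```python
STEP_TYPES = {
    "wait", "find", "click", "press", "type", "hotkey",
    "screenshot", "find_text", "click_text", "click_at", "assert",
}


def yaml_scalar(value):
    """Render bools and ints bare, everything else single-quoted."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    # Inside YAML single quotes, a literal quote is written as two quotes.
    return "'" + str(value).replace("'", "''") + "'"


def build_test(name, app, steps, timeout_ms=4000):
    """Return YAML text for a test; steps are (type, argument) pairs."""
    lines = [
        f"name: {yaml_scalar(name)}",
        f"app: {yaml_scalar(app)}",
        f"timeout_ms: {timeout_ms}",
        "steps:",
    ]
    for kind, arg in steps:
        if kind not in STEP_TYPES:
            raise ValueError(f"unknown step type: {kind}")
        if isinstance(arg, dict):  # nested form, as used by assert above
            lines.append(f"  - {kind}:")
            lines.extend(f"      {k}: {yaml_scalar(v)}" for k, v in arg.items())
        else:
            lines.append(f"  - {kind}: {yaml_scalar(arg)}")
    return "\n".join(lines) + "\n"


steps = [
    ("click", 'AXButton[identifier="Five"]'),
    ("wait", 80),
    ("assert", {"find": 'AXStaticText[value~="8"]', "exists": True}),
]
print(build_test("calculator smoke", "Calculator", steps), end="")
```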

Failures snapshot the AX tree (JSON) + screenshot + action log into artifacts/fail-* for fast diagnosis.

MCP server.

Plug guiport into an MCP-aware agent (Claude Code, opencode, others). One JSON object per line over stdio:

$ guiport serve --mcp
{"jsonrpc":"2.0","id":1,"method":"initialize"}
<= {"jsonrpc":"2.0","id":1,"result":{"protocolVersion":"2024-11-05",...}}

{"jsonrpc":"2.0","id":2,"method":"tools/call","params":{
   "name":"click","arguments":{
     "app":"Calculator",
     "selector":"AXButton[identifier=\"Five\"]",
     "fallback":"ocr"
   }
}}

Tools exposed: doctor, apps, observe, tree, find, click (+ fallback), click_at, find_text, click_text, type, hotkey, screenshot, run.
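For a client written by hand, the framing above comes down to serializing each request as one compact JSON object followed by a newline. The sketch below builds such lines for the two requests shown. `rpc_line` and `tool_call` are hypothetical helper names, and the argument keys (`app`, `selector`, `fallback`) are copied from the example above rather than from a published schema.

```python
import json


def rpc_line(request_id, method, params=None):
    """Serialize one JSON-RPC 2.0 request as a single newline-terminated line."""
    message = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        message["params"] = params
    # Compact separators keep the object on one line with no stray whitespace.
    return json.dumps(message, separators=(",", ":")) + "\n"


def tool_call(request_id, name, **arguments):
    """Build a tools/call request for one of the tools listed above."""
    return rpc_line(request_id, "tools/call", {"name": name, "arguments": arguments})


print(rpc_line(1, "initialize"), end="")
print(
    tool_call(
        2,
        "click",
        app="Calculator",
        selector='AXButton[identifier="Five"]',
        fallback="ocr",
    ),
    end="",
)
```

A real client would write these lines to the standard input of a `guiport serve --mcp` process and read one response line per request from its standard output.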

Platforms.

One CLI, one YAML schema, one MCP tool list across every platform. macOS ships today. Linux and Windows are next on the roadmap — same selectors, same replay tests, different OS plumbing underneath.

macOS shipped

macOS 13+ · arm64 + x86_64

Inspect any app's UI as structured data
Click, type, hotkey by selector
Record once, replay deterministically
Visual fallback for canvas / Electron apps

[ install ]

Linux coming soon

Same selectors, same YAML, same MCP.

Inspect any app's UI as structured data
Click, type, hotkey by selector
Record once, replay deterministically
Visual fallback for sparse-UI apps

[ track ↗ ]

Windows coming soon

Same selectors, same YAML, same MCP.

Inspect any app's UI as structured data
Click, type, hotkey by selector
Record once, replay deterministically
Visual fallback for sparse-UI apps

[ track ↗ ]

contract first. The selector grammar, YAML test format, and MCP tool list are part of the cross-platform contract. Adding Linux or Windows is implementing one DesktopAdapter protocol — never branching the schema.
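The idea of one adapter behind a shared contract can be sketched as follows. This is a Python illustration of the pattern only. guiport itself is written in Swift, and the actual `DesktopAdapter` requirements are not shown on this page, so the method names `tree` and `press`, the node `id` values, and `FakeAdapter` are all assumptions.

```python
from typing import Protocol


class DesktopAdapter(Protocol):
    """Hypothetical shape of the per-platform layer (method names assumed)."""

    def tree(self, app: str) -> dict: ...

    def press(self, app: str, node_id: str) -> None: ...


class FakeAdapter:
    """In-memory stand-in for a platform backend, handy for testing the core."""

    def __init__(self, trees):
        self.trees = trees
        self.pressed = []

    def tree(self, app):
        return self.trees[app]

    def press(self, app, node_id):
        self.pressed.append((app, node_id))


def click_by_identifier(adapter: DesktopAdapter, app: str, identifier: str) -> bool:
    """Platform-neutral logic: find a node by identifier and press it."""
    stack = [adapter.tree(app)]
    while stack:
        node = stack.pop()
        if node.get("identifier") == identifier:
            adapter.press(app, node["id"])
            return True
        stack.extend(node.get("children", []))
    return False


adapter = FakeAdapter({
    "Calculator": {
        "id": "/AXWindow[0]",
        "role": "AXWindow",
        "children": [
            {"id": "/AXWindow[0]/AXButton[0]", "role": "AXButton", "identifier": "Five"},
        ],
    }
})
print(click_by_identifier(adapter, "Calculator", "Five"), adapter.pressed)
```

The selector logic only ever talks to the protocol, so a new backend would not need to touch it. That is the property the contract is meant to protect.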

One contract, every platform.

            guiport
       CLI · MCP · runner
              │
   ┌──────────┼──────────┐
   ▼          ▼          ▼
 macOS      Linux     Windows
 shipped     soon      soon

Same selectors, same YAML, same MCP tools — across every platform. New platforms slot in behind one shared contract; your replay tests never know which OS is underneath.