6.6 KiB
Command Reference
Complete reference for all agent-browser functions. For quick start, see SKILL.md.
Base Command
All commands follow this pattern:
infsh app run agent-browser --function <function> --session <session_id|new> --input '<json>'
--function: Function to call (open, snapshot, interact, screenshot, execute, close)--session: Session ID from previous call, ornewto start fresh--input: JSON input for the function
Functions
open
Navigate to URL and configure browser. This is the entry point for all sessions.
infsh app run agent-browser --function open --session new --input '{
"url": "https://example.com",
"width": 1280,
"height": 720,
"user_agent": "Mozilla/5.0...",
"record_video": false,
"show_cursor": false,
"proxy_url": null,
"proxy_username": null,
"proxy_password": null
}'
Input Fields:
| Field | Type | Default | Description |
|---|---|---|---|
url |
string | required | URL to navigate to |
width |
int | 1280 | Viewport width in pixels |
height |
int | 720 | Viewport height in pixels |
user_agent |
string | null | Custom user agent string |
record_video |
bool | false | Record video (returned on close) |
show_cursor |
bool | false | Show cursor indicator in screenshots/video |
proxy_url |
string | null | Proxy server URL |
proxy_username |
string | null | Proxy auth username |
proxy_password |
string | null | Proxy auth password |
Output:
{
"session_id": "abc123",
"url": "https://example.com",
"title": "Example Domain",
"elements": [...],
"elements_text": "@e1 [a] \"More information...\" href=\"...\"\n...",
"screenshot": "<File>"
}
snapshot
Re-fetch page state with @e refs. Call after navigation or DOM changes.
infsh app run agent-browser --function snapshot --session $SESSION_ID --input '{}'
Output: Same as open (url, title, elements, elements_text, screenshot)
interact
Perform actions on the page using @e refs.
infsh app run agent-browser --function interact --session $SESSION_ID --input '{
"action": "click",
"ref": "@e1"
}'
Input Fields:
| Field | Type | Description |
|---|---|---|
action |
string | Action to perform (see Actions table) |
ref |
string | Element ref (e.g., @e1) |
text |
string | Text for fill/type/press/select |
direction |
string | Scroll direction: up, down, left, right |
scroll_amount |
int | Scroll pixels (default 400) |
wait_ms |
int | Wait duration in milliseconds |
url |
string | URL for goto action |
target_ref |
string | Target ref for drag action |
file_paths |
array | File paths for upload action |
Actions:
| Action | Required Fields | Description |
|---|---|---|
click |
ref |
Single click |
dblclick |
ref |
Double click |
fill |
ref, text |
Clear input and type text |
type |
text |
Type text without clearing |
press |
text |
Press key (Enter, Tab, Escape, etc.) |
select |
ref, text |
Select dropdown option by label |
hover |
ref |
Hover over element |
check |
ref |
Check checkbox |
uncheck |
ref |
Uncheck checkbox |
drag |
ref, target_ref |
Drag from ref to target_ref |
upload |
ref, file_paths |
Upload files to file input |
scroll |
direction |
Scroll page (optional: scroll_amount) |
back |
- | Go back in browser history |
wait |
wait_ms |
Wait for specified milliseconds |
goto |
url |
Navigate to different URL |
Output:
{
"success": true,
"action": "click",
"message": null,
"screenshot": "<File>",
"snapshot": {
"url": "...",
"title": "...",
"elements": [...],
"elements_text": "..."
}
}
screenshot
Take a screenshot of the current page.
infsh app run agent-browser --function screenshot --session $SESSION_ID --input '{
"full_page": true
}'
Input Fields:
| Field | Type | Default | Description |
|---|---|---|---|
full_page |
bool | false | Capture full scrollable page |
Output:
{
"screenshot": "<File>",
"width": 1280,
"height": 720
}
execute
Run JavaScript code on the page.
infsh app run agent-browser --function execute --session $SESSION_ID --input '{
"code": "document.title"
}'
Input Fields:
| Field | Type | Description |
|---|---|---|
code |
string | JavaScript code to execute |
Output:
{
"result": "Example Domain",
"error": null,
"screenshot": "<File>"
}
Examples:
# Get page title
'{"code": "document.title"}'
# Count elements
'{"code": "document.querySelectorAll(\"a\").length"}'
# Extract text
'{"code": "document.querySelector(\"h1\").textContent"}'
# Get all links
'{"code": "Array.from(document.querySelectorAll(\"a\")).map(a => a.href)"}'
# Scroll to bottom
'{"code": "window.scrollTo(0, document.body.scrollHeight)"}'
# Get computed style
'{"code": "getComputedStyle(document.body).backgroundColor"}'
close
Close the browser session. Returns video if recording was enabled.
infsh app run agent-browser --function close --session $SESSION_ID --input '{}'
Output:
{
"success": true,
"video": "<File or null>"
}
Key Combinations
For the press action, use these key names:
| Key | Name |
|---|---|
| Enter | Enter |
| Tab | Tab |
| Escape | Escape |
| Backspace | Backspace |
| Delete | Delete |
| Arrow keys | ArrowUp, ArrowDown, ArrowLeft, ArrowRight |
| Modifiers | Control, Shift, Alt, Meta |
Key combinations:
# Ctrl+A (select all)
'{"action": "press", "text": "Control+a"}'
# Ctrl+C (copy)
'{"action": "press", "text": "Control+c"}'
# Shift+Tab (focus previous)
'{"action": "press", "text": "Shift+Tab"}'
Error Handling
When an action fails, success is false and message contains the error:
{
"success": false,
"action": "click",
"message": "Unknown ref: @e99. Run 'snapshot' to get current elements.",
"screenshot": "<File>",
"snapshot": {...}
}
Common errors:
Unknown ref: @eN- Ref doesn't exist, re-snapshot needed'text' required for fill action- Missing required field'target_ref' required for drag action- Missing drag targetTimeout 5000ms exceeded- Element not found or not clickable