6.5 KiB
Snapshot and Refs
Compact element references that reduce context usage for AI agents.
Related: commands.md for full function reference, SKILL.md for quick start.
Contents
- How Refs Work
- Snapshot Output Format
- Using Refs
- Ref Lifecycle
- Best Practices
- Ref Notation Details
- Troubleshooting
How Refs Work
Traditional approach:
Full DOM/HTML -> AI parses -> CSS selector -> Action (~3000-5000 tokens)
agent-browser approach:
Compact snapshot -> @refs assigned -> Direct interaction (~200-400 tokens)
The snapshot extracts interactive elements and assigns short @e refs, reducing token usage significantly.
Snapshot Output Format
infsh app run agent-browser --function snapshot --session $SESSION --input '{}'
Response elements_text:
@e1 [a] "Home" href="/"
@e2 [a] "Products" href="/products"
@e3 [a] "About" href="/about"
@e4 [button] "Sign In"
@e5 [input type="email"] placeholder="Email"
@e6 [input type="password"] placeholder="Password"
@e7 [button type="submit"] "Log In"
@e8 [input type="checkbox"] name="remember"
Response elements (structured):
[
{
"ref": "@e1",
"desc": "@e1 [a] \"Home\" href=\"/\"",
"tag": "a",
"text": "Home",
"role": null,
"name": null,
"href": "/",
"input_type": null
},
...
]
Using Refs
Once you have refs, interact directly:
# Click the "Sign In" button
'{"action": "click", "ref": "@e4"}'
# Fill email input
'{"action": "fill", "ref": "@e5", "text": "user@example.com"}'
# Fill password
'{"action": "fill", "ref": "@e6", "text": "password123"}'
# Submit the form
'{"action": "click", "ref": "@e7"}'
# Check the "remember me" checkbox
'{"action": "check", "ref": "@e8"}'
Ref Lifecycle
IMPORTANT: Refs are invalidated when the page changes!
# Get initial snapshot
infsh app run agent-browser --function snapshot --session $SESSION --input '{}'
# @e1 [button] "Next"
# Click triggers page change
infsh app run agent-browser --function interact --session $SESSION --input '{
"action": "click", "ref": "@e1"
}'
# MUST re-snapshot to get new refs!
infsh app run agent-browser --function snapshot --session $SESSION --input '{}'
# @e1 [h1] "Page 2" <- Different element now!
When to Re-snapshot
Always re-snapshot after:
- Navigation - Clicking links, form submissions,
gotoaction - Dynamic content - AJAX loads, modals opening, tabs switching
- Page mutations - JavaScript modifying the DOM
The interact function returns a fresh snapshot in its response, so you can often use that instead of a separate snapshot call.
Best Practices
1. Always Use the Latest Snapshot
# CORRECT: Use snapshot from previous response
RESULT=$(infsh app run agent-browser --function interact --session $SESSION --input '{
"action": "click", "ref": "@e1"
}')
# Use elements from $RESULT.snapshot for next action
# WRONG: Using stale refs
# After navigation, @e1 may point to a completely different element
2. Check Success Before Continuing
RESULT=$(infsh app run agent-browser --function interact --session $SESSION --input '{
"action": "click", "ref": "@e5"
}')
SUCCESS=$(echo $RESULT | jq -r '.success')
if [ "$SUCCESS" != "true" ]; then
echo "Click failed: $(echo $RESULT | jq -r '.message')"
# Re-snapshot and retry
fi
3. Use elements_text for Quick Decisions
For AI agents, elements_text provides a compact text representation:
@e1 [input type="email"] placeholder="Email"
@e2 [input type="password"] placeholder="Password"
@e3 [button] "Submit"
This is often enough to decide which element to interact with without parsing the full elements array.
Ref Notation Details
@e1 [tag type="value"] "text content" name="attr"
| | | | |
| | | | +- Additional attributes
| | | +- Visible text
| | +- Key attributes shown
| +- HTML tag name
+- Unique ref ID
Common Patterns
@e1 [button] "Submit" # Button with text
@e2 [input type="email"] # Email input
@e3 [input type="password"] # Password input
@e4 [a] "Link Text" href="/page" # Anchor link
@e5 [select] # Dropdown
@e6 [textarea] placeholder="Message" # Text area
@e7 [input type="file"] # File upload
@e8 [input type="checkbox"] checked # Checked checkbox
@e9 [input type="radio"] selected # Selected radio
@e10 [button type="submit"] "Send" # Submit button
Elements Captured
The snapshot captures these interactive elements:
- Links (
<a href>) - Buttons (
<button>,[role="button"]) - Inputs (
<input>,<textarea>,<select>) - Clickable elements (
[onclick],[tabindex]) - ARIA roles (
[role="link"],[role="checkbox"], etc.)
Non-interactive or hidden elements are filtered out.
Troubleshooting
"Unknown ref" Error
{
"success": false,
"message": "Unknown ref: @e15. Run 'snapshot' to get current elements."
}
Solution: Re-snapshot. The page changed and refs are stale.
infsh app run agent-browser --function snapshot --session $SESSION --input '{}'
# Now use the new refs
Element Not in Snapshot
The element you need might not appear because:
-
Not visible - Scroll to reveal it
'{"action": "scroll", "direction": "down", "scroll_amount": 500}' -
Not interactive - Use JavaScript to interact
'{"code": "document.querySelector(\".hidden-btn\").click()"}' -
In iframe - Currently not supported (use
executewith JS) -
Dynamic - Wait for it to load
'{"action": "wait", "wait_ms": 2000}'
Too Many Elements
Snapshots are limited to 50 elements. If the page has more:
- Scroll to bring relevant elements into view
- Use JavaScript to target specific elements
- Navigate to a more specific page
Ref Points to Wrong Element
If a ref seems to interact with the wrong element:
- Re-snapshot to get fresh refs
- Check if the page structure changed
- Verify with screenshot that the right element is targeted