# Snapshot and Refs
Compact element references that reduce context usage for AI agents.
**Related**: [commands.md](commands.md) for full function reference, [SKILL.md](../SKILL.md) for quick start.
## Contents
- [How Refs Work](#how-refs-work)
- [Snapshot Output Format](#snapshot-output-format)
- [Using Refs](#using-refs)
- [Ref Lifecycle](#ref-lifecycle)
- [Best Practices](#best-practices)
- [Ref Notation Details](#ref-notation-details)
- [Troubleshooting](#troubleshooting)
## How Refs Work
Traditional approach:
```
Full DOM/HTML -> AI parses -> CSS selector -> Action (~3000-5000 tokens)
```
agent-browser approach:
```
Compact snapshot -> @refs assigned -> Direct interaction (~200-400 tokens)
```
The snapshot extracts interactive elements and assigns each one a short `@e` ref, cutting the context needed per interaction from roughly 3000-5000 tokens to roughly 200-400, per the estimates above.
## Snapshot Output Format
```bash
infsh app run agent-browser --function snapshot --session $SESSION --input '{}'
```
**Response `elements_text`:**
```
@e1 [a] "Home" href="/"
@e2 [a] "Products" href="/products"
@e3 [a] "About" href="/about"
@e4 [button] "Sign In"
@e5 [input type="email"] placeholder="Email"
@e6 [input type="password"] placeholder="Password"
@e7 [button type="submit"] "Log In"
@e8 [input type="checkbox"] name="remember"
```
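Because every `elements_text` line starts with its ref, a label lookup can be done with standard text tools. The following is a minimal sketch, assuming the line format shown above; `find_ref` is a hypothetical helper, not part of agent-browser, and the sample text is a subset of the response above.

```bash
# Sample elements_text, taken from the response above.
ELEMENTS_TEXT='@e1 [a] "Home" href="/"
@e2 [a] "Products" href="/products"
@e4 [button] "Sign In"
@e5 [input type="email"] placeholder="Email"
@e7 [button type="submit"] "Log In"'

# find_ref LABEL: hypothetical helper. Prints the ref of the first element whose
# visible text (the quoted string right after the [tag ...] part) equals LABEL.
find_ref() {
  printf '%s\n' "$ELEMENTS_TEXT" | awk -v label="$1" '
    index($0, "] \"" label "\"") { print $1; exit }
  '
}

find_ref "Sign In"   # prints @e4
```

Placeholders such as `placeholder="Email"` are attributes, not visible text, so `find_ref "Email"` prints nothing in this sketch.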
**Response `elements` (structured):**
```json
[
  {
    "ref": "@e1",
    "desc": "@e1 [a] \"Home\" href=\"/\"",
    "tag": "a",
    "text": "Home",
    "role": null,
    "name": null,
    "href": "/",
    "input_type": null
  },
  ...
]
```
## Using Refs
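Once you have refs, pass them to the `interact` function in its `--input` payload:
```bash
# Click the "Sign In" button
infsh app run agent-browser --function interact --session $SESSION --input '{"action": "click", "ref": "@e4"}'
# Fill email input
infsh app run agent-browser --function interact --session $SESSION --input '{"action": "fill", "ref": "@e5", "text": "user@example.com"}'
# Fill password
infsh app run agent-browser --function interact --session $SESSION --input '{"action": "fill", "ref": "@e6", "text": "password123"}'
# Submit the form
infsh app run agent-browser --function interact --session $SESSION --input '{"action": "click", "ref": "@e7"}'
# Check the "remember me" checkbox
infsh app run agent-browser --function interact --session $SESSION --input '{"action": "check", "ref": "@e8"}'
```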
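When the text to fill comes from a variable, quotes or backslashes in it can break a hand-written payload. The following is a minimal sketch of building a `fill` payload safely; `json_escape` and `fill_payload` are hypothetical helpers, and the escaping assumes single-line text with no control characters.

```bash
# json_escape STRING: escape backslashes and double quotes for embedding in JSON.
# Assumption: STRING is a single line with no control characters.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# fill_payload REF TEXT: hypothetical helper that prints a "fill" payload.
fill_payload() {
  printf '{"action": "fill", "ref": "%s", "text": "%s"}\n' "$1" "$(json_escape "$2")"
}

fill_payload "@e5" 'user@example.com'
fill_payload "@e6" 'pa"ss\word'
```

The output can then be passed as `--input "$(fill_payload "@e5" "$EMAIL")"` to the `interact` function, where `$EMAIL` is an illustrative variable name.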
## Ref Lifecycle
**IMPORTANT**: Refs are invalidated when the page changes!
```bash
# Get initial snapshot
infsh app run agent-browser --function snapshot --session $SESSION --input '{}'
# @e1 [button] "Next"
# Click triggers page change
infsh app run agent-browser --function interact --session $SESSION --input '{
"action": "click", "ref": "@e1"
}'
# MUST re-snapshot to get new refs!
infsh app run agent-browser --function snapshot --session $SESSION --input '{}'
# @e1 [h1] "Page 2" <- Different element now!
```
### When to Re-snapshot
Always re-snapshot after:
1. **Navigation** - Clicking links, form submissions, `goto` action
2. **Dynamic content** - AJAX loads, modals opening, tabs switching
3. **Page mutations** - JavaScript modifying the DOM
The `interact` function returns a fresh snapshot in its response, so you can often use that instead of a separate snapshot call.
## Best Practices
### 1. Always Use the Latest Snapshot
```bash
# CORRECT: Use snapshot from previous response
RESULT=$(infsh app run agent-browser --function interact --session $SESSION --input '{
"action": "click", "ref": "@e1"
}')
# Use elements from $RESULT.snapshot for next action
# WRONG: Using stale refs
# After navigation, @e1 may point to a completely different element
```
### 2. Check Success Before Continuing
```bash
RESULT=$(infsh app run agent-browser --function interact --session $SESSION --input '{
"action": "click", "ref": "@e5"
}')
# Quote "$RESULT" so the JSON is passed to jq unchanged (no word splitting or globbing)
SUCCESS=$(echo "$RESULT" | jq -r '.success')
if [ "$SUCCESS" != "true" ]; then
  echo "Click failed: $(echo "$RESULT" | jq -r '.message')"
  # Re-snapshot and retry
fi
```
### 3. Use elements_text for Quick Decisions
For AI agents, `elements_text` provides a compact text representation:
```
@e1 [input type="email"] placeholder="Email"
@e2 [input type="password"] placeholder="Password"
@e3 [button] "Submit"
```
This is often enough to decide which element to interact with without parsing the full `elements` array.
## Ref Notation Details
```
@e1 [tag type="value"] "text content" name="attr"
|    |   |             |              |
|    |   |             |              +- Additional attributes
|    |   |             +- Visible text
|    |   +- Key attributes shown
|    +- HTML tag name
+- Unique ref ID
```
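The notation above is regular enough to split with `sed`. The following is a rough sketch, assuming attribute values never contain `]` or `"`; `ref_tag` and `ref_text` are hypothetical helper names.

```bash
# ref_tag LINE: prints the HTML tag name of a snapshot line.
ref_tag() {
  printf '%s\n' "$1" | sed -n 's/^@e[0-9][0-9]* \[\([a-z0-9-]*\).*/\1/p'
}

# ref_text LINE: prints the visible text, or nothing when the element has none.
ref_text() {
  printf '%s\n' "$1" | sed -n 's/^@e[0-9][0-9]* \[[^]]*\] "\([^"]*\)".*/\1/p'
}

LINE='@e10 [button type="submit"] "Send"'
echo "tag=$(ref_tag "$LINE") text=$(ref_text "$LINE")"   # prints tag=button text=Send
```

For structured access, the `elements` array in the snapshot response already exposes `tag` and `text` as separate fields, so parsing the text form is mainly useful for quick filtering.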
### Common Patterns
```
@e1 [button] "Submit" # Button with text
@e2 [input type="email"] # Email input
@e3 [input type="password"] # Password input
@e4 [a] "Link Text" href="/page" # Anchor link
@e5 [select] # Dropdown
@e6 [textarea] placeholder="Message" # Text area
@e7 [input type="file"] # File upload
@e8 [input type="checkbox"] checked # Checked checkbox
@e9 [input type="radio"] selected # Selected radio
@e10 [button type="submit"] "Send" # Submit button
```
### Elements Captured
The snapshot captures these interactive elements:
- Links (`<a href>`)
- Buttons (`<button>`, `[role="button"]`)
- Inputs (`<input>`, `<textarea>`, `<select>`)
- Clickable elements (`[onclick]`, `[tabindex]`)
- ARIA roles (`[role="link"]`, `[role="checkbox"]`, etc.)
Non-interactive or hidden elements are filtered out.
## Troubleshooting
### "Unknown ref" Error
```json
{
"success": false,
"message": "Unknown ref: @e15. Run 'snapshot' to get current elements."
}
```
**Solution**: Re-snapshot. The page changed and refs are stale.
```bash
infsh app run agent-browser --function snapshot --session $SESSION --input '{}'
# Now use the new refs
```
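A script can recognize this error and trigger the re-snapshot automatically. The following is a minimal sketch that matches on the message text shown above; `is_stale_ref` and `stale_ref_id` are hypothetical helpers, and matching on message wording is an assumption that would break if the wording changes.

```bash
# Sample failure response, copied from the error above.
RESPONSE=$(cat <<'EOF'
{
  "success": false,
  "message": "Unknown ref: @e15. Run 'snapshot' to get current elements."
}
EOF
)

# is_stale_ref RESPONSE: true when the response reports an unknown ref.
is_stale_ref() {
  printf '%s\n' "$1" | grep -q 'Unknown ref: @e[0-9]'
}

# stale_ref_id RESPONSE: prints the ref named in the error message.
stale_ref_id() {
  printf '%s\n' "$1" | sed -n 's/.*Unknown ref: \(@e[0-9][0-9]*\).*/\1/p'
}

if is_stale_ref "$RESPONSE"; then
  echo "stale ref $(stale_ref_id "$RESPONSE"): re-snapshot before retrying"
fi
```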
### Element Not in Snapshot
The element you need might not appear because:
1. **Not visible** - Scroll to reveal it
   ```bash
   '{"action": "scroll", "direction": "down", "scroll_amount": 500}'
   ```
2. **Not interactive** - Use JavaScript to interact
   ```bash
   '{"code": "document.querySelector(\".hidden-btn\").click()"}'
   ```
3. **In iframe** - Currently not supported (use `execute` with JS)
4. **Dynamic** - Wait for it to load
   ```bash
   '{"action": "wait", "wait_ms": 2000}'
   ```
### Too Many Elements
Snapshots are limited to 50 elements. If the page has more:
1. **Scroll** to bring relevant elements into view
2. **Use JavaScript** to target specific elements
3. **Navigate** to a more specific page
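To notice when a snapshot may have been cut off, count its element lines and compare against the limit. The following is a minimal sketch, assuming one element per `elements_text` line; `count_elements` and `at_snapshot_limit` are hypothetical helpers, and reaching the cap only suggests, rather than proves, that elements were omitted.

```bash
SNAPSHOT_LIMIT=50   # element cap stated above

# count_elements TEXT: number of lines in TEXT that start with an @e ref.
count_elements() {
  printf '%s\n' "$1" | grep -c '^@e[0-9]'
}

# at_snapshot_limit TEXT: true when TEXT holds at least SNAPSHOT_LIMIT elements.
at_snapshot_limit() {
  [ "$(count_elements "$1")" -ge "$SNAPSHOT_LIMIT" ]
}

# Synthetic elements_text with exactly 50 entries, for illustration only.
FULL=$(awk 'BEGIN { for (i = 1; i <= 50; i++) printf "@e%d [a] \"Link %d\"\n", i, i }')

if at_snapshot_limit "$FULL"; then
  echo "snapshot is at the limit: scroll or narrow the page"
fi
```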
### Ref Points to Wrong Element
If a ref seems to interact with the wrong element:
1. Re-snapshot to get fresh refs
2. Check if the page structure changed
3. Verify with screenshot that the right element is targeted
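To check whether the page structure changed, compare what a ref described in the old snapshot with what it describes in the new one. The following is a debugging sketch only, not a substitute for re-snapshotting; `ref_line` and `same_target` are hypothetical helpers, and the sample lines come from the Ref Lifecycle example.

```bash
BEFORE='@e1 [button] "Next"'   # snapshot taken before the click
AFTER='@e1 [h1] "Page 2"'      # snapshot taken after the page changed

# ref_line TEXT REF: prints the snapshot line for REF, if any.
ref_line() {
  printf '%s\n' "$1" | awk -v ref="$2" '$1 == ref { print; exit }'
}

# same_target REF OLD NEW: true only if REF has an identical description
# in both snapshots.
same_target() {
  old=$(ref_line "$2" "$1")
  new=$(ref_line "$3" "$1")
  [ -n "$old" ] && [ "$old" = "$new" ]
}

if same_target "@e1" "$BEFORE" "$AFTER"; then
  echo "@e1 unchanged"
else
  echo "@e1 now points elsewhere; use the new snapshot"
fi
```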