Agent tools reference

Every agentic skill (one with a markdown entrypoint) runs an Anthropic Messages loop with this set of built-in tools available, plus any user tools the skill declares under tools: and the auto-injected set_output tool when an output_schema is set.

The built-ins are platform-provided — you don't declare them in skill.yaml. Only bash can be turned off (via disable_bash: true); the rest are always on.

For the per-model input schemas, see media-models-reference. For the deterministic-skill equivalents (puras.media.run, etc.), see sdk-runtime-reference.

Index

Tool	Kind
`bash`	shell
`todo_write`	planning
`generate_image`	generate
`generate_video`	generate
`generate_audio`	generate
`transcribe`	audio → text
`web_search`	web
`image_search`	web
`web_fetch`	web
`send_email`	email
`web_screenshot`	web → drive
`file_read`	file → context
`file_write`	context → file
`file_edit`	file → file
`download_url`	url → drive
`drive_url`	drive → url
`drive_pull`	drive → file
`describe_subagent`	subagent
`run_subagent`	subagent
`memory_search`	memory
`memory_get`	memory
`memory_put`	memory
`memory_forget`	memory
`set_output`	lifecycle

Built-in tools

`bash`

Run a shell command in the job's working directory. The current dir contains a drive/ folder for files that should persist across jobs (synced to the workspace drive). Anything written elsewhere is ephemeral. Returns combined stdout+stderr (last 8KB) and the exit code.

Inputs

Field	Type	Required	Default	Notes
`command`	string	✓	—	Shell command to execute
`timeout`	integer		—	Max seconds (default 60, hard ceiling 600)

Environment

cwd = the job's working directory. A drive/ subdirectory is synced to project storage — anything written there persists across jobs; everything else is ephemeral.
$PURAS_DEPLOYMENT_ROOT points at the deployment bundle.
$PYTHONPATH includes the deployment root and the workdir, so python -c "import <your_module>" works against bundled code.
Project secrets are injected as env vars ().
The skill's venv bin/ is on $PATH, so installed CLIs work.

Output

Combined stdout+stderr, last 8KB only. Use redirection (> drive/log.txt) for larger output.

Disable

Set disable_bash: true in skill.yaml to remove bash from the agent's tool list — useful for skills that should only call user-defined tools.

`todo_write`

Record or update your working plan as an ordered checklist. Pass the FULL list every time — it REPLACES the previous one (this tool keeps no state of its own). Reserve it for genuinely complex jobs — many steps, branching or non-linear paths, or several subagents to coordinate — where keeping a live checklist actually keeps you on course. When you use it, keep it live: mark a step in_progress right before you begin it and completed the moment it's done, with exactly one step in_progress at a time. The list is surfaced to the user as a live progress checklist; it does not run anything itself. Do NOT use it for short, linear jobs — a handful of sequential steps you can just carry out: each call costs tokens and latency, so a simple three- or four-step task should be executed directly, not planned.

Inputs

Field	Type	Required	Default	Notes
`todos`	array<object>	✓	—	The complete, ordered todo list. Replaces any prior list, so always send every item, not just the changed ones

Stateless: the tool keeps no list of its own — you send the FULL checklist every call and it replaces the previous one. Nothing is executed; the items are surfaced to the user as a live progress checklist. Mark exactly one step in_progress at a time and flip it to completed as soon as it's done. Reserve it for genuinely complex work — many steps, branching paths, or several subagents to coordinate; don't reach for it on short, linear jobs (a simple three- or four-step task should just be executed, since each call costs tokens and latency).

`generate_image`

Generate an image from a prompt. Pass refs (drive paths or image URLs) to edit/compose from references instead of plain text-to-image. Prefer this over the raw media tool. Returns the drive drive_path where the file was saved — pass that path straight into another generate_* call (as refs/image) without converting it. Give EXACTLY ONE of prompt (inline) or prompt_path (a drive text file you wrote with file_write). You don't pick the model — it's selected automatically from the inputs you pass (which fix the task) and the skill's configured default, overridable per run. Express what you need through the inputs, not a model name. Any image/video/audio reference can be a drive path (e.g. an input file, or the drive_path a previous step returned) OR an https/data URL — the platform resolves drive paths to fetchable URLs (and resizes images to a model's minimum) automatically, so pass the path straight in; there's no need to call drive_url first. To bind part of the prompt to a specific reference, write @Image1, @Image2, … (1-indexed, in the SAME order as refs) — e.g. "@Image1 is the product the person in @Image2 holds". This is portable: models that read the tags natively (Seedance and Kling reference-to-video) get them verbatim, and every other model (Veo, and all image models) has them rewritten to prose ("the first reference image") automatically — so write @ImageN once and it works on whichever model is selected. Don't use @ElementN (not wired) — always @ImageN.

Inputs

Field	Type	Default	Notes
`aspect_ratio`	string	—	e.g. '1:1', '16:9', '9:16'
`n`	integer	—	Number of images (default 1)
`output_path`	string	—	Optional drive subpath (e.g. 'shots/hero.png')
`prompt`	string	—	What to draw, or how to edit the refs. HARD LIMIT ~2500 characters: an over-limit prompt is REJECTED with an error before anything is rendered or billed — it is NOT truncated, so keep the prompt under ~2500 (aim under ~1900) and put the must-render elements first. Point at a specific reference with `@Image1`/`@Image2` (in `refs` order). For a long prompt, prefer `prompt_path` so you don't re-emit the text here
`prompt_path`	string	—	Drive path of a UTF-8 text file whose contents become the prompt (exactly one of `prompt`/`prompt_path`). Token-saver for long prompts: `file_write` the prompt once — the result reports its exact `chars`, so you verify the budget for free — then pass the path here instead of pasting the text again. Trim with `file_edit` if over budget. The same char cap applies to the file's contents
`refs`	array<string>	—	Reference images (drive paths or URLs) → edit/compose mode. Omit for text-to-image. Address them in the prompt as `@Image1`, `@Image2`, … in this order
`resolution`	string	—	'1K'

The image is saved to the project drive and its drive_path is returned — pass that path straight into a later generate_*/media call (the platform resolves drive paths to fetchable URLs and resizes images to a model's minimum). Pass refs to edit/compose from reference images instead of text-to-image; the model is auto-selected (see the note above).

See also sdk-runtime-reference — the deterministic-skill equivalent, puras.media.generate_image(...).

`generate_video`

Generate a video. The inputs you pass fix the task: lipsync_audio (+image) → lip-synced talking head; refs → reference-to-video; image → image-to-video; neither → text-to-video. Prefer this over the raw media tool. Returns the drive drive_path. You don't pick the model — it's selected automatically from the inputs you pass (which fix the task) and the skill's configured default, overridable per run. Express what you need through the inputs, not a model name. Any image/video/audio reference can be a drive path (e.g. an input file, or the drive_path a previous step returned) OR an https/data URL — the platform resolves drive paths to fetchable URLs (and resizes images to a model's minimum) automatically, so pass the path straight in; there's no need to call drive_url first. To bind part of the prompt to a specific reference, write @Image1, @Image2, … (1-indexed, in the SAME order as refs) — e.g. "@Image1 is the product the person in @Image2 holds". This is portable: models that read the tags natively (Seedance and Kling reference-to-video) get them verbatim, and every other model (Veo, and all image models) has them rewritten to prose ("the first reference image") automatically — so write @ImageN once and it works on whichever model is selected. Don't use @ElementN (not wired) — always @ImageN.

Inputs

Field	Type	Default	Notes
`aspect_ratio`	string	—	e.g. '16:9', '9:16', '1:1'
`audio`	boolean	—	Generate native audio (t2v/i2v/r2v) — dialogue, SFX and ambience in the same pass. Every family (Seedance / Kling / Veo) supports it; put any spoken line in the prompt (`Speaker says: "…"`). Default false
`duration`	integer	—	Seconds. Snapped to the model's allowed set (e.g. Veo 4/6/8) with a warning. Ignored for lip-sync
`image`	string	—	A still to animate (i2v) or the portrait (lip-sync). Drive path or URL
`last_frame`	string	—	Optional still (drive path or URL) the clip should END on (tail keyframe). Pair with `image` for a first→last keyframe interpolation, or use alone to land a t2v/r2v clip on a set frame (e.g. an end card, or a gameplay first frame so the clip cuts cleanly into real footage). Seedance / Kling only; ignored with a warning on other families and on lip-sync
`lipsync_audio`	string	—	Narration audio (drive path or URL) → lip-sync mode (talking head). Only for a verbatim read; for a creator/skit speaking, put the line in the prompt with `audio: true` instead
`output_path`	string	—	Optional drive subpath (e.g. 'renders/clip.mp4')
`prompt`	string	—	The scene/action (a short delivery note for lip-sync). Multi-shot: label beats `Shot 1:` / `Cut to:` — understood across models; Seedance also follows timed cues like `Cut 1 (0.0-0.7s) …`. Spoken dialogue is generated NATIVELY when `audio: true` — write it as `Speaker says: "<line>"` (a colon, NOT quotes around the speaker) and append `(no subtitles, no on-screen text)`, since models trained on subtitled clips otherwise stamp garbled captions. Keep on-screen words OUT of the prompt (models garble >2-3 rendered words) — render copy as a caption/still afterward instead. Reference a specific image with `@Image1`/`@Image2`. For a long prompt, prefer `prompt_path`
`prompt_path`	string	—	Drive path of a UTF-8 text file whose contents become the prompt (give at most one of `prompt`/`prompt_path`). Token-saver for long prompts: `file_write` the prompt once (its result reports exact `chars`), then pass the path here instead of pasting the text again
`refs`	array<string>	—	Reference images (drive paths or URLs) → reference-to-video (keep referenced subjects on-model). Address them in the prompt as `@Image1`, `@Image2`, … in this order
`resolution`	string	—	e.g. '720p', '1080p' (where supported)

The inputs you pass fix the task: lipsync_audio→lip-sync, refs→reference-to-video, image→image-to-video, else text-to-video. The clip is saved to the drive and its drive_path returned. A model that can't do the inferred task errors clearly.

See also sdk-runtime-reference — puras.media.generate_video(...).

`generate_audio`

Generate speech from text (text-to-speech). Prefer this over the raw media tool. Returns the drive drive_path of the audio file. The default model is ElevenLabs v3, which reads inline AUDIO TAGS — wrap a delivery cue in square brackets and v3 acts it out without speaking it: '[excited] It's finally here! [whispers] Don't tell anyone.' Common tags: [excited] [sad] [angry] [whispers] [laughs] [sighs] [sarcastic] [British accent]. (To transcribe speech→text, use the raw media tool with a speech-to-text model instead.) You don't pick the model — it's selected automatically from the inputs you pass (which fix the task) and the skill's configured default, overridable per run. Express what you need through the inputs, not a model name.

Inputs

Field	Type	Required	Default	Notes
`text`	string	✓	—	The words to speak (billed per character). May contain v3 audio tags in [brackets]
`language`	string		—	ISO 639-1 code (e.g. 'en', 'tr'); omit to auto-detect
`output_path`	string		—	Optional drive subpath (e.g. 'audio/vo.mp3')
`similarity_boost`	number		—	Delivery: similarity boost. v2-only (ignored on the default v3 model)
`speed`	number		—	Delivery: speaking speed. v2-only (ignored on the default v3 model)
`stability`	number		—	Delivery (0–1): lower = more expressive and tag-responsive, higher = steadier. The primary v3 knob
`style`	number		—	Delivery: style exaggeration. v2-only (ignored on the default v3 model)
`voice`	string		—	A voice name/persona (e.g. 'Aria', 'Roger') or a voice id

Text-to-speech. The audio is saved to the drive and its drive_path returned.

See also sdk-runtime-reference — puras.media.generate_audio(...).

`transcribe`

Transcribe speech to text. Pass audio (a drive path, or an https/data URL); returns the transcript inline as text plus words (each with start/end seconds) — there is NO file. Optionally pass keyterms (brand / proper-noun spellings) to bias the transcription, and a language hint. The platform resolves a drive path to a fetchable URL automatically. (To go the other way — text → speech — use generate_audio.)

Inputs

Field	Type	Required	Default	Notes
`audio`	string	✓	—	Audio to transcribe — a drive path or an https/data URL
`keyterms`	array<string>		—	Optional brand / proper-noun spellings to bias transcription
`language`	string		—	Optional ISO 639-1 hint (e.g. 'en', 'tr'); omit to auto-detect

Speech-to-text — the reverse of generate_audio. Returns the transcript inline as text + words (each with start/end seconds); there is no file.

See also sdk-runtime-reference — puras.media.transcribe(...).

`web_search`

Search the web via the platform's search provider. Returns a list of results with title, url, and a short snippet.

Inputs

Field	Type	Required	Default	Notes
`query`	string	✓	—	Search query
`max_results`	integer		`5`	Max results to return (1-20, default 5)

Returns a list of {title, url, snippet}. To then load a result's full text, follow up with web_fetch.

`image_search`

Search for images on the web via the platform's search provider. Returns image URLs, thumbnails, dimensions, and source pages.

Inputs

Field	Type	Required	Default	Notes
`query`	string	✓	—	Image search query
`max_results`	integer		`5`	Max results to return (1-20, default 5)

Returns image URLs, thumbnails, dimensions, and source pages. To actually look at one of the images, follow up with download_url + file_read — for the canonical search → download → look pattern.

`web_fetch`

Fetch a web page (HTTP GET) and return its plain text content with scripts/styles stripped. By default JavaScript is NOT executed, so a client-rendered SPA comes back mostly empty — set render_js: true to render the page in a headless browser first and return the text the user would actually see (slower, but works for JS apps). Returns the final URL (after redirects), page title, and extracted text (truncated to max_chars).

Inputs

Field	Type	Required	Default	Notes
`url`	string	✓	—	The URL to fetch (http:// or https://)
`max_chars`	integer		`20000`	Max chars of body text to return (500-200000, default 20000)
`render_js`	boolean		`false`	Render the page in a headless browser (Chromium) so its JavaScript runs before the text is extracted. Use for single-page apps / sites that render client-side. Default false (plain HTTP fetch)

By default does not execute JavaScript, so SPAs that render client-side come back mostly empty — pass render_js: true to render the page in a headless browser (Chromium) first and return the text the user would actually see (slower). Returns the final URL (after redirects), the page title, and extracted text truncated to max_chars.

`send_email`

Send an email. The body is written in MARKDOWN — headings, bold, lists, links, tables and code fences are rendered to HTML for the recipient automatically. A standard footer is appended for you (it credits the sending workspace, skillpack and skill and links back to the skill), so DON'T write your own signature/footer. Send to one or more recipients — each gets their OWN copy, so addresses are never exposed to each other. Returns ok plus the recipients the message was accepted for. Write a clear, specific subject.

Inputs

Field	Type	Required	Default	Notes
`body`	string	✓	—	Email body in Markdown. It's rendered to HTML and a credit footer is appended automatically — do not add your own
`subject`	string	✓	—	The subject line
`to`	array<string>	✓	—	Recipient email address(es). Each address receives a separate copy (no shared To/CC). (min 1 items, max 20 items)
`reply_to`	string		—	Optional Reply-To address so a recipient's reply threads back to this inbox instead of the platform's from-address

Sends an email via the platform's transactional mail provider (SendGrid). The body is authored in Markdown and rendered to HTML for the recipient (headings, bold, lists, links, tables, code fences). A standard credit footer is appended automatically — it names the sending workspace, skillpack and skill and links back to the skill — so the agent should not write its own signature.

Each address in to receives its own copy (a separate personalization), so recipients are never disclosed to each other. Pass reply_to to thread replies to a chosen inbox. Returns ok and the recipients the message was accepted for. Hosted-only (disabled on local runs).

`web_screenshot`

Render a page in a headless browser (Chromium) and capture a PNG screenshot, saved to the workspace drive. Point it at EITHER a live url OR a path to an HTML file already in the drive (e.g. a single-file playable you just built) — exactly one. JavaScript runs before the shot, so this is how you see what a client-rendered page or an HTML5 game actually looks like. The screenshot is also attached to the conversation so you can inspect it directly (on vision models), and any console/page errors are reported — useful for validating that a built HTML document renders without crashing. Returns the saved drive_path (sign it with drive_url only if you need a URL), the page title, and any console_errors.

Inputs

Field	Type	Default	Notes
`full_page`	boolean	`false`	Capture the full scrollable page instead of just the viewport. Default false. Avoid it on LONG pages: a tall capture can exceed the model's per-side pixel limit, in which case the image is saved to the drive but NOT attached for you to view. To inspect a long page, leave this false and use `scroll_y` to grab specific sections as single viewport shots
`output_path`	string	—	Drive path to save the PNG to (e.g. '_jobs/<job_id>/preview.png'). Defaults to a generated path under 'screenshots/'. Must end in '.png'
`path`	string	—	Drive path to a local HTML file to render (e.g. '_jobs/<job_id>/playable.html'). A leading 'drive/' is accepted and stripped. Mutually exclusive with `url`
`scroll_y`	integer	`0`	Scroll the page down this many pixels before capturing a single viewport (default 0 = top). Use this to inspect a section further down a long page without a giant full_page capture — e.g. scroll_y: 1400 to see the next screenful
`url`	string	—	A live page to render (http:// or https://). Mutually exclusive with `path`
`viewport_height`	integer	`800`	Viewport height in px (64-3840, default 800)
`viewport_width`	integer	`1280`	Viewport width in px (64-3840, default 1280)
`wait_ms`	integer	`1200`	Extra milliseconds to wait after load before capturing, so animations/first frame settle (0-15000, default 1200)

Renders a live url or a drive HTML path (e.g. a single-file playable you just built) in a headless browser and saves a PNG to the drive. JavaScript runs before the capture, so this shows what a client-rendered page or an HTML5 game really looks like. The image is attached to the conversation (on vision models) and any console_errors are reported — handy for runtime-validating that a built HTML document renders without crashing.

Constraints

Exactly one of url / path.
Output is a PNG; output_path must end in .png (auto-appended if omitted).
Viewport-only by default; set full_page: true for the whole scrollable page.

`file_read`

Read one or more files from the workspace drive and attach them to the conversation. Images (jpg/png/gif/webp) and PDFs come back as vision/document blocks you can look at directly. Text files (code, markdown, JSON, CSV, etc.) are inlined as text. Use this when you need to actually inspect contents — for listing files, use bash ls drive/ instead. Hard cap: 5MB per file, 10 files per call.

Inputs

Field	Type	Required	Default	Notes
`paths`	array<string>	✓	—	Drive paths relative to the workspace drive root (e.g. ['uploads/photo.jpg', 'data/report.pdf']). A leading 'drive/' is accepted and stripped. (min 1 items, max 10 items)

Returns a block list — one labeled header per file, then the content. Images/PDFs become vision/document blocks the model looks at directly; text files are inlined as text.

Constraints

Drive paths only. A leading drive/ is accepted and stripped. For arbitrary URLs, the agent should download_url first, then file_read.
Hard caps: 5 MB per file, 10 paths per call, text inlined up to 100k chars then truncated.
On non-vision models, image/document files in the path list are skipped with an error in the result; text files still come through.

File inputs can also arrive via inputs.attachments at submit time.

`file_write`

Create a new text file in the workspace drive, or completely overwrite an existing one, with the exact content you provide. Parent directories are created automatically. Use this to author a file from scratch (source code, an HTML scaffold copy, a spec/plan note, JSON/config) or to replace one wholesale. For a SURGICAL change to part of an existing file, prefer file_edit — it patches just the target region instead of you re-sending the whole file, so it's faster and can't clobber unrelated lines. Text/UTF-8 only; binary or media files come from generate_*/download_url, not this. The write is persisted to the drive, so file_read, other tools, and later steps see it immediately. Returns the drive path, byte size, exact character count (chars — use it to check a budget-capped media prompt without re-emitting the text; see prompt_path on generate_image/video), and line count.

Inputs

Field	Type	Required	Default	Notes
`content`	string	✓	—	Full file content to write, verbatim
`path`	string	✓	—	Drive path to write (e.g. '_jobs/<job_id>/game.js'). A leading 'drive/' is accepted and stripped; '..' segments are rejected

Creates or fully overwrites a drive text file with the exact content given; parent dirs are made automatically and the file is pushed to the drive so later steps and file_read see it.

When to reach for it

Authoring a file from scratch (source code, an HTML scaffold copy, a spec/plan note, JSON) or replacing one wholesale.
For a surgical change to part of an existing file, use file_edit instead — it won't clobber unrelated lines and you don't re-send the whole file.

Constraints

Drive paths only; a leading drive/ is accepted and stripped, .. segments are rejected.
Text/UTF-8 only — binary/media come from generate_* / download_url.

Returns the drive path, byte size, and line count.

`file_edit`

Make a targeted, atomic edit to an existing text file in the workspace drive — the agentic way to evolve code in place without rewriting the whole file. Two modes: • STRING mode (preferred, robust): give old_string + new_string. old_string must match the file EXACTLY — including whitespace and indentation — and must be UNIQUE in the file, or the edit is rejected so you never patch the wrong place. Add surrounding context to make a short anchor unique, or pass replace_all: true to change every occurrence (e.g. renaming a symbol). • LINE mode: give start_line + end_line (1-based, inclusive) and new_string to replace exactly that line range; an empty new_string deletes the range. Use when a clean unique anchor is awkward. Returns a unified diff (with line numbers) of what changed, plus the new size and line count. The change is persisted to the drive. To create a file or replace it entirely, use file_write instead. Tip: read the file first (file_read, or bash 'grep -n …' to find line numbers) so your anchor is exact and unique.

Inputs

Field	Type	Required	Default	Notes
`new_string`	string	✓	—	Replacement text. Required. In LINE mode this is the text that replaces the line range (empty string deletes it)
`path`	string	✓	—	Drive path to edit (e.g. '_jobs/<job_id>/game.js'). A leading 'drive/' is accepted and stripped
`end_line`	integer		—	LINE mode: 1-based last line to replace, inclusive
`old_string`	string		—	STRING mode: exact text to find and replace. Must be unique in the file unless `replace_all` is true. Omit when using LINE mode
`replace_all`	boolean		—	STRING mode: replace EVERY occurrence of `old_string` instead of requiring a unique match (default false)
`start_line`	integer		—	LINE mode: 1-based first line to replace (use with `end_line`, instead of `old_string`)

Atomic, surgical in-place edit of a drive text file — the agentic way to evolve code (e.g. building a game piece by piece) without re-sending the whole file.

Two modes

String (preferred): old_string → new_string. The anchor must match exactly (whitespace included) and be unique, or the edit is refused so you never patch the wrong place. replace_all: true changes every occurrence (e.g. a rename).
Line: start_line+end_line (1-based, inclusive) + new_string replaces that range; an empty new_string deletes it. Use when a clean anchor is awkward.

Returns a unified diff (with line-number hunks) of the change, plus the new size and line count. Tip: read the file first (file_read, or bash 'grep -n …') so the anchor is exact. To create or replace a whole file, use file_write.

`download_url`

Download a file from a URL via plain HTTP GET and save it to the workspace drive. Use this for images, PDFs, CSVs, etc. Returns the saved drive_path (sign it with drive_url only if you need a URL). Does NOT resolve share links (Google Drive/Dropbox/YouTube) — only direct HTTP(S) URLs work. Hard cap: 50MB per file.

Inputs

Field	Type	Required	Default	Notes
`path`	string	✓	—	Where to save in the drive. Either a full path with filename ('data/report.pdf') or a directory ending in '/' ('downloads/') — in the latter case the filename is inferred from the URL
`url`	string	✓	—	Direct file URL (http:// or https://)

Returns the resolved drive_path. The file is in the drive — mint a URL with drive_url only if you need one.

Constraints

Plain HTTP(S) only. Share links (Google Drive, Dropbox, YouTube) are not resolved.
50 MB hard cap per file.
If path ends in /, the filename is inferred from the URL's last segment.

`drive_url`

Mint a fresh, time-limited public URL for a file already in the workspace drive. Use this to hand a drive file to a tool that needs a real URL — most importantly the media tool, whose image/video inputs (e.g. 'image_url', 'image_urls') must be URLs, not drive paths. Given a drive-relative path like 'uploads/logo.png', returns a signed https:// URL anyone can fetch until it expires. Free. (To go the other way — save a URL into the drive — use download_url.)

Inputs

Field	Type	Required	Default	Notes
`path`	string	✓	—	Drive path relative to the workspace drive root (e.g. 'uploads/logo.png'). A leading 'drive/' is accepted and stripped
`ttl`	integer		—	Seconds until the URL expires (60-86400, default 3600)

Returns {url, path, expires_in}. The url is a signed https:// link the chosen TTL keeps alive (default 1h).

When to reach for it

Feeding a drive file to the media tool — its image_url / image_urls (and video/audio refs) must be URLs, not drive paths. Sign first, then pass the URL into media.
Any other upstream that fetches by URL.

Free. The inverse direction (save a URL into the drive) is download_url.

`drive_pull`

Download a file from the workspace drive onto local disk so you can read it with bash (cat, ffmpeg, a script). file_read, generate_*, and subagent inputs fetch drive files automatically — you only need drive_pull to reach a drive file from RAW bash that you did NOT create this run (e.g. an asset an earlier job saved). After it returns ok, the file is at drive/<path>. Files you produce this run (generate_*, download_url, tool outputs) are already local — no pull needed.

Inputs

Field	Type	Required	Default	Notes
`path`	string	✓	—	Drive path relative to the workspace drive root (e.g. 'research/notes.md'). A leading 'drive/' is accepted and stripped

Only needed to reach a drive file from raw bash that you did NOT create this run (e.g. an asset an earlier job saved). After it returns, the file is at drive/<path>.

file_read, generate_*, and subagent inputs fetch drive files automatically, and anything you produce this run (generate_*, download_url, tool outputs) is already local — no pull needed.

`describe_subagent`

Look up the EXACT inputs a subagent target expects — its declared input schema (field names, types, which are required) and its output shape. ALWAYS call this before run_subagent for any skill target: run_subagent does NOT show you the target's schema, so without this you'd be guessing field names and a wrong shape fails validation and wastes a turn. target takes the same refs as run_subagent — a references/*.md bundle path, skill, skillpack/skill, or workspace/skillpack/skill. A .md / inline-prompt target has no declared schema (free-form inputs) and this will tell you so.

Inputs

Field	Type	Required	Default	Notes
`target`	string	✓	—	The subagent to describe: a `references/*.md` bundle path or a skill ref (`skill`, `skillpack/skill`, `workspace/skillpack/skill`)

Returns a target's declared input schema (field names, types, which are required) and its output shape. Always call this before run_subagent for a skill target — run_subagent doesn't surface the target's schema, so without this you're guessing field names and a wrong shape fails validation and wastes a turn. A .md / inline target has no declared schema (free-form inputs) and this says so.

`run_subagent`

Run an isolated subagent and get its result back. A subagent run is its own job (own context window, own cost, linked to this run as a child), so use it to hand a self-contained stage of work to a fresh agent instead of doing everything in this context.

BEFORE calling this with a skill target, call describe_subagent first to get the target's exact input schema, then pass inputs in that shape — this tool does not surface the target's schema, so guessing field names fails validation.

Give EITHER target or prompt:

target runs an existing skill or bundle prompt file:

references/foo.md (any bundle-relative *.md path) — runs that markdown file as the system prompt of a fresh subagent, in THIS skillpack. The path is relative to your own skill directory. Use this for pipeline stages whose prompt lives in your references/.
skill — another skill in this skillpack.
skillpack/skill — a skill in another skillpack in this workspace.
workspace/skillpack/skill — a public skill in another workspace.

prompt runs a one-off inline system prompt as a subagent in this bundle's context, with the built-in tools and a free-form set_output. Use this for a quick self-contained stage you don't want to spell out as a file or skill.

inputs is passed to the child verbatim (it reads them as its own inputs). For a .md or inline-prompt subagent there's no declared schema, so pass whatever that prompt expects — any file-shaped values (URLs, drive paths) are staged and attached for the child to look at.

Inputs

Field	Type	Default	Notes
`inputs`	object	—	Inputs passed to the child verbatim, as a JSON object (NOT a JSON string). For a skill `target`, match the shape from `describe_subagent`. For a `.md` / inline-prompt target there's no declared schema — pass whatever that prompt expects; file-shaped values (URLs, drive paths) are staged and attached for the child to look at
`prompt`	string	—	An inline system prompt to run as a one-off subagent in this bundle's context. Mutually exclusive with `target`
`target`	string	—	What to run: a `references/*.md` bundle path (isolated subagent in this skillpack), or a skill ref (`skill`, `skillpack/skill`, `workspace/skillpack/skill`). Mutually exclusive with `prompt`
`timeout`	integer	—	Seconds to wait for the child (default and ceiling 1800; higher values are clamped)
`version`	integer	—	Pin a skill `target` to a specific deployment version of its skillpack (e.g. 3). Omit to use the active deployment. Only valid with a skill-ref `target` — not with a `.md` path or `prompt`, which run against your own deployment

Runs an isolated subagent — its own job, context window, and cost, linked to this run as a child — so you can hand a self-contained stage to a fresh agent. Give either target (an existing skill skill / skillpack/skill / workspace/skillpack/skill, or a bundle-relative references/*.md prompt file) or prompt (a one-off inline system prompt). For a skill target, call describe_subagent first and pass inputs in that shape.

`memory_search`

Look up existing workspace memory BEFORE doing expensive research. Returns matching records (best first — relevance is fused from exact identity, text and semantic similarity, then shaped by recency and importance) with their record payload and a fresh flag — a fresh hit means you can REUSE the record and skip re-researching. Match by kind + key (an entity_key/alias) and/or content_hash, and/or a free-text query over titles, summaries and tags. Workspace MEMORY is a shared brain across every skill in this workspace: a product/subject researched once is reused by later jobs (any skill) instead of re-researched. Store only STABLE facts that outlive this job — an entity's appearance, brand colors/logo, audience, tone; a durable user preference; or a recorded decision. NEVER store per-job creative choices (concept, shot plan, layout, headline, script, prompt text, chosen aspect ratios). Identity hints (entity_key + content_hash) for this job are provided in the first message — key your writes with them so the next run matches.

Inputs

Field	Type	Default	Notes
`content_hash`	string	—	The job's inputs fingerprint (the `content_hash` identity hint). When given, a hit is only `fresh` if the stored hash matches — i.e. the inputs haven't changed
`key`	string	—	An entity_key or alias to match exactly — e.g. the `entity_key` identity hint from the first message, a normalized URL, or a product slug
`kind`	string	—	Record kind to search, e.g. 'product_brief', 'game_brief', 'brand_kit', 'store_listing', 'research', 'user_preference'
`limit`	integer	—	Max results (default 8)
`mtype`	enum	—	Restrict to a memory type. Omit to search all. Values: `"semantic"` \| `"episodic"` \| `"procedural"`
`query`	string	—	Free-text search over titles, summaries, kinds and tags (plus semantic similarity when available). Use a few descriptive words, e.g. 'coffee mug brand colors'
`scope`	enum	—	Restrict to a sharing scope. Omit to search all. Values: `"entity"` \| `"skillpack"` \| `"workspace"`

Check the workspace memory — a shared brain across every skill in the workspace — before expensive research. Returns matching records (most relevant first) with a fresh flag; a fresh hit means you can reuse the record and skip re-researching. Search is hybrid: exact kind + key (entity_key/alias) and/or content_hash matches, plus full-text (and, when configured, semantic-similarity) matching on query, fused and ranked by relevance × recency × importance.

Identity hints (entity_key + content_hash) for the current job are provided in the first message — match on those.

`memory_get`

Fetch one memory record in full by its id (from a memory_search hit or an injected memory_id). Use when a search preview was truncated and you need the complete record.

Inputs

Field	Type	Required	Default	Notes
`id`	string	✓	—	The memory record id (uuid)

Fetch one record in full by id (from a memory_search hit or an injected memory_id) — use it when a search preview was truncated and you need the complete record.

`memory_put`

Save (or update) a record in workspace memory so future jobs reuse it. A keyed record (semantic/procedural with an entity_key) is UPSERTED — calling it again for the same kind+entity_key overwrites and bumps the version, so just write the current best version wholesale. Episodic records (a logged event) append. Workspace MEMORY is a shared brain across every skill in this workspace: a product/subject researched once is reused by later jobs (any skill) instead of re-researched. Store only STABLE facts that outlive this job — an entity's appearance, brand colors/logo, audience, tone; a durable user preference; or a recorded decision. NEVER store per-job creative choices (concept, shot plan, layout, headline, script, prompt text, chosen aspect ratios). Identity hints (entity_key + content_hash) for this job are provided in the first message — key your writes with them so the next run matches.

Inputs

Field	Type	Required	Default	Notes
`kind`	string	✓	—	What this record is: 'product_brief', 'game_brief', 'brand_kit', 'store_listing', 'research', 'user_preference', 'approval', … Choose a stable, reusable name
`record`	object	✓	—	The STABLE payload to store (a JSON object). For a brief: the reusable facts only (appearance, colors, logo, audience, tone) — no creative-meta
`aliases`	array<string>		—	Extra keys this subject can also be found by (e.g. a normalized product name)
`content_hash`	string		—	The inputs fingerprint (the `content_hash` identity hint) so a later run can tell if the inputs changed (staleness)
`entity_key`	string		—	Canonical identity for this subject — use the `entity_key` identity hint from the first message (a hero-image hash or normalized URL), or author a stable slug. Required to make a record reusable by exact lookup. Omit only for episodic logs
`importance`	number		—	0..1 salience for ranking (default 0.5). Reserve >0.8 for facts that must surface (allergies-grade), <0.3 for minor notes
`mtype`	enum		—	semantic = entity record/brief (default); procedural = durable preference/rule (often pinned); episodic = a logged event (appends, no upsert). Values: `"semantic"` \| `"episodic"` \| `"procedural"`
`pinned`	boolean		—	If true, this record is injected into EVERY future job in scope (use for workspace preferences / brand kit). Default false
`scope`	enum		—	How widely it applies: entity (about one subject, default), workspace (brand kit / preferences shared by all skills). Values: `"entity"` \| `"skillpack"` \| `"workspace"`
`source_url`	string		—	Source URL the record was derived from, if any
`stale_at`	string		—	Optional ISO-8601 timestamp after which the record is considered stale and won't be reused (use for perishable facts like a current offer)
`summary`	string		—	STRONGLY RECOMMENDED: a 1–3 sentence natural-language gist of the record ('Ceramic coffee mug by Acme; matte black, gold logo; audience: office workers'). This is what future text/semantic search matches against — a record without it is only findable by exact key
`supersedes`	string		—	id (uuid) of an existing record this one REPLACES (it gets soft-deleted and linked to the new record). Use when correcting a record stored under a different kind/key — same-key re-puts already overwrite, no need for this
`tags`	array<string>		—	A few search labels, e.g. ['mug', 'acme', 'drinkware']
`title`	string		—	Human label for the memory browser (e.g. the product/game name)

Save (or update) a record so future jobs reuse it. A keyed record (semantic/procedural with an entity_key) is upserted — calling it again for the same kind+entity_key overwrites and bumps the version; episodic records append. Always include a 1–3 sentence summary (it's what future searches match) and a few tags; set importance only for records that must rank above (>0.8) or below (<0.3) their peers, and supersedes when this record replaces an outdated one under a different key.

Store only stable facts that outlive this job (an entity's appearance, brand colors/logo, audience, tone; a durable user preference; a recorded decision). Never store per-job creative choices (concept, shot plan, layout, headline, script, prompt text, chosen aspect ratios).

`memory_forget`

Soft-delete one memory record that is wrong, obsolete or harmful — it stops matching searches and injections immediately (an audit copy is kept briefly, then purged). Use when you confirmed a stored record is incorrect for its key and you are NOT writing a replacement (when you are, prefer memory_put with supersedes, or a same-key re-put).

Inputs

Field	Type	Required	Default	Notes
`id`	string	✓	—	The memory record id (uuid) to forget
`reason`	string		—	Short reason ('outdated listing', 'wrong product'). Kept for the audit trail

Soft-delete a memory record that turned out wrong and has no replacement (pass the id from a search hit and, ideally, a short reason). The record stops surfacing in search and injection but is kept for audit until the nightly maintenance pass purges it. When a replacement exists, prefer memory_put — same key overwrites; a different key with supersedes: <old id> retires the old record.

Lifecycle tool

`set_output` (auto-injected, conditional)

Record the final structured output for this job. Calling this ends the run. The argument must match the schema below.

When it's exposed

Only when the skill declares an output_schema in skill.yaml. The tool's input_schema is the skill's output_schema verbatim — so the agent gets a strongly-typed slot to fill before the run ends.

Semantics

Must be called exactly once.
Calling it ends the run; any subsequent tool calls in the same model response are ignored.
If the run ends without set_output being called, the job fails with agent finished without calling set_output.
Validation: the input is enforced against the declared output_schema before the job's final output is recorded.

Example skill.yaml fragment

yaml

entrypoint: SKILL.md
output_schema:
  type: object
  properties:
    caption: { type: string }
  required: [caption]

Inside the agent the model then calls set_output({"caption": "…"}) and the run terminates.

Index

Built-in tools

bash

todo_write

generate_image

generate_video

generate_audio

transcribe

web_search

image_search

web_fetch

send_email

web_screenshot

file_read

file_write

file_edit

download_url

drive_url

drive_pull

describe_subagent

run_subagent

memory_search

memory_get

memory_put

memory_forget

Lifecycle tool

set_output (auto-injected, conditional)

`bash`

`todo_write`

`generate_image`

`generate_video`

`generate_audio`

`transcribe`

`web_search`

`image_search`

`web_fetch`

`send_email`

`web_screenshot`

`file_read`

`file_write`

`file_edit`

`download_url`

`drive_url`

`drive_pull`

`describe_subagent`

`run_subagent`

`memory_search`

`memory_get`

`memory_put`

`memory_forget`

`set_output` (auto-injected, conditional)