Example project
A complete worked project — two skills (one deterministic, one agentic) + frontend snippet — you can copy as a starter.
A minimal but realistic project: an app uploads an image, a deterministic skill returns its dimensions and a downscaled thumbnail, and an agentic skill writes a one-line caption by looking at the photo directly via vision. Use this as the shape to copy when you start a new Puras project — push it as-is and it works.
Layout
image-tools/
  requirements.txt        # optional; worker pip-installs into a per-deployment venv
  skills/
    image-info/           # deterministic skill (Python entrypoint)
      skill.yaml
      main.py
    caption/              # agentic skill (markdown entrypoint = system prompt)
      skill.yaml
      SKILL.md
No root manifest. Each skills/<name>/skill.yaml is auto-discovered; the directory name is the skill name.
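The discovery rule above can be pictured with a small local helper. This is a sketch for checking a project layout before pushing, not the worker's actual implementation; the function name discover_skills is hypothetical.

```python
from pathlib import Path


def discover_skills(project_dir: str) -> dict[str, Path]:
    """Map each skill name (its directory name) to its skill.yaml path.

    Hypothetical helper: mirrors the documented rule that every
    skills/<name>/skill.yaml is picked up and <name> is the skill name.
    """
    skills_root = Path(project_dir) / "skills"
    found: dict[str, Path] = {}
    if not skills_root.is_dir():
        return found
    for child in sorted(skills_root.iterdir()):
        manifest = child / "skill.yaml"
        # Directories without a skill.yaml are ignored in this sketch.
        if child.is_dir() and manifest.is_file():
            found[child.name] = manifest
    return found
```

Run against the layout above, this would list image-info and caption, which is a quick way to catch a misplaced or misnamed skill.yaml before a push.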
requirements.txt
pillow>=10
(The worker reads requirements.txt from the bundle root and installs it into the deployment's venv before any job runs. Skip the file if your skills only need the stdlib + puras.)
skills/image-info/skill.yaml
description: Return image dimensions and write a 512px thumbnail back into the drive.
entrypoint: main.py:run
input_schema:
  type: object
  properties:
    image:
      oneOf:
        - { type: string }
        - { type: object }
  required: [image]
output_schema:
  type: object
  properties:
    width: { type: integer }
    height: { type: integer }
    format: { type: string }
    thumb:
      type: object
      properties:
        drive_path: { type: string }
      required: [drive_path]
  required: [width, height, format, thumb]
The .py:func entrypoint tells the worker this skill is deterministic — no LLM in the loop. The function runs in an isolated subprocess.
skills/image-info/main.py
"""Return image dimensions and write a 512px thumbnail back into the drive."""
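import os

from PIL import Image
from puras import load_path


def run(inputs: dict) -> dict:
    # Resolve whatever shape the caller sent into a local file path.
    src = load_path(inputs["image"], suffix=".img")
    with Image.open(src) as im:
        # Record the original size before thumbnail() shrinks the image in place.
        width, height, fmt = im.width, im.height, im.format
        im.thumbnail((512, 512))
        out_rel = f"thumbs/{src.stem}.jpg"
        # save() does not create parent directories, so make thumbs/ on first use.
        os.makedirs("drive/thumbs", exist_ok=True)
        im.convert("RGB").save(f"drive/{out_rel}", "JPEG", quality=85)
    return {
        "width": width,
        "height": height,
        "format": fmt,
        "thumb": {"drive_path": out_rel},
    }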
The skill works whether the caller sent {image: {drive_path}}, {image: {url}}, {image: {data: "data:..."}}, or a bare string. load_path does the routing; you write the file logic once. See inputs-and-drive for the full taxonomy.
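To make the four caller-side shapes concrete, here is a small illustrative classifier. It only names the shape of the value; it does not reproduce what load_path does internally, and classify_image_input is a hypothetical name used for this sketch.

```python
def classify_image_input(value) -> str:
    """Name the caller-side shape of an `image` input (illustrative only)."""
    if isinstance(value, str):
        # A bare string; how it is interpreted is up to load_path.
        return "bare string"
    if isinstance(value, dict):
        # The three object shapes listed above, checked in that order.
        for key in ("drive_path", "url", "data"):
            if isinstance(value.get(key), str):
                return key
    raise ValueError("image must be a string or an object with drive_path, url, or data")
```

All four shapes pass the oneOf in input_schema, which is why run never needs to branch on them.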
skills/caption/skill.yaml
description: Write a short caption for a product photo.
entrypoint: SKILL.md
model: claude/sonnet-4-7
input_schema:
  type: object
  properties:
    prompt: { type: string }
    attachments:
      type: array
      items: { type: object }
      minItems: 1
      maxItems: 1
  required: [attachments]
output_schema:
  type: object
  properties:
    caption: { type: string }
  required: [caption]
The .md entrypoint tells the worker this skill is agentic — the file's contents become the system prompt and the LLM tool-use loop runs. Because output_schema is set, the agent gets an auto-injected set_output tool and must call it once with { "caption": "..." } to finish.
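The two entrypoint forms used on this page (main.py:run and SKILL.md) can be told apart from the string alone. The sketch below shows one way to do that; it is an illustration of the convention, not the worker's dispatch code, and parse_entrypoint is a hypothetical name.

```python
def parse_entrypoint(entrypoint: str) -> dict:
    """Classify a skill.yaml entrypoint string by its shape."""
    file_part, sep, func = entrypoint.partition(":")
    if file_part.endswith(".py") and sep and func:
        # "<file>.py:<function>" means a deterministic Python skill.
        return {"kind": "deterministic", "file": file_part, "function": func}
    if file_part.endswith(".md") and not sep:
        # "<file>.md" means an agentic skill; the file is the system prompt.
        return {"kind": "agentic", "file": file_part}
    raise ValueError(f"unrecognized entrypoint: {entrypoint!r}")
```

For example, parse_entrypoint("main.py:run") yields kind "deterministic" with function "run", and parse_entrypoint("SKILL.md") yields kind "agentic".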
skills/caption/SKILL.md
You write short, punchy captions for product photos.
You'll receive the photo as an attachment in the first user message (you can
see it directly — no tool calls needed to "open" it) and a `tone` hint in the
prompt text (e.g. "playful", "minimal", "luxury"). Default tone is "minimal".
Reply with exactly one caption, 8–14 words, in the requested tone. No
preamble, no quotes, no markdown.
That's the whole skill. The agent natively sees the image via the attachments mechanism — see agent-attachments for the wire format and the supported file types.
Pushing it
push(project_dir="/abs/path/to/image-tools", notes="starter", activate=true)
Calling it from an app
const API_BASE = "https://puras-api.fly.dev";
const KEY = "puras_live_AbCdEfGh.SecretSecretSecretSecretSecre32";

async function analyze(file /* a browser File */) {
  // 1) Upload once.
  const fd = new FormData();
  fd.append("file", file);
  const up = await fetch(`${API_BASE}/v1/drive/upload`, {
    method: "POST",
    headers: { Authorization: `Bearer ${KEY}` },
    body: fd,
  }).then(r => r.json());
  // up: { drive_path, full_path, signed_url, bytes, content_type }

  // 2) Fast deterministic skill: wait inline.
  const info = await fetch(`${API_BASE}/v1/jobs?wait=true&timeout=20`, {
    method: "POST",
    headers: { Authorization: `Bearer ${KEY}`, "Content-Type": "application/json" },
    body: JSON.stringify({
      skill: "image-info",
      inputs: { image: { drive_path: up.drive_path } },
    }),
  }).then(r => r.json());
  // info.result → { width, height, format, thumb: { drive_path } }

  // 3) Kick off the agentic caption skill: attach the photo natively.
  const cap = await fetch(`${API_BASE}/v1/jobs`, {
    method: "POST",
    headers: { Authorization: `Bearer ${KEY}`, "Content-Type": "application/json" },
    body: JSON.stringify({
      skill: "caption",
      inputs: {
        prompt: "Tone: playful. Write the caption.",
        attachments: [{ drive_path: up.drive_path }],
      },
    }),
  }).then(r => r.json());
  // Poll GET /v1/jobs/{cap.id} or stream GET /v1/jobs/{cap.id}/stream

  return { info: info.result, captionJobId: cap.id };
}
Same submit body shape for both skills — no type field, no distinction at the call site. The worker reads each skill's entrypoint and dispatches accordingly.
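The caption job is submitted without wait=true, so the caller has to poll for it. A minimal polling loop might look like the sketch below. The status field and the terminal values "succeeded" and "failed" are assumptions made for illustration (this page does not specify the job object beyond id and result), and the HTTP call is passed in as fetch_job so the loop itself makes no claims about the API.

```python
import time
from typing import Callable


def poll_job(
    fetch_job: Callable[[str], dict],
    job_id: str,
    interval: float = 1.0,
    timeout: float = 60.0,
    terminal: tuple = ("succeeded", "failed"),  # assumed status values
) -> dict:
    """Call fetch_job(job_id) until the job reports a terminal status."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_job(job_id)
        # "status" is an assumed field name; adjust to the real job object.
        if job.get("status") in terminal:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} did not finish within {timeout}s")
        time.sleep(interval)
```

In practice fetch_job would issue GET /v1/jobs/{id} with the same Bearer header used above and return the parsed JSON body.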
To display the thumbnail in the UI, mint a signed URL for it:
GET /v1/drive/sign?path=<thumb-drive-path>&ttl=3600
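Drive paths can contain characters that need escaping in a query string, so it helps to build the sign request with a URL encoder rather than string concatenation. The sketch below only constructs the URL; the helper name sign_url_request is hypothetical, and the response shape is not covered here.

```python
from urllib.parse import urlencode

API_BASE = "https://puras-api.fly.dev"


def sign_url_request(drive_path: str, ttl: int = 3600) -> str:
    """Build the GET /v1/drive/sign URL for a drive path and TTL in seconds."""
    # urlencode percent-escapes the path, including "/" and spaces.
    query = urlencode({"path": drive_path, "ttl": ttl})
    return f"{API_BASE}/v1/drive/sign?{query}"
```

For the thumbnail returned by image-info, the argument would be info.result.thumb.drive_path.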
What this project demonstrates
- Two skill styles, one API: a deterministic Python skill for fast structured work, an agentic skill for multi-step LLM work, both submitted with the same POST /v1/jobs shape.
- One source of truth for files: the upload happens once; both skills reference it by drive_path.
- Polymorphic file inputs in a deterministic skill: the same code works for uploads, URLs, and inline base64 (inputs-and-drive).
- Native vision in an agentic skill: the agent looks at the image directly via inputs.attachments; no bash cat, no media tool round-trip, no manual URL signing (agent-attachments).
Where to take it next
- Add a tools: list to caption/skill.yaml if you want the agent to call your own Python helpers mid-run (e.g. a query_dimensions tool that wraps the same Pillow logic).
- Add more skills under skills/ for follow-up ops (crop, OCR, classify); agentic skills can chain them via tool-use.
- Set project secrets (set_secret) for any third-party keys your skill code needs; they're injected as env vars at run time.