puras

Concepts

Projects, deployments, skills, jobs, secrets, drive, billing.

Project

The unit of tenancy. Holds API keys, deployments, secrets, drive files, credit balance, and jobs. A user can own multiple projects; an API key is always project-scoped.

API key

Format: puras_live_<prefix8>.<secret32>. The dot separator is part of the key — do not strip or replace it. The prefix is stored in plaintext (used for fast lookup); only sha256(secret) is stored. Keys are shown once at creation. Pass as Authorization: Bearer <key> on every API call.
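
As a rough illustration of the format above, the sketch below splits a key into its prefix and secret and computes the digest that would be compared on lookup. The helper names, the hex encoding of the digest, and the example key are assumptions for illustration; only the `puras_live_<prefix8>.<secret32>` shape, the sha256 step, and the Bearer header come from this page.

```python
import hashlib

KEY_NAMESPACE = "puras_live_"  # literal namespace from the documented format


def split_api_key(key: str) -> tuple[str, str]:
    """Split a key into (prefix, secret); the dot separator is kept as the boundary."""
    if not key.startswith(KEY_NAMESPACE):
        raise ValueError("key does not start with puras_live_")
    prefix, dot, secret = key[len(KEY_NAMESPACE):].partition(".")
    if not dot or len(prefix) != 8 or len(secret) != 32:
        raise ValueError("expected puras_live_<prefix8>.<secret32>")
    return prefix, secret


def secret_digest(secret: str) -> str:
    # Assumption: the digest is taken over UTF-8 bytes and stored as hex.
    return hashlib.sha256(secret.encode("utf-8")).hexdigest()


def auth_header(key: str) -> dict[str, str]:
    # The full key, dot included, goes into the Bearer header.
    return {"Authorization": f"Bearer {key}"}


example_key = "puras_live_" + "ab12cd34" + "." + "x" * 32  # made-up key
prefix, secret = split_api_key(example_key)
```
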

Deployment (project-as-unit)

A deployment is a zip of the whole project, not a per-skill push. The bundle is auto-discovered — there is no root manifest file. The worker scans skills/*/skill.yaml; each immediate child directory of skills/ that contains a skill.yaml is registered as a skill, and the directory name is the skill name.

Activating a new deployment is a rolling switch: new jobs use the active deployment; jobs already running keep their original code until they terminate.

Bundle layout:

my-project/
  requirements.txt          # optional — extra pip deps for the deployment venv
  skills/
    ad-creative/
      skill.yaml
      SKILL.md              # system prompt (agentic entrypoint)
      tools/
        render_video.py     # per-skill tool, referenced from skill.yaml
    image-info/
      skill.yaml
      main.py               # deterministic entrypoint
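
The discovery rule described above can be approximated in a few lines. This is a sketch of the rule as stated on this page, not the worker's actual code; `discover_skills` is a hypothetical helper, and the temporary directory only stands in for an unpacked bundle.

```python
from pathlib import Path
import tempfile


def discover_skills(bundle_root: Path) -> dict[str, Path]:
    """Map skill name -> skill.yaml for each immediate child of skills/ that has one."""
    skills_dir = bundle_root / "skills"
    found: dict[str, Path] = {}
    if not skills_dir.is_dir():
        return found
    for child in sorted(skills_dir.iterdir()):
        manifest = child / "skill.yaml"
        if child.is_dir() and manifest.is_file():
            found[child.name] = manifest  # the directory name is the skill name
    return found


# Build a throwaway bundle that mirrors the layout above, then scan it.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("ad-creative", "image-info"):
        (root / "skills" / name).mkdir(parents=True)
        (root / "skills" / name / "skill.yaml").write_text("entrypoint: SKILL.md\n")
    (root / "skills" / "ad-creative" / "tools").mkdir()  # nested dir, not a skill
    (root / "skills" / "notes").mkdir()                  # no skill.yaml, so skipped
    discovered = sorted(discover_skills(root))
```

Only `ad-creative` and `image-info` are registered here: `notes` has no skill.yaml, and `tools` is not an immediate child of skills/.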

Skill

A directory under skills/ containing a skill.yaml. The yaml's entrypoint decides what kind of skill it is:

  • entrypoint: SKILL.md (or any .md file) → agentic. The file is read as the system prompt, the worker starts an LLM tool-use loop, exposes the platform tools (bash, media, web_search, …) plus any user tools you declared, and iterates until the agent stops.
  • entrypoint: main.py:run → deterministic. The worker imports main from the skill directory and calls run(inputs: dict) in an isolated subprocess. No LLM in the loop unless your code calls one.
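
To make the two entrypoint forms concrete, here is a minimal sketch of how an entrypoint string could be classified. `classify_entrypoint` is a hypothetical helper; the exact matching rules the worker applies (case sensitivity, allowed file names) are assumptions.

```python
def classify_entrypoint(entrypoint: str) -> str:
    """Return "agentic" for a .md entrypoint, "deterministic" for "file.py:func"."""
    if entrypoint.endswith(".md"):
        return "agentic"
    module_file, sep, func = entrypoint.partition(":")
    if sep and module_file.endswith(".py") and func.isidentifier():
        return "deterministic"
    raise ValueError(f"unrecognized entrypoint: {entrypoint!r}")


kinds = {ep: classify_entrypoint(ep) for ep in ("SKILL.md", "main.py:run")}
```
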

skill.yaml shape:

yaml
description: One-line summary shown in dashboards and playgrounds.
entrypoint: SKILL.md                       # or "main.py:run"
model: claude/opus-4-7                     # optional, agentic only — see docs/models
disable_bash: false                        # optional, agentic only
input_schema: { ... JSON Schema ... }      # validated before run
output_schema: { ... JSON Schema ... }     # validated after run (for deterministic skills)
                                           # or via the auto-injected `set_output` tool (agentic)
tools:                                     # optional, agentic only
  - name: render_video
    description: Render a video from a storyboard.
    entrypoint: tools/render_video.py:run  # path relative to the skill dir
    input_schema: { ... }
    output_schema: { ... }

Defaults: bash is enabled for agentic skills, and the platform default model is used if model is unset. See models for the available slugs.

Tools inside an agentic skill

The tools: list on an agentic skill declares Python callables the model can invoke via tool-use. Each tool runs in the same subprocess runner the deterministic skills use, with the tool's input_schema enforced before dispatch and its output_schema enforced before the result goes back to the model. Tools are per-skill and namespaced by the skill — they're not a separate top-level concept.
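
The validate, dispatch, validate sequence described above might be sketched as follows. The `check_required` function is a deliberately tiny stand-in for real JSON Schema validation (it only looks at `required`), and `dispatch_tool`, the `storyboard_id` input, and the returned `drive_path` field are all hypothetical names chosen for the example.

```python
from typing import Callable


def check_required(schema: dict, value: dict, where: str) -> None:
    """Stand-in for JSON Schema validation: only checks the `required` keys."""
    missing = [key for key in schema.get("required", []) if key not in value]
    if missing:
        raise ValueError(f"{where} is missing required keys: {missing}")


def dispatch_tool(tool: dict, func: Callable[[dict], dict], args: dict) -> dict:
    check_required(tool["input_schema"], args, "input")      # enforced before dispatch
    result = func(args)
    check_required(tool["output_schema"], result, "output")  # enforced before the model sees it
    return result


# Hypothetical tool body, as it might appear in tools/render_video.py.
def run(inputs: dict) -> dict:
    return {"drive_path": f"renders/{inputs['storyboard_id']}.mp4"}


render_video = {
    "name": "render_video",
    "input_schema": {"type": "object", "required": ["storyboard_id"]},
    "output_schema": {"type": "object", "required": ["drive_path"]},
}
result = dispatch_tool(render_video, run, {"storyboard_id": "sb-1"})
```
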

Job

Submitted via POST /v1/jobs (or the submit_job MCP tool):

json
{ "skill": "<skill-name>", "inputs": { } }

The worker reads the named skill from the active deployment and dispatches to the agent loop or the deterministic runner based on the skill's entrypoint. There is no type field to pick — the skill's manifest is the source of truth.

Lifecycle: queued → running → succeeded | failed | cancelled.
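
A minimal sketch of building the submission request with the standard library follows. The host name is a placeholder (this page does not give one), the JSON content type is an assumption, and the `path` input is made up; the request is constructed but not sent.

```python
import json
import urllib.request

BASE_URL = "https://puras.example.invalid"  # placeholder host (assumption)
TERMINAL_STATUSES = frozenset({"succeeded", "failed", "cancelled"})


def build_job_request(api_key: str, skill: str, inputs: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/jobs request."""
    body = json.dumps({"skill": skill, "inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/jobs",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",  # assumption: JSON request body
        },
        method="POST",
    )


def is_terminal(status: str) -> bool:
    # queued and running are the only non-terminal states in the lifecycle.
    return status in TERMINAL_STATUSES


request = build_job_request(
    "puras_live_ab12cd34." + "x" * 32,  # made-up key
    "image-info",
    {"path": "uploads/cat.png"},        # made-up inputs
)
# urllib.request.urlopen(request) would send it; omitted because the host is a placeholder.
```
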

Three call modes

A single endpoint covers all three patterns — pick the one that matches the caller's appetite for latency vs simplicity:

Mode | Request | Response | Use when
--- | --- | --- | ---
async | POST /v1/jobs | JobOut immediately, status="queued" | Fire and forget; caller polls GET /v1/jobs/{id}
sync | POST /v1/jobs?wait=true&timeout=N (1–60s) | JobOut once terminal, or the current row at timeout | Short jobs (deterministic skills, fast agentic skills)
stream | POST /v1/jobs?stream=true | text/event-stream (SSE) of job_events live | Long agentic skills where the caller wants tool calls / model responses in real time

wait and stream are mutually exclusive. stream returns SSE frames as JSON-encoded {id, type, payload} blocks, terminated by an event: end frame with the final status. GET /v1/jobs/{id}/stream attaches to an in-flight job using the same protocol — useful for reconnects.
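
The three modes differ only in the query string, so a client can pick one with a small helper. The sketch below is one way to encode the constraints stated above; `jobs_path` is a hypothetical name, and treating `timeout` as optional when `wait` is set is an assumption.

```python
from typing import Optional
from urllib.parse import urlencode


def jobs_path(*, wait: bool = False, timeout: Optional[int] = None, stream: bool = False) -> str:
    """Return the POST path for async (default), sync (wait), or stream mode."""
    if wait and stream:
        raise ValueError("wait and stream are mutually exclusive")
    if timeout is not None and not wait:
        raise ValueError("timeout only applies together with wait")
    params: dict = {}
    if wait:
        params["wait"] = "true"
        if timeout is not None:
            if not 1 <= timeout <= 60:
                raise ValueError("timeout must be between 1 and 60 seconds")
            params["timeout"] = timeout
    if stream:
        params["stream"] = "true"
    return "/v1/jobs" + ("?" + urlencode(params) if params else "")


paths = [jobs_path(), jobs_path(wait=True, timeout=30), jobs_path(stream=True)]
```
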

Delivery: the worker fires pg_notify('puras_job_events:{job_id}', ...) on every event row, so SSE latency is roughly one network roundtrip. A 15s heartbeat (: ping) keeps proxies and idle clients honest and acts as a safety net for missed notifies.
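
On the client side, a stream like this can be consumed with a small line-based parser. The sketch assumes standard SSE framing (an optional `event:` line, `data:` lines carrying the JSON, a blank line closing each frame, and comment lines starting with `:` for heartbeats); the exact wire format and the `tool_call` event type in the sample are assumptions, not taken from this page.

```python
import json
from typing import Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[tuple[str, dict]]:
    """Yield (event_name, decoded_json) per frame; stop after the `end` frame."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(":"):
            continue  # comment line, e.g. the ": ping" heartbeat
        if line == "":
            # A blank line closes the current frame.
            if data or event != "message":
                yield event, (json.loads("\n".join(data)) if data else {})
                if event == "end":
                    return
            event, data = "message", []
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)


sample = [
    ": ping\n",
    'data: {"id": 1, "type": "tool_call", "payload": {"name": "bash"}}\n',
    "\n",
    "event: end\n",
    'data: {"status": "succeeded"}\n',
    "\n",
    'data: {"id": 2}\n',  # never reached: the parser stops at the end frame
    "\n",
]
events = list(iter_sse_events(sample))
```
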

On wait timeout, the row is returned in its current non-terminal state — keep polling or call tail_job.
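
A caller that receives a non-terminal row can fall back to polling. The loop below is a sketch: `fetch_job` is a hypothetical callable that would wrap GET /v1/jobs/{id}, and the interval and poll cap are arbitrary defaults.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


def poll_until_terminal(
    fetch_job: Callable[[], dict],
    *,
    interval_s: float = 1.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Fetch the job up to max_polls times; return the last row seen."""
    job = fetch_job()
    for _ in range(max_polls - 1):
        if job["status"] in TERMINAL_STATUSES:
            return job
        sleep(interval_s)
        job = fetch_job()
    return job


# Fake fetcher that walks through the lifecycle, with sleeping disabled.
states = iter(["queued", "running", "succeeded"])
final = poll_until_terminal(lambda: {"status": next(states)}, sleep=lambda _s: None)
```
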

The worker claims jobs with SELECT ... FROM jobs WHERE status='queued' AND projects.credit_balance_micros > 0 ... FOR UPDATE SKIP LOCKED, plus a pg_notify fast path. A job whose project has no credit sits in queued until the balance becomes positive.

Secrets

Project-scoped key/value pairs. Names must match ^[A-Z_][A-Z0-9_]*$ (env-var style). Values are encrypted at rest and never returned by the API; only names are listable. Injected as environment variables into both the agent's bash tool and the skill subprocess at run time.
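
The naming rule can be checked client-side before creating a secret, and a skill reads the injected value like any other environment variable. In this sketch `is_valid_secret_name` and `read_secret` are hypothetical helpers, and `fullmatch` is used so that a trailing newline is rejected.

```python
import os
import re

SECRET_NAME = re.compile(r"^[A-Z_][A-Z0-9_]*$")  # pattern quoted from the rule above


def is_valid_secret_name(name: str) -> bool:
    # fullmatch avoids `$` accepting a name that ends with a newline.
    return SECRET_NAME.fullmatch(name) is not None


def read_secret(name: str) -> str:
    """Read an injected secret inside a skill or tool; fail loudly if it is absent."""
    if not is_valid_secret_name(name):
        raise ValueError(f"invalid secret name: {name!r}")
    try:
        return os.environ[name]
    except KeyError:
        raise RuntimeError(f"secret {name} is not set for this project") from None


checks = {name: is_valid_secret_name(name) for name in ("STRIPE_KEY", "_X1", "1ABC", "lower", "A-B")}
```
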

Drive

Per-project private storage (Supabase bucket under the hood). All projects share one drive bucket; isolation is by path prefix (<project_id>/...). Skills can write outputs (images, videos, audio) into the drive and refer to them by path.

Three HTTP surfaces against the drive, all authenticated via JWT or API key and all project-scoped:

  • POST /v1/drive/upload (multipart) — apps push files in, get back a relative drive_path.
  • GET /v1/drive/list?prefix=... — list direct children (folders + files).
  • GET /v1/drive/sign?path=...&ttl=... — mint a signed URL to display or download. Also exposed as the drive_sign MCP tool.
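
As a small illustration of the sign endpoint, the helper below builds the request path with a properly escaped drive path. `drive_sign_path` is a hypothetical name, and the unit of `ttl` is not stated on this page, so the value here is only a placeholder.

```python
from urllib.parse import urlencode


def drive_sign_path(path: str, ttl: int) -> str:
    """Build the GET path for /v1/drive/sign with URL-encoded query parameters."""
    if ttl <= 0:
        raise ValueError("ttl must be positive")
    return "/v1/drive/sign?" + urlencode({"path": path, "ttl": ttl})


sign_path = drive_sign_path("outputs/final cut.mp4", 300)  # ttl unit is an assumption
```
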

Inside a running job the project drive is symlinked at ./drive/ — read/write it as plain files. See inputs-and-drive for the upload + read flow end-to-end.
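
Because the drive appears as an ordinary directory, a skill can write outputs with plain file I/O and return the relative path. The sketch below uses a temporary directory in place of ./drive/ so it runs anywhere; `save_to_drive` and the path-escape check are illustrative additions, not platform behavior.

```python
from pathlib import Path
import tempfile


def save_to_drive(drive_root: Path, relative_path: str, data: bytes) -> str:
    """Write bytes under the drive root and return the drive-relative path."""
    root = drive_root.resolve()
    target = (root / relative_path).resolve()
    if root not in target.parents:
        raise ValueError("path escapes the drive directory")
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return relative_path


# Inside a job the root would be Path("drive"); a temp dir stands in for it here.
with tempfile.TemporaryDirectory() as tmp:
    drive = Path(tmp)
    saved = save_to_drive(drive, "outputs/hello.txt", b"hello")
    round_trip = (drive / saved).read_bytes()
```
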

Billing

Currency: MICROS. 1 USD = 1_000_000 micros. All balances and per-call costs are micros — there are no floats in the ledger.

Upstream cost (LLM tokens and media generation) is marked up by the platform margin PURAS_MARGIN_PCT (default 20%) before being debited from projects.credit_balance_micros. The model is a marketplace: you pay upstream + margin, atomically, when each call lands.
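
The arithmetic can be worked through with integers only. This is a sketch under assumptions: the rounding rule for fractional micros is not stated on this page (rounding up is used here), and `usd_to_micros` and `billed_micros` are hypothetical helper names.

```python
from decimal import Decimal

MICROS_PER_USD = 1_000_000
PURAS_MARGIN_PCT = 20  # documented default


def usd_to_micros(usd: str) -> int:
    # Decimal avoids binary float error when converting a dollar amount.
    return int(Decimal(usd) * MICROS_PER_USD)


def billed_micros(upstream_micros: int, margin_pct: int = PURAS_MARGIN_PCT) -> int:
    # upstream + margin, in integer math; rounds up to a whole micro (assumption).
    return (upstream_micros * (100 + margin_pct) + 99) // 100


balance = usd_to_micros("1")      # 1 USD of credit
charge = billed_micros(500_000)   # a call that cost 0.50 USD upstream
balance_after = balance - charge
```

With the default margin, a call costing 500_000 micros upstream is debited as 600_000 micros, leaving 400_000 of the original 1_000_000.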

Balance is checked on claim, not per-call mid-job. A long job can run the balance negative on its last call; subsequent jobs will not claim until you top up. Admin top-up: scripts/grant_credits.sh <project_id> <usd_amount>.

Per-job cost

Every job carries a denormalized rollup field cost_micros on JobOut. It accumulates every charge that landed against the job — LLM steps (provider/model from the active deployment), media.run calls, and web tool calls — so the dashboard and the API caller can show "this job cost $X" without joining usage_events.

For the breakdown — which model, how many calls, how many tokens, how much each line item cost — call GET /v1/jobs/{job_id}/usage. Each row is one usage_events entry with provider, model, input_tokens, output_tokens, and billed_micros. The sum of billed_micros over those rows equals the job's cost_micros.
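
Given the rows from the usage endpoint, a per-model breakdown and the reconciliation against cost_micros take only a few lines. The sample rows below are invented for illustration (including the provider and model values); only the field names come from this page.

```python
from collections import defaultdict


def summarize_usage(rows: list[dict]) -> tuple[dict, int]:
    """Return ({(provider, model): billed_micros}, total billed_micros)."""
    per_model: dict = defaultdict(int)
    for row in rows:
        per_model[(row["provider"], row["model"])] += row["billed_micros"]
    return dict(per_model), sum(row["billed_micros"] for row in rows)


# Made-up usage_events rows for a single job.
rows = [
    {"provider": "claude", "model": "opus-4-7", "input_tokens": 1200, "output_tokens": 300, "billed_micros": 42_000},
    {"provider": "claude", "model": "opus-4-7", "input_tokens": 900, "output_tokens": 150, "billed_micros": 27_000},
    {"provider": "media", "model": "example-video", "input_tokens": 0, "output_tokens": 0, "billed_micros": 600_000},
]
breakdown, total = summarize_usage(rows)
job = {"cost_micros": 669_000}  # stand-in for the JobOut rollup field
matches_rollup = total == job["cost_micros"]
```
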

See sdk-media for the media generation surface and how its pricing flows through.