Media SDK

There is one knob: media.run(model, inputs). model is one of our registered slugs (e.g. openai/gpt-image-2, kuaishou/kling-v3-i2v). Whatever arguments the model accepts, you pass as inputs; we don't validate input shape, the model does. Cost is debited from your project's credit balance at the slug's registered rate.

Most families ship as multiple slugs, one per model variant (text-to-X, image-to-X, reference-to-X, fast vs. standard). Pick the slug that matches what you're doing; the catalog at GET /v1/pricing lists them all. A handful of slugs also have input-conditional pricing (e.g. audio on/off, with/without a video reference). For those, the registered rate_table is the source of truth, and we bill exactly the row your inputs hit.
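Here is a minimal sketch of what "we bill exactly the row your inputs hit" means. Everything in it is hypothetical:

- the RATE_TABLE layout, which combines both conditions into one table;
- its keys and micros-per-second numbers;
- the assumption that generate_audio defaults to off.

The real rows live in the registry (GET /v1/pricing), and billing happens server-side.

```python
from typing import Any

# Hypothetical per-second rates in micros (1 USD = 1_000_000 micros), keyed by
# (audio on?, video reference present?). The numbers are made up.
RATE_TABLE: dict[tuple[bool, bool], int] = {
    (False, False): 50_000,
    (True, False): 100_000,
    (False, True): 70_000,
    (True, True): 120_000,
}


def select_row(inputs: dict[str, Any]) -> tuple[bool, bool]:
    """Derive the rate-table key from the inputs actually sent."""
    audio_on = bool(inputs.get("generate_audio", False))  # assumed default: off
    has_reference = inputs.get("video_url") is not None
    return (audio_on, has_reference)


def estimate_micros(inputs: dict[str, Any]) -> int:
    """Per-second rate of the matching row times the requested duration."""
    return RATE_TABLE[select_row(inputs)] * int(inputs["duration"])


if __name__ == "__main__":
    print(estimate_micros({"prompt": "rainy alley", "duration": 5}))
    print(estimate_micros({"prompt": "thunder", "duration": 4, "generate_audio": True}))
```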

The full catalog of slugs and their rates is on the pricing page and at GET /v1/pricing. Unknown slugs are rejected with HTTP 400.

Surface

```python
from puras import media, secret

media.run(
    model: str,
    inputs: dict | None = None,
    *,
    output_path: str | None = None,
    output_url_path: str | None = None,
    kind: str = "auto",       # "image" | "video" | "audio" | "auto"
    **kwargs,                 # merged into inputs (kwargs win)
) -> dict

secret(name: str) -> str      # read a project secret
```
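The rule "kwargs are merged into inputs (kwargs win)" behaves like a dict merge. Below is a pure-Python sketch of that rule. It assumes a shallow merge; the docs don't say whether nested dicts are merged key by key. merge_inputs is a hypothetical helper, not part of puras.

```python
from __future__ import annotations

from typing import Any


def merge_inputs(inputs: dict[str, Any] | None = None, **kwargs: Any) -> dict[str, Any]:
    """Shallow merge: keyword arguments override same-named keys in inputs."""
    # Build a new dict so the caller's inputs dict is left untouched.
    return {**(inputs or {}), **kwargs}


base = {"prompt": "rainy alley", "duration": 5}
merged = merge_inputs(base, duration=8, generate_audio=True)
print(merged)  # {'prompt': 'rainy alley', 'duration': 8, 'generate_audio': True}
```

Under this rule, media.run("kuaishou/kling-v3-t2v", {"duration": 5}, duration=8) would send duration=8.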

Returns:

```python
{
  "model": "kuaishou/kling-v3-i2v",
  "kind": "video",                              # resolved from "auto" or echoed
  "drive_path": "media/12b5e4d5...mp4",         # path inside your drive
  "output_url": "https://...supabase.../mp4",   # signed URL, TTL ~1h
  "request_id": "...",
  "billed_micros": 672000,
  "billed_usd": 0.672,
  "meta": {"metrics": {"inference_time": 12.4}, ...},
}
```
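billed_micros is an integer count of millionths of a US dollar, so 672000 micros is $0.672. To total spend across many calls, sum the integer field and convert once; adding the float billed_usd values can drift. A sketch follows. The results list is made-up sample data and micros_to_usd is a hypothetical helper.

```python
from decimal import Decimal

MICROS_PER_USD = 1_000_000


def micros_to_usd(micros: int) -> Decimal:
    """Convert integer micros to an exact Decimal dollar amount."""
    return Decimal(micros) / MICROS_PER_USD


# Made-up return values from three media.run calls.
results = [
    {"billed_micros": 672_000, "billed_usd": 0.672},
    {"billed_micros": 40_000, "billed_usd": 0.04},
    {"billed_micros": 100_000, "billed_usd": 0.1},
]

# Sum the integer field, then convert once.
total_micros = sum(r["billed_micros"] for r in results)
print(total_micros, micros_to_usd(total_micros))
```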

Patterns

Text-to-image

```python
img = media.run(
    "openai/gpt-image-2",
    {"prompt": "a vintage red bicycle", "size": "1024x1024", "quality": "high"},
)

# Edit / composite reference images
edited = media.run(
    "bytedance/seedream-v4-edit",
    {"image_url": img["output_url"], "prompt": "give it neon trim"},
)
```

Image-to-video

```python
vid = media.run(
    "bytedance/seedance-2-i2v",
    {
        "image_url": "https://...",
        "prompt": "make it spin slowly",
        "duration": 8,
    },
    output_path="renders/spin.mp4",
)
```

Reference-to-video (Seedance r2v)

```python
clip = media.run(
    "bytedance/seedance-2-r2v",
    {
        "prompt": "match the style of the reference clip",
        "image_urls": ["https://..."],
        "video_url": "https://...",   # triggers the with-reference rate
        "duration": 6,
    },
)
```

Audio + voice control (Kling v3, Veo 3)

```python
# Audio off — cheapest tier
clip = media.run("kuaishou/kling-v3-t2v", prompt="rainy alley", duration=5)

# Audio on — billed at the audio-on per-second rate
clip = media.run(
    "google/veo-3-t2v",
    prompt="thunder rolling over hills",
    duration=4,
    generate_audio=True,
)
```

Fast tiers

Where a model has a fast variant, it ships as a separate slug (with -fast- in the name) at a lower per-second rate:

```python
quick = media.run("bytedance/seedance-2-fast-t2v", prompt="...", duration=5)
quick = media.run("google/veo-3-fast-i2v", image_url="...", duration=4)
```

A model with an unusual response shape

If the SDK can't find the output URL in the model's response automatically, point at it with output_url_path, a jq-style path into the response:

```python
weird = media.run(
    "kuaishou/kling-v3-image",
    {...},  # placeholder: this model's inputs dict goes here
    output_url_path="outputs[0].asset.url",
)
```
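The docs only say that output_url_path is "jq-style", so the exact grammar the SDK accepts is an assumption here. The sketch below handles dotted keys plus [n] indexes. That is enough to check a path like outputs[0].asset.url against a sample response before you pass it to media.run. resolve_path is a hypothetical helper, not an SDK function.

```python
import re
from typing import Any

# A path segment is either a bare key ("outputs") or a bracketed index ("[0]").
# Characters matching neither form (the dots) are skipped by findall.
_SEGMENT = re.compile(r"([^.\[\]]+)|\[(\d+)\]")


def resolve_path(payload: Any, path: str) -> Any:
    """Walk a jq-style path such as "outputs[0].asset.url" through decoded JSON."""
    node = payload
    for key, index in _SEGMENT.findall(path):
        node = node[key] if key else node[int(index)]
    return node


# Made-up response shape matching the path used above.
response = {"outputs": [{"asset": {"url": "https://example.com/out.png"}}]}
print(resolve_path(response, "outputs[0].asset.url"))
```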

Inside a deterministic skill (or a per-skill tool)

```python
from puras import media

def run(inputs: dict) -> dict:
    img = media.run(
        "openai/gpt-image-2",
        {"prompt": inputs["prompt"]},
    )
    return {"drive_path": img["drive_path"], "billed_usd": img["billed_usd"]}
```

The same import works from any Python callable the worker dispatches: a deterministic skill's entrypoint, or one of the tools declared in an agentic skill's tools: list.

As an agent tool (built-in)

Agentic skills automatically get a built-in media tool exposed to the model, with the same surface as media.run(). The agent picks a model slug and inputs at runtime; you don't declare the tool in skill.yaml. See concepts for skill setup. The tools: list on a skill is for your own Python helpers, not for the built-in media tool.

How billing resolves

Every successful call is priced from the registry; there is no live lookup and no fallback. Each slug carries one of these pricing schemes:

per-call (most image models)
per-second of output (video / audio)
per-megapixel (some image models)
input-conditional — a rate table indexed by inputs (audio on/off, with/without video reference, quality × size). The bill is computed from the inputs you actually sent.

The exact amount lands in billed_micros and is also written to a usage_events row you can audit.
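Below is a rough model of the first three schemes: a sketch that prices one call in integer micros. The rates are invented, and two details are assumptions the docs don't settle:

- a megapixel is 1,000,000 pixels;
- fractional micros are rounded to the nearest integer.

The registry-computed billed_micros is what you are actually charged. price_micros is a hypothetical helper.

```python
from __future__ import annotations

from fractions import Fraction


def price_micros(
    mode: str,
    rate_micros: int,
    *,
    seconds: int | None = None,
    width: int | None = None,
    height: int | None = None,
) -> int:
    """Price one call from a registry-style rate (all numbers illustrative)."""
    if mode == "per_call":
        return rate_micros
    if mode == "per_second":
        if seconds is None:
            raise ValueError("per_second pricing needs seconds of output")
        return rate_micros * seconds
    if mode == "per_megapixel":
        if width is None or height is None:
            raise ValueError("per_megapixel pricing needs width and height")
        megapixels = Fraction(width * height, 1_000_000)  # assumed: 1 MP = 1e6 px
        return round(rate_micros * megapixels)  # assumed: round to nearest micro
    raise ValueError(f"unknown pricing mode: {mode!r}")


print(price_micros("per_call", 40_000))
print(price_micros("per_second", 50_000, seconds=5))
print(price_micros("per_megapixel", 30_000, width=1024, height=1024))
```

Exact Fraction arithmetic keeps the per-megapixel case free of float error until the single rounding step.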

Conventions

Prefer media.run over calling /v1/media/generate with raw httpx. The SDK injects the worker's service token and job context for you; a raw call won't bill correctly.
Don't rely on the file extension matching the kind. Some models return .webp for kind="image"; the SDK detects the extension from the URL and saves accordingly. drive_path is authoritative.
Don't open the returned output_url from server code to "verify" the file — it's a signed URL meant for the client. Use drive_path server-side; mint a fresh signed URL with the drive_sign MCP tool when you need to share.
Don't retry a failed call without inspecting the error. Most model errors are deterministic (bad params, NSFW filter, model down), so a blind retry just burns more credit. Fix the inputs first.
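Because drive_path is authoritative, take the saved file's extension from drive_path rather than inferring it from kind. The sketch below does that. The result dict is made up; only its drive_path and kind keys come from the documented return shape. saved_extension is a hypothetical helper.

```python
from pathlib import PurePosixPath


def saved_extension(result: dict) -> str:
    """Extension of the file actually written to the drive, e.g. ".webp"."""
    return PurePosixPath(result["drive_path"]).suffix.lower()


# Made-up result: kind says "image", but the file in the drive is WebP.
result = {"kind": "image", "drive_path": "media/3f9a0c.webp"}
ext = saved_extension(result)
print(ext)
```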