Media model reference
Per-slug input schema and pricing for every model usable via `media.run()`. Auto-generated from the catalog.
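Every slug here is callable via `media.run(slug, inputs)`; the examples below pass the inputs as keyword arguments. The input fields map 1:1 to the model itself: we don't validate their shape; the model does. Cost is debited from your project balance at the listed rate, computed from the inputs you send (for example quality, size, duration, or audio settings).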
For the SDK signature and patterns, see sdk-media. For the broader catalog and policy, see models.
Index
| Slug | Kind | Price |
|---|---|---|
| openai/gpt-image-2 | image | $0.253 / call |
| openai/gpt-image-2-edit | image | $0.263 / call |
| bytedance/seedream-v4 | image | $0.036 / call |
| bytedance/seedream-v4-edit | image | $0.036 / call |
| google/imagen-4 | image | $0.060 / call |
| kuaishou/kling-v3-image | image | $0.034 / call |
| bytedance/seedance-2-t2v | video | $0.364 / s |
| bytedance/seedance-2-i2v | video | $0.363 / s |
| bytedance/seedance-2-r2v | video | $0.363 / s |
| bytedance/seedance-2-fast-t2v | video | $0.290 / s |
| bytedance/seedance-2-fast-i2v | video | $0.290 / s |
| bytedance/seedance-2-fast-r2v | video | $0.290 / s |
| kuaishou/kling-v3-t2v | video | $0.134 / s |
| kuaishou/kling-v3-i2v | video | $0.134 / s |
| google/veo-3-t2v | video | $0.600 / s |
| google/veo-3-i2v | video | $0.240 / s |
| google/veo-3-fast-t2v | video | $0.300 / s |
| google/veo-3-fast-i2v | video | $0.300 / s |
Images
openai/gpt-image-2
GPT Image 2 · text → image
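Priced by a quality × size matrix: set `quality` to low, medium, or high, and set `image_size` (for example an explicit 1024x1024 size). The matching cell below is the per-image price.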
Pricing
| Variant | Price |
|---|---|
| low · 1024x768 | $0.0060/image |
| low · 1024x1024 | $0.0072/image |
| low · 1024x1536 | $0.0060/image |
| low · 1920x1080 | $0.0060/image |
| low · 2560x1440 | $0.0084/image |
| low · 3840x2160 | $0.014/image |
| medium · 1024x768 | $0.044/image |
| medium · 1024x1024 | $0.064/image |
| medium · 1024x1536 | $0.050/image |
| medium · 1920x1080 | $0.048/image |
| medium · 2560x1440 | $0.067/image |
| medium · 3840x2160 | $0.121/image |
| high · 1024x768 | $0.174/image |
| high · 1024x1024 | $0.253/image |
| high · 1024x1536 | $0.198/image |
| high · 1920x1080 | $0.190/image |
| high · 2560x1440 | $0.266/image |
| high · 3840x2160 | $0.481/image |
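A minimal sketch of a local cost estimate for `openai/gpt-image-2`, assuming the per-image rate from the table above is multiplied by `num_images`. The rates are copied from the table; `estimate_gpt_image_2_cost` is a hypothetical helper, not part of the SDK, and the amount actually debited is whatever the platform bills.

```python
# Per-image rates (USD) for openai/gpt-image-2, copied from the pricing table above.
GPT_IMAGE_2_RATES = {
    "low": {
        "1024x768": 0.0060, "1024x1024": 0.0072, "1024x1536": 0.0060,
        "1920x1080": 0.0060, "2560x1440": 0.0084, "3840x2160": 0.014,
    },
    "medium": {
        "1024x768": 0.044, "1024x1024": 0.064, "1024x1536": 0.050,
        "1920x1080": 0.048, "2560x1440": 0.067, "3840x2160": 0.121,
    },
    "high": {
        "1024x768": 0.174, "1024x1024": 0.253, "1024x1536": 0.198,
        "1920x1080": 0.190, "2560x1440": 0.266, "3840x2160": 0.481,
    },
}


def estimate_gpt_image_2_cost(quality: str, size: str, num_images: int = 1) -> float:
    """Hypothetical helper: per-image rate times num_images (assumed billing rule)."""
    if not 1 <= num_images <= 4:
        raise ValueError("num_images must be between 1 and 4")
    try:
        rate = GPT_IMAGE_2_RATES[quality][size]
    except KeyError:
        # "auto" quality and unlisted sizes have no rate in the table.
        raise ValueError(f"no listed rate for quality={quality!r}, size={size!r}") from None
    return round(rate * num_images, 4)


print(estimate_gpt_image_2_cost("high", "1024x1024"))       # 0.253
print(estimate_gpt_image_2_cost("medium", "1920x1080", 3))  # 0.144
```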
Inputs
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| prompt | string | ✓ | — | The prompt for image generation. (max 32,000 chars, min 2 chars) |
| image_size | enum or object | | "landscape_4_3" | The size of the generated image: a preset name, an explicit {width, height}, or "auto" to let the model pick the best size. Explicit sizes must have both dimensions as multiples of 16, a max edge of 3840px, an aspect ratio ≤ 3:1, and between 655,360 and 8,294,400 total pixels. Values: "square_hd", "square", "portrait_4_3", "portrait_16_9", "landscape_4_3", "landscape_16_9", "auto" |
| num_images | integer | | 1 | Number of images to generate. (≥ 1, ≤ 4) |
| output_format | enum | | "png" | Output format for the images. Values: "jpeg", "png", "webp" |
| quality | enum | | "high" | Quality of the generated image. Use "auto" to let the model pick the best quality for the prompt. Values: "auto", "low", "medium", "high" |
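The constraints on explicit `image_size` values can be checked locally before spending a call. This is a sketch: `check_gpt_image_2_size` is a hypothetical helper, and since inputs are not validated before they reach the model, the model's own checks remain authoritative.

```python
def check_gpt_image_2_size(width: int, height: int) -> list[str]:
    """Return the violated size constraints; an empty list means the size looks valid."""
    problems = []
    if width % 16 or height % 16:
        problems.append("both dimensions must be multiples of 16")
    if max(width, height) > 3840:
        problems.append("max edge is 3840px")
    if max(width, height) > 3 * min(width, height):
        problems.append("aspect ratio must be <= 3:1")
    pixels = width * height
    if not 655_360 <= pixels <= 8_294_400:
        problems.append("total pixels must be between 655,360 and 8,294,400")
    return problems


print(check_gpt_image_2_size(1024, 1024))  # []
print(check_gpt_image_2_size(3840, 2160))  # []
print(check_gpt_image_2_size(1000, 1000))  # ['both dimensions must be multiples of 16']
```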
Output
Saved to your project drive at drive_path; a signed output_url (TTL ~1h) is returned.
Example
```python
result = media.run(
    "openai/gpt-image-2",
    prompt="...",
    image_size="square_hd",
    quality="high",
)
```
openai/gpt-image-2-edit
GPT Image 2 (edit) · image → image (edit)
Image-to-image edit, priced by a quality × size matrix. Provide the source image URLs in `image_urls`.
Pricing
| Variant | Price |
|---|---|
| low · 1024x768 | $0.013/image |
| low · 1024x1024 | $0.018/image |
| low · 1024x1536 | $0.022/image |
| low · 1920x1080 | $0.020/image |
| low · 2560x1440 | $0.023/image |
| low · 3840x2160 | $0.029/image |
| medium · 1024x768 | $0.052/image |
| medium · 1024x1024 | $0.073/image |
| medium · 1024x1536 | $0.065/image |
| medium · 1920x1080 | $0.064/image |
| medium · 2560x1440 | $0.082/image |
| medium · 3840x2160 | $0.136/image |
| high · 1024x768 | $0.181/image |
| high · 1024x1024 | $0.263/image |
| high · 1024x1536 | $0.214/image |
| high · 1920x1080 | $0.190/image |
| high · 2560x1440 | $0.281/image |
| high · 3840x2160 | $0.496/image |
Inputs
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| image_urls | `array<string>` | ✓ | — | URLs of the images to use as references for the generation |
| prompt | string | ✓ | — | The prompt for image generation. (max 32,000 chars, min 2 chars) |
| image_size | enum | | "auto" | The size of the generated image. Use "auto" to infer it from the input images. Values: "square_hd", "square", "portrait_4_3", "portrait_16_9", "landscape_4_3", "landscape_16_9", "auto" |
| mask_url | string | | — | URL of a mask image indicating which part of the image to edit |
| num_images | integer | | 1 | Number of images to generate. (≥ 1, ≤ 4) |
| output_format | enum | | "png" | Output format for the images. Values: "jpeg", "png", "webp" |
| quality | enum | | "high" | Quality of the generated image. Use "auto" to let the model pick the best quality for the prompt. Values: "auto", "low", "medium", "high" |
Output
Saved to your project drive at drive_path; a signed output_url (TTL ~1h) is returned.
Example
```python
result = media.run(
    "openai/gpt-image-2-edit",
    prompt="...",
    image_urls=["https://..."],
    image_size="square_hd",
    quality="high",
)
```
bytedance/seedream-v4
Seedream v4 · text → image
Pricing
$0.036 per call
Inputs
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| prompt | string | ✓ | — | The text prompt used to generate the image |
| enable_safety_checker | boolean | | true | If true, the safety checker is enabled |
| enhance_prompt_mode | enum | | "standard" | Prompt-enhancement mode. "standard" gives higher-quality results but takes longer; "fast" gives average-quality results in less time. Values: "standard", "fast" |
| image_size | enum or object | | {height: 2048, width: 2048} | The size of the generated image. Total pixels must be between 960x960 and 4096x4096. Values: "square_hd", "square", "portrait_4_3", "portrait_16_9", "landscape_4_3", "landscape_16_9", "auto", "auto_2K", "auto_4K" |
| max_images | integer | | 1 | Values greater than 1 enable multi-image generation: each of the num_images generations may return up to max_images images, so the total number of images is between num_images and max_images × num_images. (≥ 1, ≤ 6) |
| num_images | integer | | 1 | Number of separate model generations to run with the prompt. (≥ 1, ≤ 6) |
| seed | integer | | — | Random seed to control the stochasticity of image generation |
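Because `max_images` applies per generation, the number of returned images is a range rather than a fixed count. A small sketch of the bounds stated in the `max_images` notes; `seedream_image_count_range` is a hypothetical helper, and it deliberately does not compute cost, since this page does not say whether the $0.036 rate applies per call or per output image.

```python
def seedream_image_count_range(num_images: int = 1, max_images: int = 1) -> tuple[int, int]:
    """(min, max) total images for bytedance/seedream-v4, per the max_images notes."""
    for name, value in (("num_images", num_images), ("max_images", max_images)):
        if not 1 <= value <= 6:
            raise ValueError(f"{name} must be between 1 and 6")
    # Each of the num_images generations returns at least 1 and up to max_images images.
    return num_images, num_images * max_images


print(seedream_image_count_range())      # (1, 1)
print(seedream_image_count_range(2, 3))  # (2, 6)
```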
Output
Saved to your project drive at drive_path; a signed output_url (TTL ~1h) is returned. Additional metadata available under meta (seed).
Example
```python
result = media.run(
    "bytedance/seedream-v4",
    prompt="...",
    image_size="square_hd",
)
```
bytedance/seedream-v4-edit
Seedream v4 (edit) · image → image (edit)
Edits or composites reference images. Provide `image_urls` (a list of up to 10 URLs) plus a `prompt`.
Pricing
$0.036 per call
Inputs
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| image_urls | `array<string>` | ✓ | — | URLs of the input images to edit. Up to 10 images are used; if more than 10 are sent, only the last 10 are used |
| prompt | string | ✓ | — | The text prompt used to edit the image |
| enable_safety_checker | boolean | | true | If true, the safety checker is enabled |
| enhance_prompt_mode | enum | | "standard" | Prompt-enhancement mode. "standard" gives higher-quality results but takes longer; "fast" gives average-quality results in less time. Values: "standard", "fast" |
| image_size | enum or object | | {height: 2048, width: 2048} | The size of the generated image. The minimum total area is 921,600 pixels; smaller sizes are scaled up to meet it while preserving the aspect ratio. Values: "square_hd", "square", "portrait_4_3", "portrait_16_9", "landscape_4_3", "landscape_16_9", "auto", "auto_2K", "auto_4K" |
| max_images | integer | | 1 | Values greater than 1 enable multi-image generation: each of the num_images generations may return up to max_images images, so the total number of images is between num_images and max_images × num_images. Input images plus output images must not exceed 15 in total. (≥ 1, ≤ 6) |
| num_images | integer | | 1 | Number of separate model generations to run with the prompt. (≥ 1, ≤ 6) |
| seed | integer | | — | Random seed to control the stochasticity of image generation |
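For `bytedance/seedream-v4-edit`, three rules from the table interact: only the last 10 `image_urls` are used, inputs plus outputs may not exceed 15 images, and sizes below 921,600 pixels are scaled up with the aspect ratio preserved. The sketch below models those rules locally. The helper names are hypothetical, and two details are assumptions: that the 15-image budget counts only the inputs actually used, and that upscaling rounds up to whole pixels.

```python
import math


def effective_inputs(image_urls: list[str]) -> list[str]:
    """Only the last 10 input images are used."""
    return image_urls[-10:]


def max_outputs_allowed(image_urls: list[str]) -> int:
    """Inputs + outputs must not exceed 15 (assumes only used inputs count)."""
    return 15 - len(effective_inputs(image_urls))


def upscaled_size(width: int, height: int, min_area: int = 921_600) -> tuple[int, int]:
    """Scale (width, height) up to at least min_area pixels, keeping the aspect ratio."""
    area = width * height
    if area >= min_area:
        return width, height
    factor = math.sqrt(min_area / area)
    # Rounding up is an assumption; the model's own rounding may differ.
    return math.ceil(width * factor), math.ceil(height * factor)


urls = [f"https://example.com/{i}.png" for i in range(12)]
print(len(effective_inputs(urls)))  # 10
print(max_outputs_allowed(urls))    # 5
print(upscaled_size(640, 480))      # (1109, 832)
```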
Output
Saved to your project drive at drive_path; a signed output_url (TTL ~1h) is returned. Additional metadata available under meta (seed).
Example
```python
result = media.run(
    "bytedance/seedream-v4-edit",
    prompt="...",
    image_urls=["https://..."],
    image_size="square_hd",
)
```
google/imagen-4
Imagen 4 · text → image
Pricing
$0.060 per call
Inputs
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| prompt | string | ✓ | — | The text prompt to generate an image from. (max 5,000 chars, min 3 chars) |
| aspect_ratio | enum | | "1:1" | The aspect ratio of the generated image. Values: "1:1", "16:9", "9:16", "4:3", "3:4" |
| num_images | integer | | 1 | The number of images to generate. (≥ 1, ≤ 4) |
| output_format | enum | | "png" | The format of the generated image. Values: "jpeg", "png", "webp" |
| resolution | enum | | "1K" | The resolution of the generated image. Values: "1K", "2K" |
| safety_tolerance | enum | | "4" | The safety tolerance level for content moderation. 1 is the most strict (blocks most content), 6 is the least strict. Values: "1", "2", "3", "4", "5", "6" |
| seed | integer | | — | The seed for the random number generator |
Output
Saved to your project drive at drive_path; a signed output_url (TTL ~1h) is returned.
Example
```python
result = media.run(
    "google/imagen-4",
    prompt="...",
)
```
kuaishou/kling-v3-image
Kling v3 (image) · text → image
Pricing
$0.034 per call
Inputs
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| prompt | string | ✓ | — | Text prompt for image generation. (max 2,500 chars) |
| aspect_ratio | enum | | "16:9" | Aspect ratio of generated images. Values: "16:9", "9:16", "1:1", "4:3", "3:4", "3:2", "2:3", "21:9" |
| elements | `array<elementinput>` | | — | Elements (characters/objects) to include in the image for face control. Each element can have a frontal image and, optionally, reference images |
| negative_prompt | string | | — | Negative text prompt. Expressing negative information as negative sentences directly in the positive prompt is also recommended |
| num_images | integer | | 1 | Number of images to generate. (≥ 1, ≤ 9) |
| output_format | enum | | "png" | The format of the generated image. Values: "jpeg", "png", "webp" |
| resolution | enum | | "1K" | Image generation resolution: "1K" is standard, "2K" is high-res. Values: "1K", "2K" |
Output
Saved to your project drive at drive_path; a signed output_url (TTL ~1h) is returned.
Example
```python
result = media.run(
    "kuaishou/kling-v3-image",
    prompt="...",
)
```
Videos
bytedance/seedance-2-t2v
Seedance 2.0 — text→video
Text-to-video at 480p to 1080p. Audio is generated by default at no extra cost.
Pricing
$0.364 per second of output
Inputs
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| prompt | string | ✓ | — | The text prompt used to generate the video |
| aspect_ratio | enum | | "auto" | The aspect ratio of the generated video. Use 16:9 for landscape, 9:16 for portrait/vertical, 1:1 for square, 21:9 for ultrawide cinematic, or auto to let the model decide. Values: "auto", "21:9", "16:9", "4:3", "1:1", "3:4", "9:16" |
| duration | enum | | "auto" | Duration of the video in seconds. Supports 4 to 15 seconds, or auto to let the model decide based on the prompt. Values: "auto", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15" |
| generate_audio | boolean | | true | Whether to generate synchronized audio for the video, including sound effects, ambient sounds, and lip-synced speech. The cost of video generation is the same whether or not audio is generated |
| resolution | enum | | "720p" | Video resolution: 480p for faster generation, 720p for balance, 1080p for highest quality. Values: "480p", "720p", "1080p" |
| seed | integer | | — | Random seed for reproducibility. Results may still vary slightly even with the same seed |
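Seedance bills per second of output, and with `duration="auto"` the model picks the length, so the cost is only known within a range before the call. A sketch, assuming the charge is the listed per-second rate times the output duration; `seedance_cost_range` is a hypothetical helper.

```python
SEEDANCE_2_T2V_RATE = 0.364  # USD per second of output, from the pricing above


def seedance_cost_range(duration: str = "auto", rate: float = SEEDANCE_2_T2V_RATE) -> tuple[float, float]:
    """(min, max) cost in USD; "auto" can produce anywhere from 4 to 15 seconds."""
    if duration == "auto":
        low, high = 4, 15
    else:
        seconds = int(duration)
        if not 4 <= seconds <= 15:
            raise ValueError("duration must be 'auto' or '4' to '15'")
        low = high = seconds
    return round(low * rate, 3), round(high * rate, 3)


print(seedance_cost_range("5"))     # (1.82, 1.82)
print(seedance_cost_range("auto"))  # (1.456, 5.46)
```

Passing an explicit `duration` is the simplest way to make spend predictable; the same helper works for the other Seedance slugs by passing their `rate`.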
Output
Saved to your project drive at drive_path; a signed output_url (TTL ~1h) is returned. Additional metadata available under meta (seed).
Example
```python
result = media.run(
    "bytedance/seedance-2-t2v",
    prompt="...",
    duration="auto",
)
```
bytedance/seedance-2-i2v
Seedance 2.0 — image→video
Pricing
$0.363 per second of output
Inputs
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| image_url | string | ✓ | — | The URL of the starting frame image to animate. Supported formats: JPEG, PNG, WebP. Max 30 MB |
| prompt | string | ✓ | — | The text prompt describing the desired motion and action for the video |
| aspect_ratio | enum | | "auto" | The aspect ratio of the generated video. Use 16:9 for landscape, 9:16 for portrait/vertical, 1:1 for square, 21:9 for ultrawide cinematic, or auto to infer from the input image. Values: "auto", "21:9", "16:9", "4:3", "1:1", "3:4", "9:16" |
| duration | enum | | "auto" | Duration of the video in seconds. Supports 4 to 15 seconds, or auto to let the model decide based on the prompt. Values: "auto", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15" |
| end_image_url | string | | — | The URL of the image to use as the last frame of the video. When provided, the generated video transitions from the starting image to this ending image. Supported formats: JPEG, PNG, WebP. Max 30 MB |
| generate_audio | boolean | | true | Whether to generate synchronized audio for the video, including sound effects, ambient sounds, and lip-synced speech. The cost of video generation is the same whether or not audio is generated |
| resolution | enum | | "720p" | Video resolution: 480p for faster generation, 720p for balance, 1080p for highest quality. Values: "480p", "720p", "1080p" |
| seed | integer | | — | Random seed for reproducibility. Results may still vary slightly even with the same seed |
Output
Saved to your project drive at drive_path; a signed output_url (TTL ~1h) is returned. Additional metadata available under meta (seed).
Example
```python
result = media.run(
    "bytedance/seedance-2-i2v",
    prompt="...",
    image_url="https://...",
    duration="auto",
)
```
bytedance/seedance-2-r2v
Seedance 2.0 — reference→video
Accepts up to 9 image, 3 video, and 3 audio references, with at most 12 reference files in total. The per-second rate drops 40% when a video reference is passed.
Pricing
| Variant | Price |
|---|---|
| without video reference | $0.363/s |
| with video reference | $0.218/s |
Inputs
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| prompt | string | ✓ | — | The text prompt used to generate the video |
| aspect_ratio | enum | | "auto" | The aspect ratio of the generated video. Use 16:9 for landscape, 9:16 for portrait/vertical, 1:1 for square, 21:9 for ultrawide cinematic, or auto to let the model decide. Values: "auto", "21:9", "16:9", "4:3", "1:1", "3:4", "9:16" |
| audio_urls | `array<string>` | | — | Reference audio to guide video generation. Refer to the files in the prompt as @Audio1, @Audio2, etc. Supported formats: MP3, WAV. Up to 3 files; combined duration must not exceed 15 seconds. Max 15 MB per file. If audio is provided, at least one reference image or video is required |
| duration | enum | | "auto" | Duration of the video in seconds. Supports 4 to 15 seconds, or auto to let the model decide based on the prompt. Values: "auto", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15" |
| generate_audio | boolean | | true | Whether to generate synchronized audio for the video, including sound effects, ambient sounds, and lip-synced speech. The cost of video generation is the same whether or not audio is generated |
| image_urls | `array<string>` | | — | Reference images to guide video generation. Refer to them in the prompt as @Image1, @Image2, etc. Supported formats: JPEG, PNG, WebP. Max 30 MB per image. Up to 9 images. Total files across all modalities must not exceed 12 |
| resolution | enum | | "720p" | Video resolution: 480p for faster generation, 720p for balance, 1080p for highest quality. Values: "480p", "720p", "1080p" |
| seed | integer | | — | Random seed for reproducibility. Results may still vary slightly even with the same seed |
| video_urls | `array<string>` | | — | Reference videos to guide video generation. Refer to them in the prompt as @Video1, @Video2, etc. Supported formats: MP4, MOV. Up to 3 videos; combined duration must be between 2 and 15 seconds and total size under 50 MB. Each video must be between ~480p (640x640) and ~720p (834x1112) in resolution |
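The reference rules for `bytedance/seedance-2-r2v` combine per-modality caps, a 12-file total, and a requirement that audio comes with at least one image or video, and the rate depends on whether a video reference is present. A local pre-flight sketch of those rules; `r2v_rate_and_check` is a hypothetical helper that checks counts only, not formats, file sizes, durations, or resolutions.

```python
from typing import Optional


def r2v_rate_and_check(
    image_urls: Optional[list[str]] = None,
    video_urls: Optional[list[str]] = None,
    audio_urls: Optional[list[str]] = None,
) -> float:
    """Validate reference counts for seedance-2-r2v and return the per-second rate (USD)."""
    images, videos, audios = image_urls or [], video_urls or [], audio_urls or []
    if len(images) > 9 or len(videos) > 3 or len(audios) > 3:
        raise ValueError("at most 9 images, 3 videos and 3 audio files")
    if len(images) + len(videos) + len(audios) > 12:
        raise ValueError("at most 12 reference files across all modalities")
    if audios and not (images or videos):
        raise ValueError("audio references need at least one image or video reference")
    # Any video reference switches the call to the lower per-second rate.
    return 0.218 if videos else 0.363


print(r2v_rate_and_check(image_urls=["a.png"]))                        # 0.363
print(r2v_rate_and_check(image_urls=["a.png"], video_urls=["b.mp4"]))  # 0.218
```

For `bytedance/seedance-2-fast-r2v` the same checks apply; only the two rates change ($0.290/s and $0.174/s).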
Output
Saved to your project drive at drive_path; a signed output_url (TTL ~1h) is returned. Additional metadata available under meta (seed).
Example
```python
result = media.run(
    "bytedance/seedance-2-r2v",
    prompt="...",
    image_urls=["https://..."],
    duration="auto",
)
```
bytedance/seedance-2-fast-t2v
Seedance 2.0 Fast — text→video
Pricing
$0.290 per second of output
Inputs
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| prompt | string | ✓ | — | The text prompt used to generate the video |
| aspect_ratio | enum | | "auto" | The aspect ratio of the generated video. Use 16:9 for landscape, 9:16 for portrait/vertical, 1:1 for square, 21:9 for ultrawide cinematic, or auto to let the model decide. Values: "auto", "21:9", "16:9", "4:3", "1:1", "3:4", "9:16" |
| duration | enum | | "auto" | Duration of the video in seconds. Supports 4 to 15 seconds, or auto to let the model decide based on the prompt. Values: "auto", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15" |
| generate_audio | boolean | | true | Whether to generate synchronized audio for the video, including sound effects, ambient sounds, and lip-synced speech. The cost of video generation is the same whether or not audio is generated |
| resolution | enum | | "720p" | Video resolution: 480p for faster generation, 720p for balance. Values: "480p", "720p" |
| seed | integer | | — | Random seed for reproducibility. Results may still vary slightly even with the same seed |
Output
Saved to your project drive at drive_path; a signed output_url (TTL ~1h) is returned. Additional metadata available under meta (seed).
Example
```python
result = media.run(
    "bytedance/seedance-2-fast-t2v",
    prompt="...",
    duration="auto",
)
```
bytedance/seedance-2-fast-i2v
Seedance 2.0 Fast — image→video
Pricing
$0.290 per second of output
Inputs
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| image_url | string | ✓ | — | The URL of the starting frame image to animate. Supported formats: JPEG, PNG, WebP. Max 30 MB |
| prompt | string | ✓ | — | The text prompt describing the desired motion and action for the video |
| aspect_ratio | enum | | "auto" | The aspect ratio of the generated video. Use 16:9 for landscape, 9:16 for portrait/vertical, 1:1 for square, 21:9 for ultrawide cinematic, or auto to infer from the input image. Values: "auto", "21:9", "16:9", "4:3", "1:1", "3:4", "9:16" |
| duration | enum | | "auto" | Duration of the video in seconds. Supports 4 to 15 seconds, or auto to let the model decide based on the prompt. Values: "auto", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15" |
| end_image_url | string | | — | The URL of the image to use as the last frame of the video. When provided, the generated video transitions from the starting image to this ending image. Supported formats: JPEG, PNG, WebP. Max 30 MB |
| generate_audio | boolean | | true | Whether to generate synchronized audio for the video, including sound effects, ambient sounds, and lip-synced speech. The cost of video generation is the same whether or not audio is generated |
| resolution | enum | | "720p" | Video resolution: 480p for faster generation, 720p for balance. Values: "480p", "720p" |
| seed | integer | | — | Random seed for reproducibility. Results may still vary slightly even with the same seed |
Output
Saved to your project drive at drive_path; a signed output_url (TTL ~1h) is returned. Additional metadata available under meta (seed).
Example
```python
result = media.run(
    "bytedance/seedance-2-fast-i2v",
    prompt="...",
    image_url="https://...",
    duration="auto",
)
```
bytedance/seedance-2-fast-r2v
Seedance 2.0 Fast — reference→video
Pricing
| Variant | Price |
|---|---|
| without video reference | $0.290/s |
| with video reference | $0.174/s |
Inputs
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| prompt | string | ✓ | — | The text prompt used to generate the video |
| aspect_ratio | enum | | "auto" | The aspect ratio of the generated video. Use 16:9 for landscape, 9:16 for portrait/vertical, 1:1 for square, 21:9 for ultrawide cinematic, or auto to let the model decide. Values: "auto", "21:9", "16:9", "4:3", "1:1", "3:4", "9:16" |
| audio_urls | `array<string>` | | — | Reference audio to guide video generation. Refer to the files in the prompt as @Audio1, @Audio2, etc. Supported formats: MP3, WAV. Up to 3 files; combined duration must not exceed 15 seconds. Max 15 MB per file. If audio is provided, at least one reference image or video is required |
| duration | enum | | "auto" | Duration of the video in seconds. Supports 4 to 15 seconds, or auto to let the model decide based on the prompt. Values: "auto", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15" |
| generate_audio | boolean | | true | Whether to generate synchronized audio for the video, including sound effects, ambient sounds, and lip-synced speech. The cost of video generation is the same whether or not audio is generated |
| image_urls | `array<string>` | | — | Reference images to guide video generation. Refer to them in the prompt as @Image1, @Image2, etc. Supported formats: JPEG, PNG, WebP. Max 30 MB per image. Up to 9 images. Total files across all modalities must not exceed 12 |
| resolution | enum | | "720p" | Video resolution: 480p for faster generation, 720p for balance. Values: "480p", "720p" |
| seed | integer | | — | Random seed for reproducibility. Results may still vary slightly even with the same seed |
| video_urls | `array<string>` | | — | Reference videos to guide video generation. Refer to them in the prompt as @Video1, @Video2, etc. Supported formats: MP4, MOV. Up to 3 videos; combined duration must be between 2 and 15 seconds and total size under 50 MB. Each video must be between ~480p (640x640) and ~720p (834x1112) in resolution |
Output
Saved to your project drive at drive_path; a signed output_url (TTL ~1h) is returned. Additional metadata available under meta (seed).
Example
```python
result = media.run(
    "bytedance/seedance-2-fast-r2v",
    prompt="...",
    image_urls=["https://..."],
    duration="auto",
)
```
kuaishou/kling-v3-t2v
Kling v3 Pro — text→video
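Audio is on by default (`generate_audio` defaults to `true`), so calls are billed at the audio-on rate unless you pass `generate_audio=False`. Set `voice_control: true` for voice control.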
Pricing
| Variant | Price |
|---|---|
| audio off | $0.134/s |
| audio on | $0.202/s |
| audio + voice control | $0.235/s |
Inputs
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| aspect_ratio | enum | | "16:9" | The aspect ratio of the generated video frame. Values: "16:9", "9:16", "1:1" |
| cfg_scale | number | | 0.5 | The CFG (Classifier Free Guidance) scale is a measure of how closely you want the model to stick to your prompt. (≥ 0, ≤ 1) |
| duration | enum | | "5" | The duration of the generated video in seconds. Values: "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15" |
| generate_audio | boolean | | true | Whether to generate native audio for the video. Supports Chinese and English voice output; other languages are automatically translated to English. For English speech, use lowercase letters; for acronyms or proper nouns, use uppercase |
| multi_prompt | `array<klingv3multipromptelement>` | | — | List of prompts for multi-shot video generation. If provided, overrides the single prompt and divides the video into multiple shots with the specified prompts and durations |
| negative_prompt | string | | "blur, distort, and low quality" | (max 2,500 chars) |
| prompt | string | | — | Text prompt for video generation. Either prompt or multi_prompt must be provided, but not both |
| shot_type | enum | | "customize" | The type of multi-shot video generation. "intelligent" lets the model automatically determine the shot structure. Values: "customize", "intelligent" |
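Two rules in the Kling v3 table are easy to trip over: exactly one of `prompt` or `multi_prompt` must be set, and `generate_audio` defaults to `true`, which puts a call on the audio-on rate. A hypothetical sketch that picks the pricing tier and estimates cost as rate times `duration`; `kling_v3_cost` is not an SDK function, the billing rule is an assumption, and `voice_control` is taken from the note at the top of this section and treated as a boolean input, since it does not appear in the table.

```python
from typing import Optional

# USD per second, from the kuaishou/kling-v3-t2v pricing table.
KLING_V3_RATES = {"audio_off": 0.134, "audio_on": 0.202, "audio_voice": 0.235}


def kling_v3_cost(
    duration: str = "5",
    prompt: Optional[str] = None,
    multi_prompt: Optional[list] = None,
    generate_audio: bool = True,
    voice_control: bool = False,
) -> float:
    """Estimate cost (USD) as rate x duration; the billing rule is an assumption."""
    if (prompt is None) == (multi_prompt is None):
        raise ValueError("provide exactly one of prompt or multi_prompt")
    seconds = int(duration)
    if not 3 <= seconds <= 15:
        raise ValueError("duration must be '3' to '15'")
    if not generate_audio:
        tier = "audio_off"  # voice control is assumed to require audio
    elif voice_control:
        tier = "audio_voice"
    else:
        tier = "audio_on"
    return round(KLING_V3_RATES[tier] * seconds, 3)


print(kling_v3_cost(prompt="a fox in snow"))                        # 1.01
print(kling_v3_cost(prompt="a fox in snow", generate_audio=False))  # 0.67
```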
Output
Saved to your project drive at drive_path; a signed output_url (TTL ~1h) is returned.
Example
```python
result = media.run(
    "kuaishou/kling-v3-t2v",
    prompt="...",
    duration="5",
)
```
kuaishou/kling-v3-i2v
Kling v3 Pro — image→video
Pricing
| Variant | Price |
|---|---|
| audio off | $0.134/s |
| audio on | $0.202/s |
| audio + voice control | $0.235/s |
Inputs
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| start_image_url | string | ✓ | — | URL of the image to be used for the video |
| cfg_scale | number | | 0.5 | The CFG (Classifier Free Guidance) scale is a measure of how closely you want the model to stick to your prompt. (≥ 0, ≤ 1) |
| duration | enum | | "5" | The duration of the generated video in seconds. Values: "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15" |
| elements | `array<klingv3comboelementinput>` | | — | Elements (characters/objects) to include in the video. Each element can be either an image set (frontal + reference images) or a video. Reference them in the prompt as @Element1, @Element2, etc |
| end_image_url | string | | — | URL of the image to be used for the end of the video |
| generate_audio | boolean | | true | Whether to generate native audio for the video. Supports Chinese and English voice output; other languages are automatically translated to English. For English speech, use lowercase letters; for acronyms or proper nouns, use uppercase |
| multi_prompt | `array<klingv3multipromptelement>` | | — | List of prompts for multi-shot video generation. If provided, divides the video into multiple shots |
| negative_prompt | string | | "blur, distort, and low quality" | (max 2,500 chars) |
| prompt | string | | — | Text prompt for video generation. Either prompt or multi_prompt must be provided, but not both |
| shot_type | enum | | "customize" | The type of multi-shot video generation. "intelligent" lets the model automatically determine the shot structure. Values: "customize", "intelligent" |
Output
Saved to your project drive at drive_path; a signed output_url (TTL ~1h) is returned.
Example
```python
result = media.run(
    "kuaishou/kling-v3-i2v",
    prompt="...",
    start_image_url="https://...",  # required input
    duration="5",
)
```
google/veo-3-t2v
Veo 3 — text→video
Pricing
| Variant | Price |
|---|---|
| audio off | $0.600/s |
| audio on | $0.900/s |
Inputs
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| prompt | string | ✓ | — | The text prompt describing the video you want to generate. (max 20,000 chars) |
| aspect_ratio | enum | | "16:9" | The aspect ratio of the generated video. Values: "16:9", "9:16" |
| auto_fix | boolean | | true | Whether to automatically attempt to fix prompts that fail content policy or other validation checks by rewriting them |
| duration | enum | | "8s" | The duration of the generated video. Values: "4s", "6s", "8s" |
| generate_audio | boolean | | true | Whether to generate audio for the video |
| negative_prompt | string | | — | A negative prompt to guide the video generation |
| resolution | enum | | "720p" | The resolution of the generated video. Values: "720p", "1080p" |
| safety_tolerance | enum | | "4" | The safety tolerance level for content moderation. 1 is the most strict (blocks most content), 6 is the least strict. Values: "1", "2", "3", "4", "5", "6" |
| seed | integer | | — | The seed for the random number generator |
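Veo 3 expresses `duration` as a string with an `s` suffix ("4s", "6s", "8s"), unlike the bare numbers Seedance and Kling use, and its rate depends on `generate_audio`, which defaults to `true`. A hypothetical cost sketch for `google/veo-3-t2v`, assuming billing is the listed rate times the output seconds; `veo3_t2v_cost` is not an SDK function.

```python
# USD per second for google/veo-3-t2v, keyed by the generate_audio flag.
VEO_3_T2V_RATES = {False: 0.600, True: 0.900}


def veo3_t2v_cost(duration: str = "8s", generate_audio: bool = True) -> float:
    """Estimate cost (USD) from Veo's "Ns" duration strings (assumed billing rule)."""
    if duration not in ("4s", "6s", "8s"):
        raise ValueError('duration must be "4s", "6s" or "8s"')
    seconds = int(duration.removesuffix("s"))
    return round(VEO_3_T2V_RATES[generate_audio] * seconds, 3)


print(veo3_t2v_cost())                            # 7.2
print(veo3_t2v_cost("4s", generate_audio=False))  # 2.4
```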
Output
Saved to your project drive at drive_path; a signed output_url (TTL ~1h) is returned.
Example
```python
result = media.run(
    "google/veo-3-t2v",
    prompt="...",
    duration="8s",
)
```
google/veo-3-i2v
Veo 3 — image→video
Pricing
| Variant | Price |
|---|---|
| audio off | $0.240/s |
| audio on | $0.480/s |
Inputs
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| image_url | string | ✓ | — | URL of the input image to animate. Should be 720p or higher resolution in 16:9 or 9:16 aspect ratio. If the image is not in 16:9 or 9:16 aspect ratio, it will be cropped to fit |
| prompt | string | ✓ | — | The text prompt describing how the image should be animated. (max 20,000 chars) |
| aspect_ratio | enum | | "auto" | The aspect ratio of the generated video. Values: "auto", "16:9", "9:16" |
| auto_fix | boolean | | false | Whether to automatically attempt to fix prompts that fail content policy or other validation checks by rewriting them |
| duration | enum | | "8s" | The duration of the generated video. Values: "4s", "6s", "8s" |
| generate_audio | boolean | | true | Whether to generate audio for the video |
| negative_prompt | string | | — | A negative prompt to guide the video generation |
| resolution | enum | | "720p" | The resolution of the generated video. Values: "720p", "1080p" |
| safety_tolerance | enum | | "4" | The safety tolerance level for content moderation. 1 is the most strict (blocks most content), 6 is the least strict. Values: "1", "2", "3", "4", "5", "6" |
| seed | integer | | — | The seed for the random number generator |
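The `image_url` notes say images that are not 16:9 or 9:16 are cropped to fit, but not where the crop lands. The sketch below assumes a centered crop, to 16:9 for landscape or square inputs and to 9:16 for portrait ones, which is handy for previewing what might survive; `center_crop_box` is hypothetical, and pre-cropping the image yourself avoids relying on that assumption.

```python
from fractions import Fraction


def center_crop_box(width: int, height: int) -> tuple[int, int, int, int]:
    """(left, top, right, bottom) of an assumed centered 16:9 or 9:16 crop."""
    target = Fraction(16, 9) if width >= height else Fraction(9, 16)
    if Fraction(width, height) > target:
        # Too wide: keep the full height and trim the sides.
        new_w, new_h = int(height * target), height
    else:
        # Too tall (or already exact): keep the full width and trim top and bottom.
        new_w, new_h = width, int(width / target)
    left = (width - new_w) // 2
    top = (height - new_h) // 2
    return left, top, left + new_w, top + new_h


print(center_crop_box(1920, 1080))  # (0, 0, 1920, 1080)
print(center_crop_box(1024, 1024))  # (0, 224, 1024, 800)
```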
Output
Saved to your project drive at drive_path; a signed output_url (TTL ~1h) is returned.
Example
```python
result = media.run(
    "google/veo-3-i2v",
    prompt="...",
    image_url="https://...",
    duration="8s",
)
```
google/veo-3-fast-t2v
Veo 3 Fast — text→video
Pricing
| Variant | Price |
|---|---|
| audio off | $0.300/s |
| audio on | $0.480/s |
Inputs
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| prompt | string | ✓ | — | The text prompt describing the video you want to generate. (max 20,000 chars) |
| aspect_ratio | enum | | "16:9" | The aspect ratio of the generated video. Values: "16:9", "9:16" |
| auto_fix | boolean | | true | Whether to automatically attempt to fix prompts that fail content policy or other validation checks by rewriting them |
| duration | enum | | "8s" | The duration of the generated video. Values: "4s", "6s", "8s" |
| generate_audio | boolean | | true | Whether to generate audio for the video |
| negative_prompt | string | | — | A negative prompt to guide the video generation |
| resolution | enum | | "720p" | The resolution of the generated video. Values: "720p", "1080p" |
| safety_tolerance | enum | | "4" | The safety tolerance level for content moderation. 1 is the most strict (blocks most content), 6 is the least strict. Values: "1", "2", "3", "4", "5", "6" |
| seed | integer | | — | The seed for the random number generator |
Output
Saved to your project drive at drive_path; a signed output_url (TTL ~1h) is returned.
Example
```python
result = media.run(
    "google/veo-3-fast-t2v",
    prompt="...",
    duration="8s",
)
```
google/veo-3-fast-i2v
Veo 3 Fast — image→video
Pricing
| Variant | Price |
|---|---|
| audio off | $0.300/s |
| audio on | $0.480/s |
Inputs
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| image_url | string | ✓ | — | URL of the input image to animate. Should be 720p or higher resolution in 16:9 or 9:16 aspect ratio. If the image is not in 16:9 or 9:16 aspect ratio, it will be cropped to fit |
| prompt | string | ✓ | — | The text prompt describing how the image should be animated. (max 20,000 chars) |
| aspect_ratio | enum | | "auto" | The aspect ratio of the generated video. Values: "auto", "16:9", "9:16" |
| auto_fix | boolean | | false | Whether to automatically attempt to fix prompts that fail content policy or other validation checks by rewriting them |
| duration | enum | | "8s" | The duration of the generated video. Values: "4s", "6s", "8s" |
| generate_audio | boolean | | true | Whether to generate audio for the video |
| negative_prompt | string | | — | A negative prompt to guide the video generation |
| resolution | enum | | "720p" | The resolution of the generated video. Values: "720p", "1080p" |
| safety_tolerance | enum | | "4" | The safety tolerance level for content moderation. 1 is the most strict (blocks most content), 6 is the least strict. Values: "1", "2", "3", "4", "5", "6" |
| seed | integer | | — | The seed for the random number generator |
Output
Saved to your project drive at drive_path; a signed output_url (TTL ~1h) is returned.
Example
```python
result = media.run(
    "google/veo-3-fast-i2v",
    prompt="...",
    image_url="https://...",
    duration="8s",
)
```