Media model reference

These are the underlying models the generate_image / generate_video / generate_audio verbs resolve to. You don't call them directly — you pick a model family with a verb and the platform adapts your inputs to the native shape below. This page documents that native shape for reference.

For the verbs, see agent-tools-reference and sdk-runtime-reference. For prices, see the pricing page.

Index

Slug	Kind
`openai/gpt-image-2`	image
`openai/gpt-image-2-edit`	image
`bytedance/seedream-v4`	image
`bytedance/seedream-v4-edit`	image
`google/imagen-4`	image
`google/nano-banana-pro`	image
`google/nano-banana-pro-edit`	image
`kuaishou/kling-v3-image`	image
`kuaishou/kling-v3-image-edit`	image
`bytedance/seedance-2-t2v`	video
`bytedance/seedance-2-i2v`	video
`bytedance/seedance-2-r2v`	video
`bytedance/seedance-2-fast-t2v`	video
`bytedance/seedance-2-fast-i2v`	video
`bytedance/seedance-2-fast-r2v`	video
`kuaishou/kling-v3-t2v`	video
`kuaishou/kling-v3-i2v`	video
`kuaishou/kling-o3-r2v`	video
`kuaishou/kling-avatar-v2`	video
`google/veo-3-t2v`	video
`google/veo-3-i2v`	video
`google/veo-3-fast-t2v`	video
`google/veo-3-fast-i2v`	video
`google/veo-3.1-r2v`	video
`elevenlabs/scribe-v2`	audio
`elevenlabs/tts-v3`	audio
`elevenlabs/tts-multilingual-v2`	audio

Images

`openai/gpt-image-2`

GPT Image 2 · text → image

Quality × size matrix. Set quality ∈ low|medium|high and size (e.g. 1024x1024).

Inputs

Field	Type	Required	Default	Notes
`prompt`	string	✓	—	The prompt for image generation. (max 32,000 chars, min 2 chars)
`image_size`	enum		`"landscape_4_3"`	The size of the generated image. Supports preset names, explicit {width, height}, or 'auto' to let the model pick the best size. Concrete sizes must have both dimensions as multiples of 16, max edge 3840px, aspect ratio <= 3:1, total pixels between 655,360 and 8,294,400. Values: `"square_hd"` \| `"square"` \| `"portrait_4_3"` \| `"portrait_16_9"` \| `"landscape_4_3"` \| `"landscape_16_9"` \| `"auto"`
`num_images`	integer		`1`	Number of images to generate. (≥ 1, ≤ 4)
`output_format`	enum		`"png"`	Output format for the images. Values: `"jpeg"` \| `"png"` \| `"webp"`
`quality`	enum		`"high"`	Quality for the generated image. Use 'auto' to let the model pick the best quality for the prompt. Values: `"auto"` \| `"low"` \| `"medium"` \| `"high"`
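
The concrete-size rules in the `image_size` notes above can be checked before a request is sent. The following is a minimal sketch, not platform code: the function name `check_image_size` is hypothetical, and "aspect ratio <= 3:1" is assumed to mean long edge divided by short edge.

```python
# Limits copied from the image_size notes for openai/gpt-image-2.
MAX_EDGE = 3840
MIN_PIXELS = 655_360
MAX_PIXELS = 8_294_400
MAX_ASPECT = 3.0  # assumption: measured as long edge / short edge


def check_image_size(width: int, height: int) -> list[str]:
    """Return the violated constraints; an empty list means the size looks valid."""
    if width <= 0 or height <= 0:
        return ["both dimensions must be positive"]
    problems = []
    if width % 16 or height % 16:
        problems.append("both dimensions must be multiples of 16")
    if max(width, height) > MAX_EDGE:
        problems.append("max edge is 3840px")
    if max(width, height) / min(width, height) > MAX_ASPECT:
        problems.append("aspect ratio must be <= 3:1")
    if not MIN_PIXELS <= width * height <= MAX_PIXELS:
        problems.append("total pixels must be between 655,360 and 8,294,400")
    return problems
```

Returning a list rather than raising on the first failure lets a caller report every problem with a size at once.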

Output

Saved to your project drive at drive_path (mint a URL with puras.drive.url / the drive_url tool if you need one).

`openai/gpt-image-2-edit`

GPT Image 2 (edit) · image → image (edit)

Image-to-image edit. Quality × size matrix; provide image_url(s).

Inputs

Field	Type	Required	Default	Notes
`image_urls`	array<string>	✓	—	The URLs of the images to use as a reference for the generation
`prompt`	string	✓	—	The prompt for image generation. (max 32,000 chars, min 2 chars)
`image_size`	enum		`"auto"`	The size of the generated image. Use 'auto' to infer from input images. Values: `"square_hd"` \| `"square"` \| `"portrait_4_3"` \| `"portrait_16_9"` \| `"landscape_4_3"` \| `"landscape_16_9"` \| `"auto"`
`mask_url`	string		—	The URL of the mask image to use for the generation. This indicates what part of the image to edit
`num_images`	integer		`1`	Number of images to generate. (≥ 1, ≤ 4)
`output_format`	enum		`"png"`	Output format for the images. Values: `"jpeg"` \| `"png"` \| `"webp"`
`quality`	enum		`"high"`	Quality for the generated image. Use 'auto' to let the model pick the best quality for the prompt. Values: `"auto"` \| `"low"` \| `"medium"` \| `"high"`

Output

Saved to your project drive at drive_path (mint a URL with puras.drive.url / the drive_url tool if you need one).

`bytedance/seedream-v4`

Seedream v4 · text → image

Inputs

Field	Type	Required	Default	Notes
`prompt`	string	✓	—	The text prompt used to generate the image
`enable_safety_checker`	boolean		`true`	If set to true, the safety checker will be enabled
`enhance_prompt_mode`	enum		`"standard"`	The mode to use for prompt enhancement. Standard mode provides higher-quality results but takes longer to generate. Fast mode provides average-quality results but takes less time to generate. Values: `"standard"` \| `"fast"`
`image_size`	enum		`{height: 2048, width: 2048}`	The size of the generated image. Total pixels must be between 960x960 and 4096x4096. Values: `"square_hd"` \| `"square"` \| `"portrait_4_3"` \| `"portrait_16_9"` \| `"landscape_4_3"` \| `"landscape_16_9"` \| `"auto"` \| `"auto_2K"` \| `"auto_4K"`
`max_images`	integer		`1`	If set to a number greater than one, enables multi-image generation. The model will potentially return up to `max_images` images every generation, and in total, `num_images` generations will be carried out. In total, the number of images generated will be between `num_images` and `max_images*num_images`. (≥ 1, ≤ 6)
`num_images`	integer		`1`	Number of separate model generations to be run with the prompt. (≥ 1, ≤ 6)
`seed`	integer		—	Random seed to control the stochasticity of image generation
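
The interaction between `num_images` and `max_images` described above is simple arithmetic. This small sketch shows it; the helper name `seedream_image_count_bounds` is hypothetical and only restates the range given in the table.

```python
def seedream_image_count_bounds(num_images: int = 1, max_images: int = 1) -> tuple[int, int]:
    """Return (fewest, most) images a request can yield, per the notes above."""
    for name, value in (("num_images", num_images), ("max_images", max_images)):
        if not 1 <= value <= 6:
            raise ValueError(f"{name} must be between 1 and 6, got {value}")
    # Each of the num_images generations returns between 1 and max_images images.
    return num_images, num_images * max_images
```

For example, `num_images=2` with `max_images=3` can yield anywhere from 2 to 6 images.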

Output

Saved to your project drive at drive_path (mint a URL with puras.drive.url / the drive_url tool if you need one). Additional metadata available under meta (seed).

`bytedance/seedream-v4-edit`

Seedream v4 (edit) · image → image (edit)

Edit/composite reference images. Provide image_url (or list) plus a prompt.

Inputs

Field	Type	Required	Default	Notes
`image_urls`	array<string>	✓	—	List of URLs of input images for editing. Presently, up to 10 image inputs are allowed. If over 10 images are sent, only the last 10 will be used
`prompt`	string	✓	—	The text prompt used to edit the image
`enable_safety_checker`	boolean		`true`	If set to true, the safety checker will be enabled
`enhance_prompt_mode`	enum		`"standard"`	The mode to use for prompt enhancement. Standard mode provides higher-quality results but takes longer to generate. Fast mode provides average-quality results but takes less time to generate. Values: `"standard"` \| `"fast"`
`image_size`	enum		`{height: 2048, width: 2048}`	The size of the generated image. The minimum total image area is 921600 pixels. If the requested size is smaller, it is scaled up while maintaining the aspect ratio. Values: `"square_hd"` \| `"square"` \| `"portrait_4_3"` \| `"portrait_16_9"` \| `"landscape_4_3"` \| `"landscape_16_9"` \| `"auto"` \| `"auto_2K"` \| `"auto_4K"`
`max_images`	integer		`1`	If set to a number greater than one, enables multi-image generation. The model will potentially return up to `max_images` images every generation, and in total, `num_images` generations will be carried out. In total, the number of images generated will be between `num_images` and `max_images*num_images`. The total number of images (image inputs + image outputs) must not exceed 15. (≥ 1, ≤ 6)
`num_images`	integer		`1`	Number of separate model generations to be run with the prompt. (≥ 1, ≤ 6)
`seed`	integer		—	Random seed to control the stochasticity of image generation
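
The two image-count limits above (only the last 10 inputs are used; inputs plus outputs must not exceed 15) can be estimated up front. This is a rough sketch under stated assumptions: the helper `plan_seedream_edit` is hypothetical, and it counts outputs at the worst case of `num_images * max_images`, which the table does not confirm as the way the platform applies the limit.

```python
MAX_INPUTS = 10  # "only the last 10 will be used"
MAX_TOTAL = 15   # image inputs + image outputs


def plan_seedream_edit(image_urls, num_images=1, max_images=1):
    """Summarise which inputs would be used and whether the 15-image budget holds."""
    used = list(image_urls)[-MAX_INPUTS:]
    # Assumption: budget is checked against the largest possible output count.
    worst_case_total = len(used) + num_images * max_images
    return {
        "used_inputs": used,
        "dropped_inputs": len(image_urls) - len(used),
        "worst_case_total": worst_case_total,
        "within_budget": worst_case_total <= MAX_TOTAL,
    }
```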

Output

Saved to your project drive at drive_path (mint a URL with puras.drive.url / the drive_url tool if you need one). Additional metadata available under meta (seed).

`google/imagen-4`

Imagen 4 · text → image

Inputs

Field	Type	Required	Default	Notes
`prompt`	string	✓	—	The text prompt to generate an image from. (max 5,000 chars, min 3 chars)
`aspect_ratio`	enum		`"1:1"`	The aspect ratio of the generated image. Values: `"1:1"` \| `"16:9"` \| `"9:16"` \| `"4:3"` \| `"3:4"`
`num_images`	integer		`1`	The number of images to generate. (≥ 1, ≤ 4)
`output_format`	enum		`"png"`	The format of the generated image. Values: `"jpeg"` \| `"png"` \| `"webp"`
`resolution`	enum		`"1K"`	The resolution of the generated image. Values: `"1K"` \| `"2K"`
`safety_tolerance`	enum		`"4"`	The safety tolerance level for content moderation. 1 is the most strict (blocks most content), 6 is the least strict. Values: `"1"` \| `"2"` \| `"3"` \| `"4"` \| `"5"` \| `"6"`
`seed`	integer		—	The seed for the random number generator

Output

Saved to your project drive at drive_path (mint a URL with puras.drive.url / the drive_url tool if you need one).

`google/nano-banana-pro`

Nano Banana Pro · text → image

Reasoning image gen with strong multi-language text rendering and native up-to-4K output. Set resolution ∈ 1K|2K|4K.

Inputs

Field	Type	Required	Default	Notes
`prompt`	string	✓	—	The text prompt to generate an image from. (max 50,000 chars, min 3 chars)
`aspect_ratio`	enum		`"1:1"`	The aspect ratio of the generated image. Values: `"auto"` \| `"21:9"` \| `"16:9"` \| `"3:2"` \| `"4:3"` \| `"5:4"` \| `"1:1"` \| `"4:5"` \| `"3:4"` \| `"2:3"` \| `"9:16"`
`enable_web_search`	boolean		`false`	Enable web search for the image generation task. This will allow the model to use the latest information from the web to generate the image
`limit_generations`	boolean		`false`	Experimental parameter to limit the number of generations from each round of prompting to 1. Set to `true` to disregard any instructions in the prompt regarding the number of images to generate
`num_images`	integer		`1`	The number of images to generate. (≥ 1, ≤ 4)
`output_format`	enum		`"png"`	The format of the generated image. Values: `"jpeg"` \| `"png"` \| `"webp"`
`resolution`	enum		`"1K"`	The resolution of the image to generate. Values: `"1K"` \| `"2K"` \| `"4K"`
`safety_tolerance`	enum		`"4"`	The safety tolerance level for content moderation. 1 is the most strict (blocks most content), 6 is the least strict. Values: `"1"` \| `"2"` \| `"3"` \| `"4"` \| `"5"` \| `"6"`
`seed`	integer		—	The seed for the random number generator
`system_prompt`	string		`""`	Optional system instruction that steers the model's persona and output style across the request. Leave blank to omit; when provided, it is sent as the system instruction to Gemini (or as a system message on OpenAI-compatible providers). (max 50,000 chars)

Output

Saved to your project drive at drive_path (mint a URL with puras.drive.url / the drive_url tool if you need one).

`google/nano-banana-pro-edit`

Nano Banana Pro (edit) · image → image (edit)

Edit/compose from up to 14 reference images via image_urls + a prompt. Strong text rendering, native up-to-4K.

Inputs

Field	Type	Required	Default	Notes
`image_urls`	array<string>	✓	—	The URLs of the images to use for image-to-image generation or image editing
`prompt`	string	✓	—	The prompt for image editing. (max 50,000 chars, min 3 chars)
`aspect_ratio`	enum		`"auto"`	The aspect ratio of the generated image. Values: `"auto"` \| `"21:9"` \| `"16:9"` \| `"3:2"` \| `"4:3"` \| `"5:4"` \| `"1:1"` \| `"4:5"` \| `"3:4"` \| `"2:3"` \| `"9:16"`
`enable_web_search`	boolean		`false`	Enable web search for the image generation task. This will allow the model to use the latest information from the web to generate the image
`limit_generations`	boolean		`false`	Experimental parameter to limit the number of generations from each round of prompting to 1. Set to `true` to disregard any instructions in the prompt regarding the number of images to generate
`num_images`	integer		`1`	The number of images to generate. (≥ 1, ≤ 4)
`output_format`	enum		`"png"`	The format of the generated image. Values: `"jpeg"` \| `"png"` \| `"webp"`
`resolution`	enum		`"1K"`	The resolution of the image to generate. Values: `"1K"` \| `"2K"` \| `"4K"`
`safety_tolerance`	enum		`"4"`	The safety tolerance level for content moderation. 1 is the most strict (blocks most content), 6 is the least strict. Values: `"1"` \| `"2"` \| `"3"` \| `"4"` \| `"5"` \| `"6"`
`seed`	integer		—	The seed for the random number generator
`system_prompt`	string		`""`	Optional system instruction that steers the model's persona and output style across the request. Leave blank to omit; when provided, it is sent as the system instruction to Gemini (or as a system message on OpenAI-compatible providers). (max 50,000 chars)

Output

Saved to your project drive at drive_path (mint a URL with puras.drive.url / the drive_url tool if you need one).

`kuaishou/kling-v3-image`

Kling v3 (image) · text → image

Inputs

Field	Type	Required	Default	Notes
`prompt`	string	✓	—	Text prompt for image generation. (max 2,500 chars)
`aspect_ratio`	enum		`"16:9"`	Aspect ratio of generated images. Values: `"16:9"` \| `"9:16"` \| `"1:1"` \| `"4:3"` \| `"3:4"` \| `"3:2"` \| `"2:3"` \| `"21:9"`
`elements`	array<elementinput>		—	Optional: Elements (characters/objects) to include in the image for face control. Each element can have a frontal image and optionally reference images
`negative_prompt`	string		—	Negative text prompt. It is recommended to supplement negative prompt information through negative sentences directly within positive prompts
`num_images`	integer		`1`	Number of images to generate. (≥ 1, ≤ 9)
`output_format`	enum		`"png"`	The format of the generated image. Values: `"jpeg"` \| `"png"` \| `"webp"`
`resolution`	enum		`"1K"`	Image generation resolution. 1K: standard, 2K: high-res. Values: `"1K"` \| `"2K"`

Output

Saved to your project drive at drive_path (mint a URL with puras.drive.url / the drive_url tool if you need one).

`kuaishou/kling-v3-image-edit`

Kling v3 (image edit) · image → image (edit)

Edit/compose from a base reference image (+ optional extra references). Provide refs.

Inputs

Field	Type	Required	Default	Notes
`image_url`	string	✓	—	Reference image for image-to-image generation. Max file size: 10.0MB, Min width: 300px, Min height: 300px, Min aspect ratio: 0.40, Max aspect ratio: 2.50, Timeout: 20.0s
`prompt`	string	✓	—	Text prompt for image generation. Max 2500 characters. (max 2,500 chars)
`aspect_ratio`	enum		`"16:9"`	Aspect ratio of generated images. Values: `"16:9"` \| `"9:16"` \| `"1:1"` \| `"4:3"` \| `"3:4"` \| `"3:2"` \| `"2:3"` \| `"21:9"`
`elements`	array<elementinput>		—	Optional: Elements (characters/objects) to include in the image for face control
`num_images`	integer		`1`	Number of images to generate (1-9). (≥ 1, ≤ 9)
`output_format`	enum		`"png"`	The format of the generated image. Values: `"jpeg"` \| `"png"` \| `"webp"`
`resolution`	enum		`"1K"`	Image generation resolution. 1K: standard, 2K: high-res. Values: `"1K"` \| `"2K"`

Output

Saved to your project drive at drive_path (mint a URL with puras.drive.url / the drive_url tool if you need one).

Videos

`bytedance/seedance-2-t2v`

Seedance 2.0 — text→video

720p–1080p text-to-video. Audio included. Per-second rate jumps at 1080p.

Inputs

Field	Type	Required	Default	Notes
`prompt`	string	✓	—	The text prompt used to generate the video
`aspect_ratio`	enum		`"auto"`	The aspect ratio of the generated video. Use 16:9 for landscape, 9:16 for portrait/vertical, 1:1 for square, 21:9 for ultrawide cinematic, or auto to let the model decide. Values: `"auto"` \| `"21:9"` \| `"16:9"` \| `"4:3"` \| `"1:1"` \| `"3:4"` \| `"9:16"`
`bitrate_mode`	enum		`"standard"`	Output bitrate mode. 'high' requests a higher-quality, larger-file encode from the model; 'standard' uses the default bitrate. Values: `"standard"` \| `"high"`
`duration`	enum		`"auto"`	Duration of the video in seconds. Supports 4 to 15 seconds, or auto to let the model decide based on the prompt. Values: `"auto"` \| `"4"` \| `"5"` \| `"6"` \| `"7"` \| `"8"` \| `"9"` \| `"10"` \| `"11"` \| `"12"` \| `"13"` \| `"14"` \| `"15"`
`generate_audio`	boolean		`true`	Whether to generate synchronized audio for the video, including sound effects, ambient sounds, and lip-synced speech. The cost of video generation is the same regardless of whether audio is generated or not
`resolution`	enum		`"720p"`	Video resolution - 480p for faster generation, 720p for balance, 1080p for high quality, 4k for highest quality. Values: `"480p"` \| `"720p"` \| `"1080p"` \| `"4k"`

Output

Saved to your project drive at drive_path (mint a URL with puras.drive.url / the drive_url tool if you need one). Additional metadata available under meta (seed).

`bytedance/seedance-2-i2v`

Seedance 2.0 — image→video

Inputs

Field	Type	Required	Default	Notes
`image_url`	string	✓	—	The URL of the starting frame image to animate. Supported formats: JPEG, PNG, WebP. Max 30 MB
`prompt`	string	✓	—	The text prompt describing the desired motion and action for the video
`aspect_ratio`	enum		`"auto"`	The aspect ratio of the generated video. Use 16:9 for landscape, 9:16 for portrait/vertical, 1:1 for square, 21:9 for ultrawide cinematic, or auto to infer from the input image. Values: `"auto"` \| `"21:9"` \| `"16:9"` \| `"4:3"` \| `"1:1"` \| `"3:4"` \| `"9:16"`
`bitrate_mode`	enum		`"standard"`	Output bitrate mode. 'high' requests a higher-quality, larger-file encode from the model; 'standard' uses the default bitrate. Values: `"standard"` \| `"high"`
`duration`	enum		`"auto"`	Duration of the video in seconds. Supports 4 to 15 seconds, or auto to let the model decide based on the prompt. Values: `"auto"` \| `"4"` \| `"5"` \| `"6"` \| `"7"` \| `"8"` \| `"9"` \| `"10"` \| `"11"` \| `"12"` \| `"13"` \| `"14"` \| `"15"`
`end_image_url`	string		—	The URL of the image to use as the last frame of the video. When provided, the generated video will transition from the starting image to this ending image. Supported formats: JPEG, PNG, WebP. Max 30 MB
`generate_audio`	boolean		`true`	Whether to generate synchronized audio for the video, including sound effects, ambient sounds, and lip-synced speech. The cost of video generation is the same regardless of whether audio is generated or not
`resolution`	enum		`"720p"`	Video resolution - 480p for faster generation, 720p for balance, 1080p for high quality, 4k for highest quality. Values: `"480p"` \| `"720p"` \| `"1080p"` \| `"4k"`

Output

Saved to your project drive at drive_path (mint a URL with puras.drive.url / the drive_url tool if you need one). Additional metadata available under meta (seed).

`bytedance/seedance-2-r2v`

Seedance 2.0 — reference→video

Up to 9 image / 3 video / 3 audio references. Per-second rate drops 40% when a video reference is passed.

Inputs

Field	Type	Required	Default	Notes
`prompt`	string	✓	—	The text prompt used to generate the video
`aspect_ratio`	enum		`"auto"`	The aspect ratio of the generated video. Use 16:9 for landscape, 9:16 for portrait/vertical, 1:1 for square, 21:9 for ultrawide cinematic, or auto to let the model decide. Values: `"auto"` \| `"21:9"` \| `"16:9"` \| `"4:3"` \| `"1:1"` \| `"3:4"` \| `"9:16"`
`audio_urls`	array<string>		—	Reference audio to guide video generation. Refer to them in the prompt as @Audio1, @Audio2, etc. Supported formats: MP3, WAV. Up to 3 files, combined duration must not exceed 15 seconds. Max 15 MB per file. If audio is provided, at least one reference image or video is required
`bitrate_mode`	enum		`"standard"`	Output bitrate mode. 'high' requests a higher-quality, larger-file encode from the model; 'standard' uses the default bitrate. Values: `"standard"` \| `"high"`
`duration`	enum		`"auto"`	Duration of the video in seconds. Supports 4 to 15 seconds, or auto to let the model decide based on the prompt. Values: `"auto"` \| `"4"` \| `"5"` \| `"6"` \| `"7"` \| `"8"` \| `"9"` \| `"10"` \| `"11"` \| `"12"` \| `"13"` \| `"14"` \| `"15"`
`generate_audio`	boolean		`true`	Whether to generate synchronized audio for the video, including sound effects, ambient sounds, and lip-synced speech. The cost of video generation is the same regardless of whether audio is generated or not
`image_urls`	array<string>		—	Reference images to guide video generation. Refer to them in the prompt as @Image1, @Image2, etc. Supported formats: JPEG, PNG, WebP. Max 30 MB per image. Up to 9 images. Total files across all modalities must not exceed 12
`resolution`	enum		`"720p"`	Video resolution - 480p for faster generation, 720p for balance, 1080p for high quality, 4k for highest quality. Values: `"480p"` \| `"720p"` \| `"1080p"` \| `"4k"`
`video_urls`	array<string>		—	Reference videos to guide video generation. Refer to them in the prompt as @Video1, @Video2, etc. Supported formats: MP4, MOV. Up to 3 videos, combined duration must be between 2 and 15 seconds, total size under 50 MB. Each video must be between ~480p (640x640) and ~720p (834x1112) in resolution
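
The count rules for the three reference lists above can be checked together. This is a minimal sketch that looks only at list lengths; the function name `check_seedance_references` is hypothetical, and the file-size, duration, and resolution limits are left out because they need the files themselves.

```python
def check_seedance_references(image_urls=(), video_urls=(), audio_urls=()):
    """Return the violated count rules; an empty list means the counts look valid."""
    problems = []
    if len(image_urls) > 9:
        problems.append("at most 9 reference images")
    if len(video_urls) > 3:
        problems.append("at most 3 reference videos")
    if len(audio_urls) > 3:
        problems.append("at most 3 reference audio files")
    if len(image_urls) + len(video_urls) + len(audio_urls) > 12:
        problems.append("at most 12 reference files across all modalities")
    if audio_urls and not (image_urls or video_urls):
        problems.append("audio requires at least one reference image or video")
    return problems
```

Note that the per-modality maxima add up to 15, so a request can respect each individual cap and still break the combined limit of 12.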

Output

Saved to your project drive at drive_path (mint a URL with puras.drive.url / the drive_url tool if you need one). Additional metadata available under meta (seed).

`bytedance/seedance-2-fast-t2v`

Seedance 2.0 Fast — text→video

Inputs

Field	Type	Required	Default	Notes
`prompt`	string	✓	—	The text prompt used to generate the video
`aspect_ratio`	enum		`"auto"`	The aspect ratio of the generated video. Use 16:9 for landscape, 9:16 for portrait/vertical, 1:1 for square, 21:9 for ultrawide cinematic, or auto to let the model decide. Values: `"auto"` \| `"21:9"` \| `"16:9"` \| `"4:3"` \| `"1:1"` \| `"3:4"` \| `"9:16"`
`bitrate_mode`	enum		`"standard"`	Output bitrate mode. 'high' requests a higher-quality, larger-file encode from the model; 'standard' uses the default bitrate. Values: `"standard"` \| `"high"`
`duration`	enum		`"auto"`	Duration of the video in seconds. Supports 4 to 15 seconds, or auto to let the model decide based on the prompt. Values: `"auto"` \| `"4"` \| `"5"` \| `"6"` \| `"7"` \| `"8"` \| `"9"` \| `"10"` \| `"11"` \| `"12"` \| `"13"` \| `"14"` \| `"15"`
`generate_audio`	boolean		`true`	Whether to generate synchronized audio for the video, including sound effects, ambient sounds, and lip-synced speech. The cost of video generation is the same regardless of whether audio is generated or not
`resolution`	enum		`"720p"`	Video resolution - 480p for faster generation, 720p for balance. Values: `"480p"` \| `"720p"`

Output

Saved to your project drive at drive_path (mint a URL with puras.drive.url / the drive_url tool if you need one). Additional metadata available under meta (seed).

`bytedance/seedance-2-fast-i2v`

Seedance 2.0 Fast — image→video

Inputs

Field	Type	Required	Default	Notes
`image_url`	string	✓	—	The URL of the starting frame image to animate. Supported formats: JPEG, PNG, WebP. Max 30 MB
`prompt`	string	✓	—	The text prompt describing the desired motion and action for the video
`aspect_ratio`	enum		`"auto"`	The aspect ratio of the generated video. Use 16:9 for landscape, 9:16 for portrait/vertical, 1:1 for square, 21:9 for ultrawide cinematic, or auto to infer from the input image. Values: `"auto"` \| `"21:9"` \| `"16:9"` \| `"4:3"` \| `"1:1"` \| `"3:4"` \| `"9:16"`
`bitrate_mode`	enum		`"standard"`	Output bitrate mode. 'high' requests a higher-quality, larger-file encode from the model; 'standard' uses the default bitrate. Values: `"standard"` \| `"high"`
`duration`	enum		`"auto"`	Duration of the video in seconds. Supports 4 to 15 seconds, or auto to let the model decide based on the prompt. Values: `"auto"` \| `"4"` \| `"5"` \| `"6"` \| `"7"` \| `"8"` \| `"9"` \| `"10"` \| `"11"` \| `"12"` \| `"13"` \| `"14"` \| `"15"`
`end_image_url`	string		—	The URL of the image to use as the last frame of the video. When provided, the generated video will transition from the starting image to this ending image. Supported formats: JPEG, PNG, WebP. Max 30 MB
`generate_audio`	boolean		`true`	Whether to generate synchronized audio for the video, including sound effects, ambient sounds, and lip-synced speech. The cost of video generation is the same regardless of whether audio is generated or not
`resolution`	enum		`"720p"`	Video resolution - 480p for faster generation, 720p for balance. Values: `"480p"` \| `"720p"`

Output

Saved to your project drive at drive_path (mint a URL with puras.drive.url / the drive_url tool if you need one). Additional metadata available under meta (seed).

`bytedance/seedance-2-fast-r2v`

Seedance 2.0 Fast — reference→video

Inputs

Field	Type	Required	Default	Notes
`prompt`	string	✓	—	The text prompt used to generate the video
`aspect_ratio`	enum		`"auto"`	The aspect ratio of the generated video. Use 16:9 for landscape, 9:16 for portrait/vertical, 1:1 for square, 21:9 for ultrawide cinematic, or auto to let the model decide. Values: `"auto"` \| `"21:9"` \| `"16:9"` \| `"4:3"` \| `"1:1"` \| `"3:4"` \| `"9:16"`
`audio_urls`	array<string>		—	Reference audio to guide video generation. Refer to them in the prompt as @Audio1, @Audio2, etc. Supported formats: MP3, WAV. Up to 3 files, combined duration must not exceed 15 seconds. Max 15 MB per file. If audio is provided, at least one reference image or video is required
`bitrate_mode`	enum		`"standard"`	Output bitrate mode. 'high' requests a higher-quality, larger-file encode from the model; 'standard' uses the default bitrate. Values: `"standard"` \| `"high"`
`duration`	enum		`"auto"`	Duration of the video in seconds. Supports 4 to 15 seconds, or auto to let the model decide based on the prompt. Values: `"auto"` \| `"4"` \| `"5"` \| `"6"` \| `"7"` \| `"8"` \| `"9"` \| `"10"` \| `"11"` \| `"12"` \| `"13"` \| `"14"` \| `"15"`
`generate_audio`	boolean		`true`	Whether to generate synchronized audio for the video, including sound effects, ambient sounds, and lip-synced speech. The cost of video generation is the same regardless of whether audio is generated or not
`image_urls`	array<string>		—	Reference images to guide video generation. Refer to them in the prompt as @Image1, @Image2, etc. Supported formats: JPEG, PNG, WebP. Max 30 MB per image. Up to 9 images. Total files across all modalities must not exceed 12
`resolution`	enum		`"720p"`	Video resolution - 480p for faster generation, 720p for balance. Values: `"480p"` \| `"720p"`
`video_urls`	array<string>		—	Reference videos to guide video generation. Refer to them in the prompt as @Video1, @Video2, etc. Supported formats: MP4, MOV. Up to 3 videos, combined duration must be between 2 and 15 seconds, total size under 50 MB. Each video must be between ~480p (640x640) and ~720p (834x1112) in resolution

Output

Saved to your project drive at drive_path (mint a URL with puras.drive.url / the drive_url tool if you need one). Additional metadata available under meta (seed).

`kuaishou/kling-v3-t2v`

Kling v3 Pro — text→video

Set generate_audio: true to enable audio, voice_control: true for voice.

Inputs

Field	Type	Required	Default	Notes
`aspect_ratio`	enum		`"16:9"`	The aspect ratio of the generated video frame. Values: `"16:9"` \| `"9:16"` \| `"1:1"`
`cfg_scale`	number		`0.5`	The CFG (Classifier Free Guidance) scale is a measure of how close you want the model to stick to your prompt. (≥ 0, ≤ 1)
`duration`	enum		`"5"`	The duration of the generated video in seconds. Values: `"3"` \| `"4"` \| `"5"` \| `"6"` \| `"7"` \| `"8"` \| `"9"` \| `"10"` \| `"11"` \| `"12"` \| `"13"` \| `"14"` \| `"15"`
`generate_audio`	boolean		`true`	Whether to generate native audio for the video. Supports Chinese and English voice output. Other languages are automatically translated to English. For English speech, use lowercase letters; for acronyms or proper nouns, use uppercase
`multi_prompt`	array<klingv3multipromptelement>		—	List of prompts for multi-shot video generation. If provided, overrides the single prompt and divides the video into multiple shots with specified prompts and durations
`negative_prompt`	string		`"blur, distort, and low quality"`	(max 2,500 chars)
`prompt`	string		—	Text prompt for video generation. Either prompt or multi_prompt must be provided, but not both
`shot_type`	enum		`"customize"`	The type of multi-shot video generation. 'intelligent' lets the model automatically determine shot structure. Values: `"customize"` \| `"intelligent"`

Output

Saved to your project drive at drive_path (mint a URL with puras.drive.url / the drive_url tool if you need one).

`kuaishou/kling-v3-i2v`

Kling v3 Pro — image→video

Inputs

Field	Type	Required	Default	Notes
`start_image_url`	string	✓	—	URL of the image to be used for the video
`cfg_scale`	number		`0.5`	The CFG (Classifier Free Guidance) scale is a measure of how close you want the model to stick to your prompt. (≥ 0, ≤ 1)

Output

Saved to your project drive at drive_path (mint a URL with puras.drive.url / the drive_url tool if you need one).

`kuaishou/kling-o3-r2v`

Kling O3 Pro — reference→video

Reference-to-video on the Kling O3 line. Pass refs as image_urls; default 8s.

Inputs

Field	Type	Default	Notes
`aspect_ratio`	enum	`"16:9"`	The aspect ratio of the generated video frame. Values: `"16:9"` \| `"9:16"` \| `"1:1"`
`duration`	enum	`"5"`	Video duration in seconds (3-15s). Values: `"3"` \| `"4"` \| `"5"` \| `"6"` \| `"7"` \| `"8"` \| `"9"` \| `"10"` \| `"11"` \| `"12"` \| `"13"` \| `"14"` \| `"15"`
`elements`	array<klingv3comboelementinput>	—	Elements (characters/objects) to include. Reference in prompt as @Element1, @Element2
`end_image_url`	string	—	Image to use as the last frame of the video
`generate_audio`	boolean	`false`	Whether to generate native audio for the video
`image_urls`	array<string>	—	Reference images for style/appearance. Reference in prompt as @Image1, @Image2, etc. Maximum 4 total (elements + reference images) when using video
`multi_prompt`	array<klingv3multipromptelement>	—	List of prompts for multi-shot video generation
`prompt`	string	—	Text prompt for video generation. Either prompt or multi_prompt must be provided, but not both
`shot_type`	enum	`"customize"`	The type of multi-shot video generation. 'intelligent' lets the model automatically determine shot structure. Values: `"customize"` \| `"intelligent"`
`start_image_url`	string	—	Image to use as the first frame of the video
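
The `prompt` / `multi_prompt` rule above is an exclusive-or: exactly one of the two must be present. A minimal sketch of that check follows. The helper name `pick_kling_prompt_field` is hypothetical, and the elements of `multi_prompt` are treated as opaque because their shape is not documented on this page.

```python
def pick_kling_prompt_field(prompt=None, multi_prompt=None) -> str:
    """Return which field is in use, or raise if both or neither were given."""
    has_prompt = bool(prompt)
    has_multi = bool(multi_prompt)
    if has_prompt == has_multi:  # both set, or both missing/empty
        raise ValueError("provide exactly one of prompt or multi_prompt")
    return "multi_prompt" if has_multi else "prompt"
```

Empty strings and empty lists count as "not provided" in this sketch; the native API may treat them differently.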

Output

Saved to your project drive at drive_path (mint a URL with puras.drive.url / the drive_url tool if you need one).

`kuaishou/kling-avatar-v2`

Kling AI Avatar v2 Pro — lip-sync · video

Lip-synced talking-head from a portrait image_url + an audio_url. Output duration auto-matches the audio; framing follows the image. Optional prompt for delivery/expression. Billed per second of output.

Inputs

Field	Type	Required	Default	Notes
`audio_url`	string	✓	—	The URL of the audio file
`image_url`	string	✓	—	The URL of the image to use as your avatar
`prompt`	string		`"."`	The prompt to use for the video generation

Output

Saved to your project drive at drive_path (mint a URL with puras.drive.url / the drive_url tool if you need one).

`google/veo-3-t2v`

Veo 3 — text→video

Inputs

Field	Type	Required	Default	Notes
`prompt`	string	✓	—	The text prompt describing the video you want to generate. (max 20,000 chars)
`aspect_ratio`	enum		`"16:9"`	The aspect ratio of the generated video. Values: `"16:9"` \| `"9:16"`
`auto_fix`	boolean		`true`	Whether to automatically attempt to fix prompts that fail content policy or other validation checks by rewriting them
`duration`	enum		`"8s"`	The duration of the generated video. Values: `"4s"` \| `"6s"` \| `"8s"`
`generate_audio`	boolean		`true`	Whether to generate audio for the video
`negative_prompt`	string		—	A negative prompt to guide the video generation
`resolution`	enum		`"720p"`	The resolution of the generated video. Values: `"720p"` \| `"1080p"`
`safety_tolerance`	enum		`"4"`	The safety tolerance level for content moderation. 1 is the most strict (blocks most content), 6 is the least strict. Values: `"1"` \| `"2"` \| `"3"` \| `"4"` \| `"5"` \| `"6"`
`seed`	integer		—	The seed for the random number generator
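
The `duration` enum is spelled differently across the video families on this page: Veo takes `"4s"`, `"6s"`, or `"8s"`, while the Seedance and Kling tables list bare numeric strings. The sketch below maps a number of seconds to the native string. The family keys (`"seedance"`, `"kling"`, `"veo"`) and the function name are hypothetical, chosen for illustration; the allowed values are copied from the tables.

```python
# Accepted duration strings, copied from the Inputs tables on this page.
NATIVE_DURATIONS = {
    "seedance": {str(s) for s in range(4, 16)},  # "4" .. "15" ("auto" also exists)
    "kling": {str(s) for s in range(3, 16)},     # "3" .. "15"
    "veo": {"4s", "6s", "8s"},
}


def native_duration(family: str, seconds: int) -> str:
    """Format a duration in seconds as the enum string the given family lists."""
    allowed = NATIVE_DURATIONS[family]  # KeyError for an unknown family key
    value = f"{seconds}s" if family == "veo" else str(seconds)
    if value not in allowed:
        raise ValueError(f"{family} does not list a {seconds}-second duration")
    return value
```

In normal use the platform performs this adaptation for you; the sketch only makes the difference in native shape concrete.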

Output

Saved to your project drive at drive_path (mint a URL with puras.drive.url / the drive_url tool if you need one).

`google/veo-3-i2v`

Veo 3 — image→video

Inputs

Field	Type	Required	Default	Notes
`image_url`	string	✓	—	URL of the input image to animate. Should be 720p or higher resolution in 16:9 or 9:16 aspect ratio. If the image is not in 16:9 or 9:16 aspect ratio, it will be cropped to fit
`prompt`	string	✓	—	The text prompt describing how the image should be animated. (max 20,000 chars)
`aspect_ratio`	enum		`"auto"`	The aspect ratio of the generated video. Values: `"auto"` \| `"16:9"` \| `"9:16"`
`auto_fix`	boolean		`false`	Whether to automatically attempt to fix prompts that fail content policy or other validation checks by rewriting them
`duration`	enum		`"8s"`	The duration of the generated video. Values: `"4s"` \| `"6s"` \| `"8s"`
`generate_audio`	boolean		`true`	Whether to generate audio for the video
`negative_prompt`	string		—	A negative prompt to guide the video generation
`resolution`	enum		`"720p"`	The resolution of the generated video. Values: `"720p"` \| `"1080p"`
`safety_tolerance`	enum		`"4"`	The safety tolerance level for content moderation. 1 is the most strict (blocks most content), 6 is the least strict. Values: `"1"` \| `"2"` \| `"3"` \| `"4"` \| `"5"` \| `"6"`
`seed`	integer		—	The seed for the random number generator

Output

Saved to your project drive at drive_path (mint a URL with puras.drive.url / the drive_url tool if you need one).
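
The `image_url` note says an image that is not 16:9 or 9:16 will be cropped to fit, but it does not say where the crop is taken from. If framing matters, one option is to crop the image yourself before uploading. The following is a minimal Python sketch that computes a centered crop box; `pick_aspect` and `center_crop_box` are hypothetical helpers, and the centered crop is an assumption about what you want, not a description of the platform's behavior.

```python
from fractions import Fraction

# The two aspect ratios the image_url note names.
TARGETS = {"16:9": Fraction(16, 9), "9:16": Fraction(9, 16)}


def pick_aspect(width, height):
    """Choose 16:9 for landscape or square sources, 9:16 for portrait ones."""
    return "16:9" if width >= height else "9:16"


def center_crop_box(width, height, aspect=None):
    """Return (left, top, right, bottom) of the largest centered box with the target ratio."""
    aspect = aspect or pick_aspect(width, height)
    ratio = TARGETS[aspect]
    if Fraction(width, height) > ratio:
        # Source is too wide: keep the full height and trim the sides.
        new_w, new_h = int(height * ratio), height
    else:
        # Source is too tall or already matches: keep the full width.
        new_w, new_h = width, int(width / ratio)
    left = (width - new_w) // 2
    top = (height - new_h) // 2
    return (left, top, left + new_w, top + new_h)
```

The returned tuple uses the left/top/right/bottom order that most image libraries accept for cropping. A 1000x1000 source, for instance, yields `(0, 219, 1000, 781)`, a 1000x562 landscape box.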

`google/veo-3-fast-t2v`

Veo 3 Fast — text→video

Inputs

Field	Type	Required	Default	Notes
`prompt`	string	✓	—	The text prompt describing the video you want to generate. (max 20,000 chars)
`aspect_ratio`	enum		`"16:9"`	The aspect ratio of the generated video. Values: `"16:9"` \| `"9:16"`
`auto_fix`	boolean		`true`	Whether to automatically attempt to fix prompts that fail content policy or other validation checks by rewriting them
`duration`	enum		`"8s"`	The duration of the generated video. Values: `"4s"` \| `"6s"` \| `"8s"`
`generate_audio`	boolean		`true`	Whether to generate audio for the video
`negative_prompt`	string		—	A negative prompt to guide the video generation
`resolution`	enum		`"720p"`	The resolution of the generated video. Values: `"720p"` \| `"1080p"`
`safety_tolerance`	enum		`"4"`	The safety tolerance level for content moderation. 1 is the most strict (blocks most content), 6 is the least strict. Values: `"1"` \| `"2"` \| `"3"` \| `"4"` \| `"5"` \| `"6"`
`seed`	integer		—	The seed for the random number generator

Output

Saved to your project drive at drive_path (mint a URL with puras.drive.url / the drive_url tool if you need one).

`google/veo-3-fast-i2v`

Veo 3 Fast — image→video

Inputs

Field	Type	Required	Default	Notes
`image_url`	string	✓	—	URL of the input image to animate. Should be 720p or higher resolution in 16:9 or 9:16 aspect ratio. If the image is not in 16:9 or 9:16 aspect ratio, it will be cropped to fit
`prompt`	string	✓	—	The text prompt describing how the image should be animated. (max 20,000 chars)
`aspect_ratio`	enum		`"auto"`	The aspect ratio of the generated video. Values: `"auto"` \| `"16:9"` \| `"9:16"`
`auto_fix`	boolean		`false`	Whether to automatically attempt to fix prompts that fail content policy or other validation checks by rewriting them
`duration`	enum		`"8s"`	The duration of the generated video. Values: `"4s"` \| `"6s"` \| `"8s"`
`generate_audio`	boolean		`true`	Whether to generate audio for the video
`negative_prompt`	string		—	A negative prompt to guide the video generation
`resolution`	enum		`"720p"`	The resolution of the generated video. Values: `"720p"` \| `"1080p"`
`safety_tolerance`	enum		`"4"`	The safety tolerance level for content moderation. 1 is the most strict (blocks most content), 6 is the least strict. Values: `"1"` \| `"2"` \| `"3"` \| `"4"` \| `"5"` \| `"6"`
`seed`	integer		—	The seed for the random number generator

Output

Saved to your project drive at drive_path (mint a URL with puras.drive.url / the drive_url tool if you need one).

`google/veo-3.1-r2v`

Veo 3.1 — reference→video

Reference-to-video on Veo 3.1. Pass reference images as image_urls.

Inputs

Field	Type	Required	Default	Notes
`image_urls`	array<string>	✓	—	URLs of the reference images to use for consistent subject appearance
`prompt`	string	✓	—	The text prompt describing the video you want to generate. (max 20,000 chars)
`aspect_ratio`	enum		`"16:9"`	The aspect ratio of the generated video. Values: `"16:9"` \| `"9:16"`
`auto_fix`	boolean		`false`	Whether to automatically attempt to fix prompts that fail content policy or other validation checks by rewriting them
`duration`	string		`"8s"`	The duration of the generated video
`generate_audio`	boolean		`true`	Whether to generate audio for the video
`resolution`	enum		`"720p"`	The resolution of the generated video. Values: `"720p"` \| `"1080p"` \| `"4k"`
`safety_tolerance`	enum		`"4"`	The safety tolerance level for content moderation. 1 is the most strict (blocks most content), 6 is the least strict. Values: `"1"` \| `"2"` \| `"3"` \| `"4"` \| `"5"` \| `"6"`

Output

Saved to your project drive at drive_path (mint a URL with puras.drive.url / the drive_url tool if you need one).
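
Compared with the Veo 3 tables earlier on this page, the `google/veo-3.1-r2v` table lists no `negative_prompt` or `seed`, requires `image_urls`, and adds `"4k"` to `resolution`. The following is a minimal Python sketch for carrying a Veo 3 style input dict over to this shape. `to_r2v_inputs` is a hypothetical helper; whether the platform rejects or silently ignores unlisted fields is not stated here, so the sketch simply reports what it dropped. Treating an empty `image_urls` list as an error is also an assumption.

```python
# Field names copied from the google/veo-3.1-r2v Inputs table.
VEO31_R2V_FIELDS = {
    "image_urls", "prompt", "aspect_ratio", "auto_fix",
    "duration", "generate_audio", "resolution", "safety_tolerance",
}
VEO31_R2V_RESOLUTIONS = {"720p", "1080p", "4k"}


def to_r2v_inputs(veo3_inputs, image_urls):
    """Return (r2v_inputs, dropped_field_names) for a Veo 3 style input dict."""
    if not image_urls:
        # Assumption: a required array should contain at least one URL.
        raise ValueError("image_urls is required")
    r2v = {k: v for k, v in veo3_inputs.items() if k in VEO31_R2V_FIELDS}
    dropped = sorted(set(veo3_inputs) - VEO31_R2V_FIELDS)
    r2v["image_urls"] = list(image_urls)
    if r2v.get("resolution", "720p") not in VEO31_R2V_RESOLUTIONS:
        raise ValueError("resolution must be 720p, 1080p, or 4k")
    return r2v, dropped
```

Note that `auto_fix` defaults to `false` here but `true` on the text-to-video slugs, so set it explicitly if the behavior matters.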

Audio

`elevenlabs/scribe-v2`

ElevenLabs Scribe v2 — speech→text

Speech-to-text with word-level timestamps. Pass audio_url; the call returns text + words (each with start/end seconds) instead of a file. Pass keyterms (brand / proper nouns) to bias the transcription. Billed per second of audio.

Inputs

Field	Type	Required	Default	Notes
`audio_url`	string	✓	—	URL of the audio file to transcribe
`diarize`	boolean		`true`	Whether to annotate who is speaking
`keyterms`	array<string>		`[]`	Words or sentences to bias the model towards transcribing. Up to 100 keyterms, max 50 characters each. Adds a 30% premium over the base transcription price
`language_code`	string		—	Language code of the audio
`tag_audio_events`	boolean		`true`	Tag audio events like laughter, applause, etc
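
The `keyterms` limits and the 30% premium can be checked before submitting a job. The following is a minimal Python sketch; `check_keyterms` and `estimate_cost` are hypothetical helpers, `base_rate_per_second` is a placeholder you supply from the pricing page, and applying the premium whenever at least one keyterm is present is an assumption about how the surcharge is triggered.

```python
MAX_KEYTERMS = 100        # "Up to 100 keyterms"
MAX_KEYTERM_CHARS = 50    # "max 50 characters each"
KEYTERM_PREMIUM = 0.30    # "30% premium over the base transcription price"


def check_keyterms(keyterms):
    """Return the keyterms as a list, raising ValueError if a documented limit is exceeded."""
    keyterms = list(keyterms)
    if len(keyterms) > MAX_KEYTERMS:
        raise ValueError(f"{len(keyterms)} keyterms given, limit is {MAX_KEYTERMS}")
    too_long = [term for term in keyterms if len(term) > MAX_KEYTERM_CHARS]
    if too_long:
        raise ValueError(f"keyterms longer than {MAX_KEYTERM_CHARS} chars: {too_long}")
    return keyterms


def estimate_cost(audio_seconds, base_rate_per_second, keyterms=()):
    """Estimate the charge for one transcription, billed per second of audio."""
    base = audio_seconds * base_rate_per_second
    # Assumption: the premium applies to the whole job once any keyterm is passed.
    return base * (1 + KEYTERM_PREMIUM) if keyterms else base
```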

Output

Returns the transcript inline as text plus a words array (each word carries start/end seconds and type) and a detected language_code. No file is produced, so drive_path is empty.
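
Word-level timestamps make it straightforward to build captions from the result. The following is a minimal Python sketch that groups the `words` array into SRT cues. It assumes each entry is a dict with `text`, `start`, `end`, and `type` keys and that spoken words carry `type == "word"`; only start/end seconds and a type are stated above, so check the exact key names and type values against a real response. `fmt_ts` and `words_to_srt` are hypothetical helpers.

```python
def fmt_ts(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words, max_words=7, keep_types=("word",)):
    """Group spoken words into cues of at most max_words and render them as SRT text."""
    # Assumption: non-speech entries (spacing, audio events) have a different type.
    spoken = [w for w in words if w.get("type") in keep_types]
    cues = []
    for i in range(0, len(spoken), max_words):
        group = spoken[i:i + max_words]
        text = " ".join(w["text"] for w in group)
        start, end = fmt_ts(group[0]["start"]), fmt_ts(group[-1]["end"])
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)
```

Each cue spans from the first word's start to the last word's end, so cue boundaries always fall on word boundaries.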

`elevenlabs/tts-v3`

ElevenLabs v3 — expressive text→speech

Expressive multilingual text-to-speech (74 languages) that reads inline audio tags — wrap a cue in square brackets ([excited], [whispers], [laughs], [sighs], [sarcastic], [British accent]) and v3 shapes the read accordingly; the tag is acted, never spoken. Pass text, an optional voice (preset name like "Aria"/"Roger" or a voice id), and stability (0.0 = most expressive and tag-responsive, 0.5 = balanced, 1.0 = steadiest/least tag-responsive). v3 tunes delivery through tags + stability, so it ignores the style/speed/similarity_boost knobs — pick elevenlabs/tts-multilingual-v2 when you need those instead. Billed per character of text.

Inputs

Field	Type	Required	Default	Notes
`text`	string	✓	—	The text to convert to speech. (max 5,000 chars, min 1 char)
`apply_text_normalization`	enum		`"auto"`	Controls text normalization. With 'auto', the system decides whether to normalize text (e.g., spelling out numbers); with 'on', normalization is always applied; with 'off', it is skipped. Values: `"auto"` \| `"on"` \| `"off"`
`language_code`	string		—	Language code (ISO 639-1) used to enforce a language for the model
`stability`	number		`0.5`	Voice stability: 0.0 is the most expressive and tag-responsive, 0.5 is balanced, 1.0 is the steadiest and least tag-responsive. (≥ 0, ≤ 1)
`timestamps`	boolean		`false`	Whether to return timestamps for each word in the generated speech
`voice`	string		`"Rachel"`	The voice to use for speech generation

Output

Saved to your project drive at drive_path (mint a URL with puras.drive.url / the drive_url tool if you need one).
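
Because `text` is capped at 5,000 characters, longer scripts have to be split across several calls. The following is a minimal Python sketch that packs whole sentences into chunks under the limit. `split_for_tts` is a hypothetical helper, and it assumes audio tags such as `[whispers]` count toward the character limit like any other text, which is not stated above.

```python
import re

MAX_TTS_V3_CHARS = 5_000  # from the `text` row of the Inputs table


def split_for_tts(text, limit=MAX_TTS_V3_CHARS):
    """Greedily pack whole sentences into chunks of at most `limit` characters."""
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than the limit is hard-split as a fallback.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

A tag placed at the start of a sentence stays attached to that sentence. The hard-split fallback can cut through a tag, so it is worth keeping individual sentences well under the limit.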

`elevenlabs/tts-multilingual-v2`

ElevenLabs Multilingual v2 — text→speech

Natural multilingual text-to-speech (29 languages) with fine delivery controls — stability, style, speed, similarity_boost. Does not interpret audio tags (use elevenlabs/tts-v3 for [excited]/[whispers]-style direction). Pass text (and an optional voice). Billed per character of text.

Inputs

Field	Type	Required	Default	Notes
`text`	string	✓	—	The text to convert to speech. (min 1 char)
`apply_text_normalization`	enum		`"auto"`	Controls text normalization. With 'auto', the system decides whether to normalize text (e.g., spelling out numbers); with 'on', normalization is always applied; with 'off', it is skipped. Values: `"auto"` \| `"on"` \| `"off"`
`language_code`	string		—	Language code (ISO 639-1) used to enforce a language for the model. An error is returned if the language code is not supported by the model
`next_text`	string		—	The text that comes after the text of the current request. Can improve continuity when concatenating multiple generations, or influence continuity within the current generation
`previous_text`	string		—	The text that came before the text of the current request. Can improve continuity when concatenating multiple generations, or influence continuity within the current generation
`similarity_boost`	number		`0.75`	Similarity boost (0-1). (≥ 0, ≤ 1)
`speed`	number		`1`	Speech speed (0.7-1.2). Values below 1.0 slow down the speech, above 1.0 speed it up. Extreme values may affect quality. (≥ 0.7, ≤ 1.2)
`stability`	number		`0.5`	Voice stability (0-1). (≥ 0, ≤ 1)
`style`	number		`0`	Style exaggeration (0-1). (≥ 0, ≤ 1)
`timestamps`	boolean		`false`	Whether to return timestamps for each word in the generated speech
`voice`	string		`"Rachel"`	The voice to use for speech generation

Output

Saved to your project drive at drive_path (mint a URL with puras.drive.url / the drive_url tool if you need one).
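
When a long script is generated as several segments, `previous_text` and `next_text` let each call see its neighbors. The following is a minimal Python sketch that builds one input dict per segment and checks the numeric knobs against the ranges in the table. `check_ranges` and `stitched_requests` are hypothetical helpers; they only build the dicts, and submitting them through the generate_audio verb is left to the verb references.

```python
# (min, max) ranges copied from the Inputs table.
RANGES = {
    "similarity_boost": (0.0, 1.0),
    "speed": (0.7, 1.2),
    "stability": (0.0, 1.0),
    "style": (0.0, 1.0),
}


def check_ranges(settings):
    """Return a copy of settings, raising ValueError for unknown or out-of-range knobs."""
    for name, value in settings.items():
        if name not in RANGES:
            raise ValueError(f"unknown setting: {name}")
        low, high = RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}")
    return dict(settings)


def stitched_requests(segments, voice="Rachel", **settings):
    """Build one input dict per segment, linking each to its neighbors."""
    settings = check_ranges(settings)
    requests = []
    for i, text in enumerate(segments):
        request = {"text": text, "voice": voice, **settings}
        if i > 0:
            request["previous_text"] = segments[i - 1]
        if i < len(segments) - 1:
            request["next_text"] = segments[i + 1]
        requests.append(request)
    return requests
```

The first segment gets no `previous_text` and the last gets no `next_text`, so the fields are only sent when there is real neighboring text to provide.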