Video Generation
Kling Series
Kuaishou Kling video generation: v2.6 / v3 / v3-omni / video-o1, four generations covering text-to-video, image-to-video, first/last-frame interpolation, multimodal reference, and reasoning-enhanced generation
Kuaishou Kling video generation series. Four-generation lineup:
- kling-v2.6: Classic stable release, 720P silent / 1080P with optional audio, further refined semantic adherence and motion stability
- kling-v3: Next-generation base model, adds 4K resolution, T2V duration extended to 15 seconds, native audio support
- kling-v3-omni: Unified multimodal interface for the v3 family; T2V, I2V, and video reference all share one endpoint, with `<<<image_N>>>` reference syntax in the prompt
- kling-video-o1: First reasoning-enhanced video model, performs deep planning over the prompt and reference assets before generation, delivering best-in-class physical consistency, complex motion, and long-form motion semantic adherence
Billing is per second ($/second); the resolution (720p / 1080p / 4K) selects the price tier.
Pricing
| Model | 720P | 1080P | 4K | Notes |
|---|---|---|---|---|
| kling-v2.6 | $0.0391 | $0.06641 | — | 1080P with audio uplifts to $0.159375 |
| kling-v3 | $0.0714 | $0.0952 | $0.455345 | With audio: 720P $0.1071 / 1080P $0.1428 / 4K same price |
| kling-v3-omni | $0.0714 | $0.0952 | $0.455345 | With audio: 720P $0.0952 / 1080P $0.119; video reference: 720P $0.1071 / 1080P $0.1428 |
| kling-video-o1 | $0.0714 | $0.0952 | — | Video reference: 720P $0.1071 / 1080P $0.1428 |
Prices are per second; actual charge = unit price × duration. Enabling audio or attaching a `video_list` video reference switches to the corresponding higher tier.
Examples
Poll the task result with `GET /v1/tasks/{task_id}`; see Task System for details.
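The billing rule from the Pricing section (charge = unit price × duration) can be sketched as a small estimator. This is an illustrative helper, not part of the API: the names `BASE_PRICE` and `estimate_cost` are hypothetical, the table holds only the base tiers (no audio, no video reference), and the audio or video-reference tiers have to be passed in explicitly from the Notes column.

```python
# Base per-second prices in USD, copied from the Pricing table
# (no audio, no video reference). Names here are illustrative only.
BASE_PRICE = {
    "kling-v2.6": {"720p": 0.0391, "1080p": 0.06641},
    "kling-v3": {"720p": 0.0714, "1080p": 0.0952, "4k": 0.455345},
    "kling-v3-omni": {"720p": 0.0714, "1080p": 0.0952, "4k": 0.455345},
    "kling-video-o1": {"720p": 0.0714, "1080p": 0.0952},
}


def estimate_cost(model, resolution, duration_s, unit_price=None):
    """Return unit price x duration in USD.

    Pass unit_price to use an audio or video-reference tier
    from the Notes column instead of the base tier.
    """
    if unit_price is None:
        try:
            unit_price = BASE_PRICE[model][resolution.lower()]
        except KeyError:
            raise ValueError(f"no base price for {model} at {resolution}") from None
    return round(unit_price * duration_s, 6)


# A 10-second 1080P clip on kling-v3 at the base tier,
# then the same clip at the with-audio tier from the Notes column.
silent = estimate_cost("kling-v3", "1080P", 10)
with_audio = estimate_cost("kling-v3", "1080P", 10, unit_price=0.1428)
```

The result is only an estimate of the documented formula; the actual charge is whatever the platform bills.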
Available Models
| Model ID | Resolution | Duration | Supported actions | Highlights |
|---|---|---|---|---|
| kling-video-o1 | 720P / 1080P | 5 / 10 sec | generate · image2video · first_last_frame · reference · reference_video | Reasoning-enhanced, best physical consistency |
| kling-v3-omni | 720P / 1080P / 4K | 3-15 sec | generate · image2video · reference · reference_video | Unified multimodal endpoint |
| kling-v3 | 720P / 1080P / 4K | 3-15 sec | generate · image2video · first_last_frame | 4K + native audio |
| kling-v2.6 | 720P / 1080P | 5 / 10 sec | generate · image2video · first_last_frame | 1080P with optional audio, top stability |
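The table above can be mirrored client-side to catch unsupported combinations before a request is sent. The following is a minimal sketch under that assumption; `CAPABILITIES` and `check_request` are hypothetical names, and the data is transcribed from the table rather than fetched from the service.

```python
# Per-model capabilities transcribed from the Available Models table.
CAPABILITIES = {
    "kling-video-o1": {
        "resolutions": {"720p", "1080p"},
        "durations": {5, 10},
        "actions": {"generate", "image2video", "first_last_frame",
                    "reference", "reference_video"},
    },
    "kling-v3-omni": {
        "resolutions": {"720p", "1080p", "4k"},
        "durations": set(range(3, 16)),  # any integer from 3 to 15
        "actions": {"generate", "image2video", "reference", "reference_video"},
    },
    "kling-v3": {
        "resolutions": {"720p", "1080p", "4k"},
        "durations": set(range(3, 16)),
        "actions": {"generate", "image2video", "first_last_frame"},
    },
    "kling-v2.6": {
        "resolutions": {"720p", "1080p"},
        "durations": {5, 10},
        "actions": {"generate", "image2video", "first_last_frame"},
    },
}


def check_request(model, action, resolution, duration):
    """Return a list of problems; an empty list means the combination is in the table."""
    caps = CAPABILITIES.get(model)
    if caps is None:
        return [f"unknown model: {model}"]
    problems = []
    if action not in caps["actions"]:
        problems.append(f"{model} does not list action {action}")
    if resolution.lower() not in caps["resolutions"]:
        problems.append(f"{model} does not list resolution {resolution}")
    if duration not in caps["durations"]:
        problems.append(f"{model} does not list duration {duration}s")
    return problems
```

For example, `check_request("kling-v2.6", "generate", "4K", 7)` reports two problems (resolution and duration), while `check_request("kling-v3", "generate", "4K", 15)` reports none.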
Common Parameters
Model ID; see the Available Models table
Operation type. Allowed values:
- `generate`: text-to-video (T2V)
- `image2video`: image-to-video; pair with `first_frame_image` or `image_urls`
- `first_last_frame`: first/last frame interpolation; requires `first_frame_image` + `last_frame_image` (v2.6 / v3 / o1)
- `reference`: multimodal reference-to-video; pair with `image_urls` (omni / o1)
- `reference_video`: video reference-to-video; pair with `video_list` (omni / o1)

Video description text. In Omni / O1 you can use the `<<<image_N>>>` syntax to reference the N-th image in `image_urls` (N starts at 1).

Frame aspect ratio, applies to T2V only; image / video reference modes follow the source asset's ratio. Allowed values:
- `16:9`: landscape widescreen
- `9:16`: portrait
- `1:1`: square

Output resolution. See Available Models for per-model support:
- `720p`
- `1080p`
- `4K` (v3 / omni only)

Video duration in seconds. v2.6 / o1 accept only `5` or `10`; v3 / omni accept any integer from 3-15.

Reference image URL array. A single-element array triggers I2V; multiple images feed the multimodal reference for Omni / O1 (referenceable via `<<<image_N>>>`).

First-frame image URL. Used with the `image2video` or `first_last_frame` action (v2.6 / v3 / o1).

Last-frame image URL, used with the `first_last_frame` action.

Reference video URL array (single element). Simplified form for Omni / O1, equivalent to attaching a `video_list` entry with `refer_type=feature`.

Webhook callback URL, invoked when the task reaches a terminal state. See Callback Mechanism.

Event types to push; defaults to all terminal events.
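The pairing rules in the operation-type list above can be sketched as a pre-flight check that reports which asset fields an action still needs. `missing_fields` is a hypothetical helper; it uses only the field names that appear in this page (`first_frame_image`, `last_frame_image`, `image_urls`, `video_list`) and does not cover the simplified reference-video URL array, whose field name is not shown here.

```python
def missing_fields(action, params):
    """Return the asset fields the given action still needs.

    params is a plain dict of request fields; empty values count as missing.
    """
    def has(name):
        return bool(params.get(name))

    if action == "generate":
        return []  # text-to-video needs no asset fields
    if action == "image2video":
        if has("first_frame_image") or has("image_urls"):
            return []
        return ["first_frame_image or image_urls"]
    if action == "first_last_frame":
        return [n for n in ("first_frame_image", "last_frame_image") if not has(n)]
    if action == "reference":
        return [] if has("image_urls") else ["image_urls"]
    if action == "reference_video":
        return [] if has("video_list") else ["video_list"]
    raise ValueError(f"unknown action: {action}")


# A first_last_frame request that forgot the last frame.
gaps = missing_fields(
    "first_last_frame",
    {"first_frame_image": "https://example.com/first.png"},
)
```

Here `gaps` names `last_frame_image` as the only missing field. The check is purely local; the service remains the authority on what a request requires.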
Model-specific Parameters
- kling-video-o1
- kling-v3-omni
- kling-v3
- kling-v2.6
Reference video list, at most 1 clip. Same structure as Omni:
- `video_url`: video URL
- `refer_type`: `base` / `feature`
- `keep_original_sound`: `yes` / `no`
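As an illustration of the structure just listed, a one-element `video_list` could be assembled as follows. `make_video_list` is a hypothetical helper, and its default values (`feature`, `no`) are assumptions chosen for the sketch, not documented defaults.

```python
def make_video_list(video_url, refer_type="feature", keep_original_sound="no"):
    """Build a one-element video_list from the three documented keys."""
    if refer_type not in ("base", "feature"):
        raise ValueError("refer_type must be 'base' or 'feature'")
    if keep_original_sound not in ("yes", "no"):
        raise ValueError("keep_original_sound must be 'yes' or 'no'")
    # At most one clip is allowed, so the list always has exactly one entry.
    return [{
        "video_url": video_url,
        "refer_type": refer_type,
        "keep_original_sound": keep_original_sound,
    }]


video_list = make_video_list("https://example.com/clip.mp4", refer_type="base")
```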
kling-video-o1 does not support the `audio` field; for audio output use v3 / omni.
Special Syntax
<<<image_N>>> prompt image reference (Omni / O1)
Within prompt, use the <<<image_N>>> placeholder to explicitly reference the N-th image in image_urls (N starts at 1). The model substitutes the placeholder with the corresponding image content for:
- Character consistency: `<<<image_1>>>` locks the subject's appearance
- Scene composition: the character from `<<<image_1>>>` appears in the environment of `<<<image_2>>>`
- Multi-asset guidance: embed N reference images at any position
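Because N is 1-based and must point at an existing entry in `image_urls`, a prompt can be checked locally before submission. The sketch below is one way to do that; `PLACEHOLDER` and `check_placeholders` are hypothetical names, and the URLs are placeholders.

```python
import re

# Matches <<<image_N>>> and captures N.
PLACEHOLDER = re.compile(r"<<<image_(\d+)>>>")


def check_placeholders(prompt, image_urls):
    """Return the placeholder indices in prompt with no matching image_urls entry."""
    indices = [int(n) for n in PLACEHOLDER.findall(prompt)]
    # Valid indices run from 1 to len(image_urls), inclusive.
    return sorted({n for n in indices if n < 1 or n > len(image_urls)})


prompt = "The character from <<<image_1>>> walks through <<<image_3>>>"
image_urls = ["https://example.com/hero.png", "https://example.com/street.png"]
dangling = check_placeholders(prompt, image_urls)
```

With two images supplied, `dangling` contains only the index 3, since `<<<image_3>>>` has no corresponding entry.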
Example
image_urls:
Models other than Omni / O1 (v2.6 / v3) do not parse `<<<image_N>>>` placeholders; they forward them to the model as literal text.
Resource Limits
| Item | Limit |
|---|---|
| Reference image (per file) | ≤ 30MB, JPG / PNG / WEBP |
| Reference image (count) | I2V: 1; Omni / O1 multimodal: ≤ 4 recommended |
| Reference video (`video_list`) | MP4 / MOV, ≤ 100MB, 2-30 sec, at most 1 clip |
| Prompt | ≤ 2500 characters |
| Output | MP4, link valid for 24 hours |
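Some of the limits in the table can be checked without downloading any asset. The sketch below covers only those: prompt length, reference-video count, and the recommended image count. `check_limits` and the constant names are hypothetical, and `len(prompt)` counts Unicode code points, which is an assumption about how the service counts characters. File size, format, and clip length are not checked.

```python
MAX_PROMPT_CHARS = 2500      # "Prompt" row
MAX_VIDEO_CLIPS = 1          # "Reference video" row
RECOMMENDED_MAX_IMAGES = 4   # "Reference image (count)" row, a recommendation


def check_limits(prompt, image_urls=(), video_list=()):
    """Return (errors, warnings) for limits that need no asset download."""
    errors, warnings = [], []
    if len(prompt) > MAX_PROMPT_CHARS:
        errors.append(
            f"prompt has {len(prompt)} characters, limit is {MAX_PROMPT_CHARS}"
        )
    if len(video_list) > MAX_VIDEO_CLIPS:
        errors.append(
            f"video_list has {len(video_list)} clips, at most {MAX_VIDEO_CLIPS} allowed"
        )
    if len(image_urls) > RECOMMENDED_MAX_IMAGES:
        # The table says "recommended", so this is a warning, not an error.
        warnings.append(
            f"{len(image_urls)} reference images, {RECOMMENDED_MAX_IMAGES} or fewer recommended"
        )
    return errors, warnings
```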
Related Docs
- Task System Reference — task state machine / polling cadence / async push
- Request & Response Format — common error codes / headers / rate limits
- Authentication — API key signup and usage