POST /v1/tasks
Kling Series
curl --request POST \
  --url https://www.qingbo.dev/v1/tasks \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "<string>",
  "action": "<string>",
  "prompt": "<string>",
  "aspect_ratio": "<string>",
  "resolution": "<string>",
  "duration": 123,
  "image_urls": [
    "<string>"
  ],
  "first_frame_image": "<string>",
  "last_frame_image": "<string>",
  "video_urls": [
    "<string>"
  ],
  "callback_url": "<string>",
  "callback_events": [
    "<string>"
  ]
}
'
{
  "task_id": "task-wave1775285160b950328499",
  "model": "kling-v3",
  "action": "generate",
  "status": "queued",
  "created_at": 1775285160040,
  "progress": 0
}

The Kling video generation series from Kuaishou. The lineup has four models:
  • kling-v2.6: Classic stable release. 720P output is silent; 1080P supports optional audio. Refined semantic adherence and motion stability
  • kling-v3: Next-generation base model. Adds 4K resolution, extends T2V duration to 15 seconds, and supports native audio
  • kling-v3-omni: Unified multimodal interface for the v3 family. T2V, I2V, and video reference all share one endpoint, with <<<image_N>>> reference syntax in the prompt
  • kling-video-o1: First reasoning-enhanced video model. It plans over the prompt and reference assets before generation, targeting physical consistency, complex motion, and semantic adherence across long motion sequences
Billing is per second of output video, at a rate set by the resolution parameter (720p / 1080p / 4K).

Pricing

| Model | 720P | 1080P | 4K | Notes |
| --- | --- | --- | --- | --- |
| kling-v2.6 | $0.0391 | $0.06641 | - | 1080P with audio uplifts to $0.159375 |
| kling-v3 | $0.0714 | $0.0952 | $0.455345 | With audio: 720P $0.1071 / 1080P $0.1428 / 4K same price |
| kling-v3-omni | $0.0714 | $0.0952 | $0.455345 | With audio: 720P $0.0952 / 1080P $0.119; video reference: 720P $0.1071 / 1080P $0.1428 |
| kling-video-o1 | $0.0714 | $0.0952 | - | Video reference: 720P $0.1071 / 1080P $0.1428 |
Prices are per second; actual charge = unit price × duration. Enabling audio or attaching a video_list video reference switches to the corresponding higher tier.
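
The "unit price × duration" rule can be sketched as a small lookup. This is an illustrative client-side estimate, not an official billing calculator: the `RATES` table is transcribed by hand from the pricing rows on this page, and the tier names (`base`, `audio`, `video_ref`) and the function name `estimate_cost` are hypothetical labels introduced here.

```python
from decimal import Decimal

# USD per second, transcribed from the Pricing table on this page.
# Keys are (model, resolution, tier). The tier names are hypothetical labels:
# "base" = no audio, "audio" = audio enabled, "video_ref" = video reference attached.
RATES = {
    ("kling-v2.6", "720p", "base"): Decimal("0.0391"),
    ("kling-v2.6", "1080p", "base"): Decimal("0.06641"),
    ("kling-v2.6", "1080p", "audio"): Decimal("0.159375"),
    ("kling-v3", "720p", "base"): Decimal("0.0714"),
    ("kling-v3", "1080p", "base"): Decimal("0.0952"),
    ("kling-v3", "4k", "base"): Decimal("0.455345"),
    ("kling-v3", "720p", "audio"): Decimal("0.1071"),
    ("kling-v3", "1080p", "audio"): Decimal("0.1428"),
    ("kling-v3", "4k", "audio"): Decimal("0.455345"),  # "4K same price"
    ("kling-v3-omni", "720p", "base"): Decimal("0.0714"),
    ("kling-v3-omni", "1080p", "base"): Decimal("0.0952"),
    ("kling-v3-omni", "4k", "base"): Decimal("0.455345"),
    ("kling-v3-omni", "720p", "audio"): Decimal("0.0952"),
    ("kling-v3-omni", "1080p", "audio"): Decimal("0.119"),
    ("kling-v3-omni", "720p", "video_ref"): Decimal("0.1071"),
    ("kling-v3-omni", "1080p", "video_ref"): Decimal("0.1428"),
    ("kling-video-o1", "720p", "base"): Decimal("0.0714"),
    ("kling-video-o1", "1080p", "base"): Decimal("0.0952"),
    ("kling-video-o1", "720p", "video_ref"): Decimal("0.1071"),
    ("kling-video-o1", "1080p", "video_ref"): Decimal("0.1428"),
}


def estimate_cost(model, resolution, duration, tier="base"):
    """Return unit price x duration in USD, or raise if no price is listed."""
    key = (model, resolution.lower(), tier)
    if key not in RATES:
        raise ValueError(f"no listed price for {key}")
    return RATES[key] * duration


# A 5-second 1080p kling-v3 clip with audio: 0.1428 x 5
print(estimate_cost("kling-v3", "1080p", 5, tier="audio"))  # 0.7140
```

`Decimal` is used instead of `float` so that sub-cent rates multiply without binary rounding noise.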

Examples

curl -X POST https://www.qingbo.dev/v1/tasks \
  -H "Authorization: Bearer $WAVE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kling-v3",
    "prompt": "A Shiba Inu in a spacesuit walking on the moon, cinematic lighting",
    "duration": 5,
    "resolution": "1080p",
    "aspect_ratio": "16:9",
    "audio": true
  }'
{
  "task_id": "task-wave1775285160b950328499",
  "model": "kling-v3",
  "action": "generate",
  "status": "queued",
  "created_at": 1775285160040,
  "progress": 0
}
After submission, poll status with GET /v1/tasks/{task_id}; see Task System for details.
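
A minimal polling sketch, under stated assumptions: the GET response is assumed to carry the same `status` field as the creation response shown above, and the set of terminal status names is not listed on this page, so the caller passes it in (check Task System for the real values). `fetch_task` and `poll_task` are hypothetical helper names.

```python
import json
import os
import time
import urllib.request

BASE_URL = "https://www.qingbo.dev"


def fetch_task(task_id):
    """GET /v1/tasks/{task_id} and return the decoded JSON body."""
    req = urllib.request.Request(
        f"{BASE_URL}/v1/tasks/{task_id}",
        headers={"Authorization": f"Bearer {os.environ['WAVE_API_KEY']}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def poll_task(task_id, terminal_statuses, fetch=fetch_task,
              interval=5.0, max_attempts=120):
    """Poll until the task's status is in terminal_statuses.

    terminal_statuses is supplied by the caller because this page does not
    list the status names; the field name "status" is an assumption based on
    the creation response.
    """
    for _ in range(max_attempts):
        task = fetch(task_id)
        if task.get("status") in terminal_statuses:
            return task
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} not terminal after {max_attempts} polls")
```

Passing `fetch` as a parameter keeps the loop testable without network access. For production use, a `callback_url` avoids polling altogether.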

Available Models

| Model ID | Resolution | Duration | Supported actions | Highlights |
| --- | --- | --- | --- | --- |
| kling-video-o1 | 720P / 1080P | 5 / 10 sec | generate · image2video · first_last_frame · reference · reference_video | Reasoning-enhanced, best physical consistency |
| kling-v3-omni | 720P / 1080P / 4K | 3-15 sec | generate · image2video · reference · reference_video | Unified multimodal endpoint |
| kling-v3 | 720P / 1080P / 4K | 3-15 sec | generate · image2video · first_last_frame | 4K + native audio |
| kling-v2.6 | 720P / 1080P | 5 / 10 sec | generate · image2video · first_last_frame | 1080P with optional audio, top stability |
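
The model table can be encoded as a client-side pre-flight check, so that unsupported combinations are caught before a request is sent. This is a sketch: the `MODELS` dictionary is transcribed from the table above, `check_request` is a hypothetical helper, and the server remains the authority on what is accepted.

```python
# Capabilities transcribed from the Available Models table.
MODELS = {
    "kling-video-o1": {
        "resolutions": {"720p", "1080p"},
        "durations": {5, 10},
        "actions": {"generate", "image2video", "first_last_frame",
                    "reference", "reference_video"},
    },
    "kling-v3-omni": {
        "resolutions": {"720p", "1080p", "4k"},
        "durations": set(range(3, 16)),  # any integer from 3 to 15
        "actions": {"generate", "image2video", "reference", "reference_video"},
    },
    "kling-v3": {
        "resolutions": {"720p", "1080p", "4k"},
        "durations": set(range(3, 16)),
        "actions": {"generate", "image2video", "first_last_frame"},
    },
    "kling-v2.6": {
        "resolutions": {"720p", "1080p"},
        "durations": {5, 10},
        "actions": {"generate", "image2video", "first_last_frame"},
    },
}


def check_request(model, action="generate", resolution="720p", duration=5):
    """Return a list of problems; an empty list means the combination is listed."""
    spec = MODELS.get(model)
    if spec is None:
        return [f"unknown model: {model}"]
    problems = []
    if action not in spec["actions"]:
        problems.append(f"{model} does not list action {action}")
    if resolution.lower() not in spec["resolutions"]:
        problems.append(f"{model} does not list resolution {resolution}")
    if duration not in spec["durations"]:
        problems.append(f"{model} does not list duration {duration}")
    return problems


print(check_request("kling-v3", "generate", "4K", 15))  # []
print(check_request("kling-v2.6", duration=7))
```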

Common Parameters

model (string, required)
Model ID; see the Available Models table.

action (string, default: "generate")
Operation type. Allowed values:
  • generate: text-to-video (T2V)
  • image2video: image-to-video; pair with first_frame_image or image_urls
  • first_last_frame: first/last frame interpolation; requires first_frame_image + last_frame_image (v2.6 / v3 / o1)
  • reference: multimodal reference-to-video; pair with image_urls (omni / o1)
  • reference_video: video reference-to-video; pair with video_list (omni / o1)

prompt (string, required)
Video description text. In Omni / O1 you can use the <<<image_N>>> syntax to reference the N-th image in image_urls (N starts at 1).

aspect_ratio (string, default: "16:9")
Frame aspect ratio. Applies to T2V only; image and video reference modes follow the source asset's ratio. Allowed values:
  • 16:9: landscape widescreen
  • 9:16: portrait
  • 1:1: square

resolution (string, default: "720p")
Output resolution. See Available Models for per-model support:
  • 720p
  • 1080p
  • 4K (v3 / omni only)

duration (integer, default: 5)
Video duration in seconds. v2.6 / o1 accept only 5 or 10; v3 / omni accept any integer from 3 to 15.

image_urls (string[])
Reference image URL array. A single-element array triggers I2V; multiple images feed the multimodal reference for Omni / O1 (referenceable via <<<image_N>>>).

first_frame_image (string)
First-frame image URL. Used with the image2video or first_last_frame action (v2.6 / v3 / o1).

last_frame_image (string)
Last-frame image URL. Used with the first_last_frame action.

video_urls (string[])
Reference video URL array (single element). Simplified form for Omni / O1, equivalent to attaching a video_list entry with refer_type=feature.

callback_url (string)
Webhook callback URL, invoked when the task reaches a terminal state. See Callback Mechanism.

callback_events (string[])
Event types to push; defaults to all terminal events.
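
Each action pairs with specific inputs, and that pairing can be checked before submission. The sketch below encodes only the rules stated in the parameter descriptions above; `missing_inputs` is a hypothetical helper, and treating `video_urls` as satisfying `reference_video` relies on the stated equivalence with `video_list`.

```python
ACTIONS = {"generate", "image2video", "first_last_frame",
           "reference", "reference_video"}


def missing_inputs(payload):
    """Return the names of inputs the chosen action still needs."""
    action = payload.get("action", "generate")  # documented default
    if action not in ACTIONS:
        return [f"unknown action: {action}"]

    def has(key):
        return bool(payload.get(key))

    # model and prompt are marked required for every request.
    missing = [k for k in ("model", "prompt") if not has(k)]

    if action == "image2video":
        if not (has("first_frame_image") or has("image_urls")):
            missing.append("first_frame_image or image_urls")
    elif action == "first_last_frame":
        missing += [k for k in ("first_frame_image", "last_frame_image")
                    if not has(k)]
    elif action == "reference":
        if not has("image_urls"):
            missing.append("image_urls")
    elif action == "reference_video":
        if not (has("video_list") or has("video_urls")):
            missing.append("video_list or video_urls")
    return missing


print(missing_inputs({
    "model": "kling-v3",
    "action": "first_last_frame",
    "prompt": "A flower blooming",
    "first_frame_image": "https://cdn.example.com/bud.jpg",
}))  # ['last_frame_image']
```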

Model-specific Parameters

video_list (object[])
Reference video list (Omni / O1), at most 1 clip. O1 uses the same entry structure as Omni:
  • video_url: video URL
  • refer_type: base / feature
  • keep_original_sound: yes / no
O1 does not support the audio field; for audio output use v3 / omni.
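
To make the relationship between `video_urls` and `video_list` concrete, the sketch below builds a `video_list` entry and shows the expansion that the `video_urls` shorthand is described as equivalent to. Both helpers are hypothetical, the URL is a placeholder, and the expansion deliberately leaves `keep_original_sound` unset because this page does not state its default.

```python
import copy
import json


def video_ref(video_url, refer_type, keep_original_sound):
    """Build one video_list entry, checking the documented value sets."""
    if refer_type not in ("base", "feature"):
        raise ValueError("refer_type must be 'base' or 'feature'")
    if keep_original_sound not in ("yes", "no"):
        raise ValueError("keep_original_sound must be 'yes' or 'no'")
    return {
        "video_url": video_url,
        "refer_type": refer_type,
        "keep_original_sound": keep_original_sound,
    }


def expand_video_urls(payload):
    """Rewrite the video_urls shorthand as the equivalent video_list entry."""
    body = copy.deepcopy(payload)
    urls = body.pop("video_urls", None)
    if urls:
        if len(urls) != 1:
            raise ValueError("video_urls takes a single element")
        body["video_list"] = [{"video_url": urls[0], "refer_type": "feature"}]
    return body


body = {
    "model": "kling-v3-omni",
    "action": "reference_video",
    "prompt": "Restyle the clip as a watercolor animation",
    "video_list": [video_ref("https://cdn.example.com/clip.mp4", "base", "no")],
}
print(json.dumps(body, indent=2))
```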

Special Syntax

<<<image_N>>> prompt image reference (Omni / O1)
Within prompt, use the <<<image_N>>> placeholder to explicitly reference the N-th image in image_urls (N starts at 1). The model substitutes the placeholder with the corresponding image content for:
  • Character consistency: <<<image_1>>> locks the subject's appearance
  • Scene composition: the character from <<<image_1>>> appears in the environment of <<<image_2>>>
  • Multi-asset guidance: embed N reference images at any position
Example
"Take the girl in red from <<<image_1>>> and place her into the snow-mountain scene of <<<image_2>>>, camera slowly zooming in"
Corresponding image_urls:
"image_urls": [
  "https://cdn.example.com/girl-red.jpg",
  "https://cdn.example.com/snow-mountain.jpg"
]
Models other than Omni and O1 (v2.6 / v3) do not parse <<<image_N>>> placeholders; the placeholder text reaches the model as a literal string.
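
Because N is 1-based and must point at an existing entry in image_urls, an off-by-one is easy to make. The following sketch finds placeholders whose index has no matching image; the regular expression and helper names are illustrative, not part of the API.

```python
import re

PLACEHOLDER = re.compile(r"<<<image_(\d+)>>>")


def referenced_indices(prompt):
    """Return the sorted, de-duplicated image indices used in the prompt."""
    return sorted({int(n) for n in PLACEHOLDER.findall(prompt)})


def dangling_placeholders(prompt, image_urls):
    """Return indices that do not map to an entry in image_urls (1-based)."""
    return [n for n in referenced_indices(prompt)
            if not 1 <= n <= len(image_urls)]


prompt = ("Take the girl in red from <<<image_1>>> and place her into the "
          "snow-mountain scene of <<<image_2>>>, camera slowly zooming in")
image_urls = [
    "https://cdn.example.com/girl-red.jpg",
    "https://cdn.example.com/snow-mountain.jpg",
]
print(referenced_indices(prompt))                    # [1, 2]
print(dangling_placeholders(prompt, image_urls))     # []
print(dangling_placeholders(prompt, image_urls[:1])) # [2]
```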

Resource Limits

| Item | Limit |
| --- | --- |
| Reference image (per file) | ≤ 30MB, JPG / PNG / WEBP |
| Reference image (count) | I2V: 1; Omni / O1 multimodal: ≤ 4 recommended |
| Reference video (video_list) | MP4 / MOV, ≤ 100MB, 2-30 sec, at most 1 clip |
| Prompt | ≤ 2500 characters |
| Output | MP4, link valid for 24 hours |
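
These limits can be checked locally before upload. The sketch below rests on several assumptions that this page does not settle: 1 MB is taken as 10^6 bytes (the stricter reading), ".jpeg" is accepted alongside ".jpg", format is judged by file extension only, and prompt length is counted with Python's `len`, which may differ from how the API counts characters. All helper names are hypothetical.

```python
import os

MB = 10**6  # assumption: decimal megabytes, the stricter interpretation
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}  # ".jpeg" is an assumption
VIDEO_EXTS = {".mp4", ".mov"}
MAX_PROMPT_CHARS = 2500


def check_prompt(prompt):
    """Return problems with the prompt length."""
    if len(prompt) > MAX_PROMPT_CHARS:
        return [f"prompt has {len(prompt)} characters, limit is {MAX_PROMPT_CHARS}"]
    return []


def check_image(filename, size_bytes):
    """Return problems with a reference image's format and size."""
    problems = []
    if os.path.splitext(filename)[1].lower() not in IMAGE_EXTS:
        problems.append(f"{filename}: expected JPG / PNG / WEBP")
    if size_bytes > 30 * MB:
        problems.append(f"{filename}: larger than 30MB")
    return problems


def check_video(filename, size_bytes, duration_sec):
    """Return problems with a reference video's format, size, and length."""
    problems = []
    if os.path.splitext(filename)[1].lower() not in VIDEO_EXTS:
        problems.append(f"{filename}: expected MP4 / MOV")
    if size_bytes > 100 * MB:
        problems.append(f"{filename}: larger than 100MB")
    if not 2 <= duration_sec <= 30:
        problems.append(f"{filename}: duration must be 2-30 sec")
    return problems


print(check_image("girl-red.jpg", 4 * MB))      # []
print(check_video("clip.mov", 120 * MB, 45.0))  # two problems: size and duration
```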