POST /v1/tasks
SkyReels Series
curl --request POST \
  --url https://www.qingbo.dev/v1/tasks \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "<string>",
  "action": "<string>",
  "prompt": "<string>",
  "aspect_ratio": "<string>",
  "resolution": "<string>",
  "duration": 123,
  "image_urls": [
    "<string>"
  ],
  "first_frame_image": "<string>",
  "last_frame_image": "<string>",
  "video_urls": [
    "<string>"
  ],
  "audio_urls": [
    "<string>"
  ],
  "callback_url": "<string>",
  "callback_events": [
    "<string>"
  ]
}
'
{
  "task_id": "task-wave1775285160b950328499",
  "model": "skyreels-v4-fast",
  "action": "generate",
  "status": "queued",
  "created_at": 1775285160040,
  "progress": 0
}

SkyReels V4 is Kunlun Wanwei's video generation series and the multimodal release of SkyReels. Beyond T2V / I2V, it natively supports Omni multi-asset references (image / video / audio), video extension, first/last-frame and mid-frame keyframes, voiceprint sync, and other advanced capabilities. Duration can be any integer from 3 to 15 seconds. Billing is by resolution × duration, across two tiers, Fast and Std:
| Tier | Positioning | Speed | Quality | Recommended Scenarios |
| --- | --- | --- | --- | --- |
| skyreels-v4-fast | Fast tier | Quick | Standard | Creative previews, batch drafts, A/B comparisons |
| skyreels-v4-std | Standard quality tier | Slower (~1.5-2x) | More stable | Final delivery, client-facing cuts, complex motion shots |
Both tiers share the same parameters and capabilities (actions / ref_images / ref_videos / mid_frame_images / voiceprint) — they only differ in sampling depth. Recommended workflow: prototype with fast, render the final cut with std.

Pricing

Billed per second ($ / sec). When video reference is enabled (ref_videos containing extend or reference types), a separate unit price applies:

skyreels-v4-fast

| Resolution | Standard | With Video Reference | 5-second video (standard) |
| --- | --- | --- | --- |
| 480P | $0.068 / sec | $0.1275 / sec | $0.34 |
| 720P | $0.0935 / sec | $0.17 / sec | $0.4675 |
| 1080P | $0.23375 / sec | $0.425 / sec | $1.16875 |

skyreels-v4-std

| Resolution | Standard | With Video Reference | 5-second video (standard) |
| --- | --- | --- | --- |
| 480P | $0.0935 / sec | $0.153 / sec | $0.4675 |
| 720P | $0.119 / sec | $0.2125 / sec | $0.595 |
| 1080P | $0.2975 / sec | $0.53125 / sec | $1.4875 |
Video reference surcharge — when ref_videos contains any item of type reference (video reference) or extend (video extension), pricing follows the “With Video Reference” column; otherwise the standard column.
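
The pricing rule above is simple arithmetic: per-second rate times duration, with the rate column chosen by whether ref_videos carries a reference or extend item. A minimal Python sketch follows. The rates are copied from the two tables; estimate_cost is a hypothetical helper, not part of the API. It inspects ref_videos only, because the source does not say how the simplified video_urls field is priced.

```python
from decimal import Decimal

# Per-second rates in USD copied from the pricing tables:
# (standard rate, rate with video reference).
RATES = {
    "skyreels-v4-fast": {
        "480p": (Decimal("0.068"), Decimal("0.1275")),
        "720p": (Decimal("0.0935"), Decimal("0.17")),
        "1080p": (Decimal("0.23375"), Decimal("0.425")),
    },
    "skyreels-v4-std": {
        "480p": (Decimal("0.0935"), Decimal("0.153")),
        "720p": (Decimal("0.119"), Decimal("0.2125")),
        "1080p": (Decimal("0.2975"), Decimal("0.53125")),
    },
}


def estimate_cost(model, resolution, duration, ref_videos=None):
    """Estimate the charge in USD for one task (hypothetical helper)."""
    if not isinstance(duration, int) or not 3 <= duration <= 15:
        raise ValueError("duration must be an integer in 3-15 seconds")
    standard, with_video = RATES[model][resolution.lower()]
    # The "With Video Reference" rate applies when any ref_videos item
    # has type reference or extend.
    surcharge = any(
        item.get("type") in ("reference", "extend") for item in (ref_videos or [])
    )
    return (with_video if surcharge else standard) * duration


print(estimate_cost("skyreels-v4-fast", "720p", 5))  # 0.4675
```

Decimal is used instead of float so the result matches the table values exactly.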

Examples

curl -X POST https://www.qingbo.dev/v1/tasks \
  -H "Authorization: Bearer $WAVE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "skyreels-v4-fast",
    "prompt": "A Shiba Inu in a spacesuit walking on the moon, slow camera push-in, cinematic lighting",
    "duration": 5,
    "resolution": "1080p",
    "aspect_ratio": "16:9"
  }'
{
  "task_id": "task-wave1775285160b950328499",
  "model": "skyreels-v4-fast",
  "action": "generate",
  "status": "queued",
  "created_at": 1775285160040,
  "progress": 0
}
After submission, poll status with GET /v1/tasks/{task_id}. See Task System for details.
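
The polling step can be sketched in Python as below. Several details are assumptions rather than documented behavior: that GET /v1/tasks/{task_id} returns a JSON body with a status field like the submission response, that the terminal states are succeeded and failed (taken from the callback event names), and that the same Bearer header is accepted. The function names, the 5-second interval, and the 10-minute timeout are illustrative only.

```python
import json
import os
import time
import urllib.request

BASE_URL = "https://www.qingbo.dev"
# Assumed terminal states, taken from the callback event names.
TERMINAL_STATES = {"succeeded", "failed"}


def fetch_task(task_id):
    """GET /v1/tasks/{task_id} and return the decoded JSON body."""
    request = urllib.request.Request(
        f"{BASE_URL}/v1/tasks/{task_id}",
        headers={"Authorization": f"Bearer {os.environ['WAVE_API_KEY']}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def wait_for_task(task_id, fetch=fetch_task, interval=5.0, timeout=600.0,
                  sleep=time.sleep):
    """Poll until the task reaches a terminal state or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        task = fetch(task_id)
        if task.get("status") in TERMINAL_STATES:
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} is still {task.get('status')!r}")
        sleep(interval)
```

Passing fetch and sleep as parameters keeps the loop testable without network access. For production use, a callback_url avoids polling altogether.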

Mode Cheat Sheet

Switch generation modes via the action field (or the corresponding media fields):
| Mode | action | Key Fields | Typical Use |
| --- | --- | --- | --- |
| Text-to-video | generate | prompt | Pure text generation |
| Image-to-video | image2video | first_frame_image | Animate a single frame |
| First/last frame inpainting | first_last_frame | first_frame_image + last_frame_image | Precisely control start/end frames |
| Multi-image reference (Omni) | reference | ref_images | Character / style / scene consistency |
| Video reference | reference_video | ref_videos.type=reference | Replicate camera motion / style |
| Video extension | extend | ref_videos.type=extend | Continue an existing video |
| Audio-driven | reference_audio | audio_urls or ref_images[].audio_url | Voiceprint / beat sync, lip-sync |
Usually no need to pass action explicitly — the backend auto-routes based on the media fields you supply. Pass it explicitly to lock the mode and avoid ambiguity (e.g. force video reference when both images and videos are present).
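
To see which mode a request body would most plausibly land in before sending it, a client-side guess can be derived from the cheat sheet. The sketch below is not the backend's routing logic: the precedence order (video, then audio, then images, then frames) and the mapping of video_urls to reference_video are assumptions, and infer_action is a hypothetical helper.

```python
def infer_action(payload):
    """Guess the mode a request body implies (client-side sketch only)."""
    if payload.get("action"):
        return payload["action"]  # an explicit action locks the mode
    video_types = [video.get("type") for video in payload.get("ref_videos", [])]
    if "extend" in video_types:
        return "extend"
    # Assumption: the simplified video_urls field behaves like type=reference.
    if "reference" in video_types or payload.get("video_urls"):
        return "reference_video"
    bound_audio = any(item.get("audio_url") for item in payload.get("ref_images", []))
    if payload.get("audio_urls") or bound_audio:
        return "reference_audio"
    if payload.get("ref_images") or payload.get("image_urls"):
        return "reference"
    if payload.get("first_frame_image") and payload.get("last_frame_image"):
        return "first_last_frame"
    if payload.get("first_frame_image"):
        return "image2video"
    return "generate"


print(infer_action({"prompt": "a cat", "first_frame_image": "https://cdn.example.com/a.jpg"}))
# image2video
```

When the guess differs from the mode you intend, passing action explicitly removes the ambiguity.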

Available Models

| Model ID | Tier | Resolution | Duration | Description |
| --- | --- | --- | --- | --- |
| skyreels-v4-fast | Fast tier | 480P / 720P / 1080P | 3-15 sec | Full T2V/I2V/Omni capability, faster response |
| skyreels-v4-std | Standard quality tier | 480P / 720P / 1080P | 3-15 sec | Same capability as fast, more stable quality, suited for final delivery |

Common Parameters

model
string
required
Model ID. Allowed values:
  • skyreels-v4-fast — fast tier
  • skyreels-v4-std — standard quality tier
action
string
default:"generate"
Operation type. The backend usually auto-routes based on media fields; passing explicitly locks the mode. Allowed values:
  • generate — text-to-video (T2V)
  • image2video — image-to-video (I2V), with first_frame_image
  • first_last_frame — first/last frame inpainting, with first_frame_image + last_frame_image
  • reference — multi-image reference (Omni style / character consistency), with ref_images
  • reference_video — video reference, with ref_videos.type=reference
  • reference_audio — audio-driven, with audio_urls or ref_images[].audio_url
  • extend — video extension, with ref_videos.type=extend
prompt
string
Video description, up to 5000 characters. Required for T2V; optional guidance in other modes.
aspect_ratio
string
default:"16:9"
Frame aspect ratio. Effective in T2V; modes containing media assets follow the source media’s ratio. Allowed values:
  • 16:9 — landscape widescreen
  • 9:16 — portrait tall
  • 4:3 — landscape
  • 3:4 — portrait
  • 1:1 — square
resolution
string
default:"720p"
Output resolution. Allowed values:
  • 480p
  • 720p
  • 1080p
duration
integer
default:5
Video duration in seconds, any integer in 3-15
image_urls
string[]
Reference image URL array (simplified mode). Use either image_urls or ref_images. For complex cases (specifying tag or type=image|grid, or binding an audio_url), use ref_images.
first_frame_image
string
First-frame image URL, triggers I2V mode
last_frame_image
string
Last-frame image URL. When passed alongside first_frame_image, triggers first/last frame inpainting
video_urls
string[]
Reference video URL array (simplified mode). Use either video_urls or ref_videos. For complex cases (specifying tag or type=reference|extend), use ref_videos.
audio_urls
string[]
Reference audio URL array (simplified mode); triggers voiceprint sync (reference_audio mode). Use either audio_urls or the audio_url field inside ref_images. For complex cases (binding audio to a specific asset, linked with an image / style group), use ref_images[].audio_url.
callback_url
string
Webhook callback URL. By default it is invoked when the task reaches a terminal state; use callback_events to subscribe to other events. See Callbacks.
callback_events
string[]
Callback event subscription list. By default subscribes to terminal states (succeeded / failed). Allowed values:
  • queued — enqueued
  • running — execution started
  • succeeded — success (default)
  • failed — failure (default)
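
The constraints listed for the common parameters can be checked on the client before a request is sent. The sketch below encodes only the allowed values and ranges stated above. validate_common is a hypothetical helper, and the server may apply further checks that are not documented here.

```python
ALLOWED_MODELS = {"skyreels-v4-fast", "skyreels-v4-std"}
ALLOWED_ASPECT_RATIOS = {"16:9", "9:16", "4:3", "3:4", "1:1"}
ALLOWED_RESOLUTIONS = {"480p", "720p", "1080p"}
ALLOWED_EVENTS = {"queued", "running", "succeeded", "failed"}


def validate_common(payload):
    """Return a list of problems in the common parameters (empty if none)."""
    problems = []
    if payload.get("model") not in ALLOWED_MODELS:
        problems.append("model must be skyreels-v4-fast or skyreels-v4-std")
    if len(payload.get("prompt") or "") > 5000:
        problems.append("prompt exceeds 5000 characters")
    # Missing optional fields fall back to the documented defaults.
    if payload.get("aspect_ratio", "16:9") not in ALLOWED_ASPECT_RATIOS:
        problems.append("aspect_ratio is not an allowed value")
    if payload.get("resolution", "720p") not in ALLOWED_RESOLUTIONS:
        problems.append("resolution must be 480p, 720p, or 1080p")
    duration = payload.get("duration", 5)
    if isinstance(duration, bool) or not isinstance(duration, int) or not 3 <= duration <= 15:
        problems.append("duration must be an integer in 3-15")
    unknown = set(payload.get("callback_events", [])) - ALLOWED_EVENTS
    if unknown:
        problems.append(f"unknown callback_events: {sorted(unknown)}")
    return problems


print(validate_common({"model": "skyreels-v4-fast", "prompt": "a cat", "duration": 20}))
# ['duration must be an integer in 3-15']
```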

Model-Specific Parameters

Both tiers share identical parameters; they differ only in generation speed and quality. The parameters below apply to both skyreels-v4-fast and skyreels-v4-std. Choose the fast tier for previews, drafts, and high-throughput scenarios.
ref_images
array
Omni multi-image reference, up to 6 items. Each item supports image (single image) or grid (multi-image composite), and may include audio_url as the driving audio. See Composite Field Reference below
ref_videos
array
Video reference, up to 1 item. type=reference for style / character reference; type=extend for video continuation. See Composite Field Reference below
mid_frame_images
array
Mid-frame keyframes, up to 6 items, with timestamps. Combine with first_frame_image / last_frame_image to precisely control motion trajectory. See Composite Field Reference below
prompt_optimizer
boolean
default:true
Whether to enable intelligent prompt rewriting. When on, the backend expands prompt details to improve generation stability; when off, the original prompt is followed strictly

Composite Field Reference

SkyReels V4’s multimodal capabilities are expressed via three array fields, with semantics more precise than the generic image_urls / video_urls. Recommended for complex scenarios.

ref_images — Omni multi-asset reference

Reference asset list, up to 6 items. Each item can be a single image or a multi-image grid, and may bind a driving audio clip.
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| tag | string | Yes | Asset tag, injected into prompt context, e.g. char / style / bg |
| type | string | Yes | image: single image; grid: multi-image composite (2-9 images forming one semantic group) |
| image_urls | string[] | Yes | Image URL array. Length 1 when type=image; 2-9 when type=grid |
| audio_url | string | No | Audio bound to this asset (voiceprint / beat driving), WAV / MP3 |
"ref_images": [
  {
    "tag": "char",
    "type": "image",
    "image_urls": ["https://cdn.example.com/char.jpg"]
  },
  {
    "tag": "style",
    "type": "grid",
    "image_urls": [
      "https://cdn.example.com/scene-1.jpg",
      "https://cdn.example.com/scene-2.jpg",
      "https://cdn.example.com/scene-3.jpg"
    ],
    "audio_url": "https://cdn.example.com/bgm.mp3"
  }
]
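
The structural rules for ref_images (at most 6 items, one URL for type=image, 2-9 URLs for type=grid, tag required) can be checked locally before submission. The following Python sketch assumes nothing beyond those rules; validate_ref_images is a hypothetical helper, and it does not check file size or format.

```python
def validate_ref_images(ref_images):
    """Return a list of structural problems in a ref_images array."""
    problems = []
    if len(ref_images) > 6:
        problems.append("ref_images accepts at most 6 items")
    for index, item in enumerate(ref_images):
        count = len(item.get("image_urls", []))
        if not item.get("tag"):
            problems.append(f"item {index}: tag is required")
        if item.get("type") == "image":
            if count != 1:
                problems.append(f"item {index}: type=image needs exactly 1 URL")
        elif item.get("type") == "grid":
            if not 2 <= count <= 9:
                problems.append(f"item {index}: type=grid needs 2-9 URLs")
        else:
            problems.append(f"item {index}: type must be image or grid")
    return problems


example = [
    {"tag": "char", "type": "image",
     "image_urls": ["https://cdn.example.com/char.jpg"]},
    {"tag": "style", "type": "grid",
     "image_urls": ["https://cdn.example.com/scene-1.jpg"]},
]
print(validate_ref_images(example))  # ['item 1: type=grid needs 2-9 URLs']
```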

ref_videos — video reference / video extension

Video asset list, up to 1 item. type decides the semantics: style reference vs. video continuation.
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| tag | string | Yes | Asset tag, e.g. src / style_ref |
| type | string | Yes | reference: style / character / motion reference; extend: video continuation, the new video is appended to the source |
| video_url | string | Yes | Video URL, MP4 / MOV |
"ref_videos": [
  {
    "tag": "src",
    "type": "extend",
    "video_url": "https://cdn.example.com/source.mp4"
  }
]
Pricing reminder: when any item in ref_videos has type reference or extend, the “With Video Reference” rate applies.

mid_frame_images — mid-frame keyframes

Define key frames in the middle of the video, up to 6 items, interpolated by timestamp. Often combined with first/last frames to precisely choreograph shot pacing.
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| tag | string | Yes | Keyframe tag, e.g. kf_2s / peak |
| image_url | string | Yes | Keyframe image URL |
| time_stamp | number | Yes | The frame’s appearance time (seconds), must lie in (0, duration) and be strictly increasing |
"mid_frame_images": [
  {
    "tag": "kf_2s",
    "image_url": "https://cdn.example.com/kf-2.jpg",
    "time_stamp": 2.0
  },
  {
    "tag": "kf_5s",
    "image_url": "https://cdn.example.com/kf-5.jpg",
    "time_stamp": 5.0
  }
]
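
Because every time_stamp must fall strictly inside (0, duration) and increase from one item to the next, the two keyframes above are only valid when duration is greater than 5. A small Python check of those rules is sketched here; validate_mid_frames is a hypothetical helper, not part of the API.

```python
def validate_mid_frames(mid_frame_images, duration):
    """Return a list of problems with keyframe count, range, and ordering."""
    problems = []
    if len(mid_frame_images) > 6:
        problems.append("mid_frame_images accepts at most 6 items")
    previous = None
    for index, frame in enumerate(mid_frame_images):
        stamp = frame["time_stamp"]
        # Open interval: 0 and duration themselves are not allowed.
        if not 0 < stamp < duration:
            problems.append(f"item {index}: time_stamp must lie in (0, {duration})")
        if previous is not None and stamp <= previous:
            problems.append(f"item {index}: time_stamp must be strictly increasing")
        previous = stamp
    return problems


frames = [
    {"tag": "kf_2s", "image_url": "https://cdn.example.com/kf-2.jpg", "time_stamp": 2.0},
    {"tag": "kf_5s", "image_url": "https://cdn.example.com/kf-5.jpg", "time_stamp": 5.0},
]
print(validate_mid_frames(frames, duration=8))  # []
print(validate_mid_frames(frames, duration=5))  # ['item 1: time_stamp must lie in (0, 5)']
```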

Prompt Writing Tips

SkyReels V4’s prompt_optimizer is on by default, but writing a clear prompt still significantly improves output quality:
  • Camera language — be explicit about camera motion (push / pull / pan / follow) and shot size (close-up / wide / overhead). Example: slow camera push-in, from wide shot to facial close-up
  • Pacing control — segment with time words. Example: still gaze for the first 2 seconds, then turn and walk away over the next 3. Pairs well with mid_frame_images
  • Style anchoring — describe lighting / color palette / texture. Example: cinematic cool tones, natural backlight, film grain
  • Avoid negatives — V4 does not support negative_prompt. Rewrite “no X” as a positive description
  • Multi-asset guidance — when using ref_images, name the tag in the prompt. Example: keep the char consistent, walking through the style scene
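
The multi-asset tip can be turned into a quick pre-flight check: list the ref_images tags that the prompt never mentions. This is a rough sketch using a case-insensitive substring match, so a tag such as char also matches the word character. missing_tags is a hypothetical helper.

```python
def missing_tags(prompt, ref_images):
    """Return the ref_images tags that do not appear anywhere in the prompt."""
    lowered = prompt.lower()
    return [item["tag"] for item in ref_images if item["tag"].lower() not in lowered]


prompt = "keep the char consistent, walking through the style scene"
assets = [{"tag": "char"}, {"tag": "style"}, {"tag": "bg"}]
print(missing_tags(prompt, assets))  # ['bg']
```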

Resource Limits

| Item | Limit |
| --- | --- |
| ref_images | Up to 6 items, 2-9 images per type=grid item; each image ≤ 30MB, JPG/PNG/WEBP |
| ref_videos | Up to 1 item, MP4/MOV, ≤ 100MB, 2-30 seconds |
| mid_frame_images | Up to 6 items, timestamps must be strictly increasing and within (0, duration) |
| audio_urls / audio_url | WAV/MP3, ≤ 15MB, 3-30 seconds |
| first_frame_image / last_frame_image | JPG/PNG/WEBP, ≤ 30MB |
| Output | MP4, link valid for 24 hours |
| Concurrency | ≤ 5 queued tasks per account at the same time |