Kunlun Wanwei SkyReels V4 video generation series. V4 is the multimodal release of SkyReels. Beyond T2V / I2V, it natively supports Omni multi-asset references (image / video / audio), video extension, first/last-frame and mid-frame keyframes, voiceprint sync, and other advanced capabilities. Duration can be any integer from 3 to 15 seconds. Billing is by resolution × duration, with two tiers, Fast and Std:
| Tier | Positioning | Speed | Quality | Recommended Scenarios |
| --- | --- | --- | --- | --- |
| skyreels-v4-fast | Fast tier | Quick | Standard | Creative previews, batch drafts, A/B comparisons |
| skyreels-v4-std | Standard quality tier | Slower (~1.5-2x) | More stable | Final delivery, client-facing cuts, complex motion shots |
Both tiers share the same parameters and capabilities (actions / ref_images / ref_videos / mid_frame_images / voiceprint) — they only differ in sampling depth. Recommended workflow: prototype with fast, render the final cut with std.
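The fast-then-std workflow can be sketched as a small request builder. This is a minimal illustration only: the `build_request` helper and the `model` and `duration` key names are assumptions, since this page does not show the request schema; only the tier names, the `prompt` field, and the 3-15 second integer rule come from the text above.

```python
FAST_MODEL = "skyreels-v4-fast"
STD_MODEL = "skyreels-v4-std"


def build_request(prompt, duration, final=False, **media):
    """Build a request body dict (hypothetical helper, assumed key names)."""
    # Duration must be an integer in the 3-15 second range; reject bools too.
    if isinstance(duration, bool) or not isinstance(duration, int):
        raise ValueError("duration must be an integer")
    if not 3 <= duration <= 15:
        raise ValueError("duration must be between 3 and 15 seconds")
    body = {
        "model": STD_MODEL if final else FAST_MODEL,  # assumed key name
        "prompt": prompt,
        "duration": duration,  # assumed key name
    }
    body.update(media)  # e.g. ref_images=[...], mid_frame_images=[...]
    return body


# Prototype on the fast tier, then reuse the same parameters for the final cut.
draft = build_request("a paper boat drifting down a rainy street", duration=5)
final = build_request("a paper boat drifting down a rainy street", duration=5, final=True)
```

Because both tiers accept the same parameters, only the model name changes between the draft and the final render.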
Video reference surcharge — when ref_videos contains any item of type reference (video reference) or extend (video extension), pricing follows the “With Video Reference” column; otherwise the standard column.
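The surcharge rule above can be checked client-side before submitting a job. The following is a minimal sketch; the function name is hypothetical and the `url` key in the sample items is an assumption. It only inspects `ref_videos` and does not cover the simplified `video_urls` field, whose pricing treatment is not stated here.

```python
def uses_video_reference_pricing(ref_videos):
    """Return True if any ref_videos item is of type 'reference' or 'extend'."""
    return any(
        item.get("type") in {"reference", "extend"}
        for item in (ref_videos or [])
    )


# The "url" key is a placeholder; only "type" matters for this check.
with_reference = [{"type": "reference", "url": "https://example.com/cam.mp4"}]
print(uses_video_reference_pricing(with_reference))  # True
print(uses_video_reference_pricing([]))              # False
```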
Switch generation modes via the action field (or the corresponding media fields):
| Mode | action | Key Fields | Typical Use |
| --- | --- | --- | --- |
| Text-to-video | generate | prompt | Pure text generation |
| Image-to-video | image2video | first_frame_image | Animate a single frame |
| First/last frame inpainting | first_last_frame | first_frame_image + last_frame_image | Precisely control start/end frames |
| Multi-image reference (Omni) | reference | ref_images | Character / style / scene consistency |
| Video reference | reference_video | ref_videos.type=reference | Replicate camera motion / style |
| Video extension | extend | ref_videos.type=extend | Continue an existing video |
| Audio-driven | reference_audio | audio_urls or ref_images[].audio_url | Voiceprint / beat sync, lip-sync |
You usually do not need to pass action explicitly: the backend auto-routes based on the media fields you supply. Pass it explicitly to lock the mode and avoid ambiguity (e.g. to force video reference when both images and videos are present).
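To see when an explicit action is worth passing, a client can list which modes the supplied media fields could imply. The sketch below is not the backend's routing logic, whose precedence rules are not documented here; `candidate_actions` is a hypothetical helper that maps fields to the action values from the mode table and leaves the final choice to the caller.

```python
def candidate_actions(body):
    """List the action values that the media fields in `body` could imply."""
    candidates = []
    ref_videos = body.get("ref_videos") or []
    ref_images = body.get("ref_images") or []
    if any(v.get("type") == "extend" for v in ref_videos):
        candidates.append("extend")
    if any(v.get("type") == "reference" for v in ref_videos):
        candidates.append("reference_video")
    if ref_images:
        candidates.append("reference")
    if body.get("audio_urls") or any(i.get("audio_url") for i in ref_images):
        candidates.append("reference_audio")
    if body.get("first_frame_image") and body.get("last_frame_image"):
        candidates.append("first_last_frame")
    elif body.get("first_frame_image"):
        candidates.append("image2video")
    # With no media fields at all, only plain text-to-video remains.
    return candidates or ["generate"]


body = {
    "prompt": "follow the runner through the market",
    "ref_images": [{"tag": "char", "type": "image"}],
    "ref_videos": [{"type": "reference"}],
}
options = candidate_actions(body)
if len(options) > 1:
    # Several modes are plausible, so lock the intended one explicitly.
    body["action"] = "reference_video"
```

When the list has a single entry, leaving action out and relying on auto-routing is the simpler choice.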
image_urls: Reference image URL array (simplified mode). Use either image_urls or ref_images. For complex cases (specifying a tag, type=image|grid, or binding an audio_url), use ref_images.
video_urls: Reference video URL array (simplified mode). Use either video_urls or ref_videos. For complex cases (specifying a tag or type=reference|extend), use ref_videos.
audio_urls: Reference audio URL array (simplified mode); triggers voiceprint sync (reference_audio mode). Use either audio_urls or the audio_url field inside ref_images. For complex cases (binding audio to a specific asset, linked with an image / style group), use ref_images[].audio_url.
ref_images: Omni multi-image reference, up to 6 items. Each item is of type image (single image) or grid (multi-image composite) and may include an audio_url as the driving audio. See Composite Field Reference below.
mid_frame_images: Mid-frame keyframes with timestamps, up to 6 items. Combine with first_frame_image / last_frame_image to precisely control the motion trajectory. See Composite Field Reference below.
prompt_optimizer: Whether to enable intelligent prompt rewriting. When on, the backend expands prompt details to improve generation stability; when off, the original prompt is followed strictly.
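The field constraints above lend themselves to a quick pre-flight check. The sketch below assumes that "one of" means the simplified and composite variants should not be sent together, which is an interpretation rather than a documented rule; `check_media_fields` is a hypothetical helper and the item limit of 6 comes from the descriptions above.

```python
MAX_ITEMS = 6
# Simplified field paired with its composite counterpart.
PAIRS = [("image_urls", "ref_images"), ("video_urls", "ref_videos")]


def check_media_fields(body):
    """Return a list of human-readable problems found in `body`."""
    errors = []
    for simple, composite in PAIRS:
        if body.get(simple) and body.get(composite):
            errors.append(f"use either {simple} or {composite}, not both")
    ref_images = body.get("ref_images") or []
    if body.get("audio_urls") and any(i.get("audio_url") for i in ref_images):
        errors.append("use either audio_urls or ref_images[].audio_url, not both")
    if len(ref_images) > MAX_ITEMS:
        errors.append(f"ref_images allows at most {MAX_ITEMS} items")
    for index, item in enumerate(ref_images):
        item_type = item.get("type")
        # Only flag a type that is present but not one of the two known values.
        if item_type is not None and item_type not in {"image", "grid"}:
            errors.append(f"ref_images[{index}].type must be image or grid")
    if len(body.get("mid_frame_images") or []) > MAX_ITEMS:
        errors.append(f"mid_frame_images allows at most {MAX_ITEMS} items")
    return errors


problems = check_media_fields({
    "image_urls": ["https://example.com/a.png"],
    "ref_images": [{"tag": "char", "type": "photo"}],
})
```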
skyreels-v4-std positioning: standard quality tier, with more stable quality and more natural motion, suited for delivery-grade video. Generation takes about 1.5-2x as long as fast.
SkyReels V4’s multimodal capabilities are expressed via three array fields, with semantics more precise than the generic image_urls / video_urls. Recommended for complex scenarios.
mid_frame_images defines keyframes in the middle of the video, up to 6 items, interpolated by timestamp. It is often combined with first/last frames to precisely choreograph shot pacing.
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| tag | string | Yes | Keyframe tag, e.g. kf_2s / peak |
| image_url | string | Yes | Keyframe image URL |
| time_stamp | number | Yes | Time at which the frame appears (seconds); must lie in (0, duration), and values must be strictly increasing across items |
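The keyframe rules in the table can be verified before a request is sent. The following is a minimal sketch with a hypothetical `validate_mid_frames` helper; it checks the three required fields, the 6-item limit, the open interval (0, duration), and strictly increasing timestamps. The example URLs are placeholders.

```python
REQUIRED_KEYS = ("tag", "image_url", "time_stamp")


def validate_mid_frames(frames, duration):
    """Return a list of problems with a mid_frame_images array."""
    errors = []
    if len(frames) > 6:
        errors.append("at most 6 keyframes are allowed")
    previous = None
    for index, frame in enumerate(frames):
        for key in REQUIRED_KEYS:
            if key not in frame:
                errors.append(f"item {index}: missing {key}")
        stamp = frame.get("time_stamp")
        if isinstance(stamp, bool) or not isinstance(stamp, (int, float)):
            continue  # missing or non-numeric; already reported if missing
        if not 0 < stamp < duration:
            errors.append(f"item {index}: time_stamp must lie in (0, {duration})")
        if previous is not None and stamp <= previous:
            errors.append(f"item {index}: time_stamp must be strictly increasing")
        previous = stamp
    return errors


frames = [
    {"tag": "kf_2s", "image_url": "https://example.com/kf2.png", "time_stamp": 2},
    {"tag": "peak", "image_url": "https://example.com/peak.png", "time_stamp": 4},
]
print(validate_mid_frames(frames, duration=6))  # []
```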
SkyReels V4’s prompt_optimizer is on by default, but writing a clear prompt still significantly improves output quality:
Camera language — be explicit about camera motion (push / pull / pan / follow) and shot size (close-up / wide / overhead). Example: slow camera push-in, from wide shot to facial close-up
Pacing control — segment with time words. Example: still gaze for the first 2 seconds, then turn and walk away over the next 3. Pairs well with mid_frame_images
Style anchoring — describe lighting / color palette / texture. Example: cinematic cool tones, natural backlight, film grain
Avoid negatives — V4 does not support negative_prompt. Rewrite “no X” as a positive description
Multi-asset guidance — when using ref_images, name the tag in the prompt. Example: keep the char consistent, walking through the style scene
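The multi-asset tip can be backed by a small lint step that reports ref_images tags the prompt never mentions. This is a sketch under one assumption: that a tag is referenced by writing it as a plain word in the prompt, as in the example above. `unmentioned_tags` is a hypothetical helper, not part of the API.

```python
import re


def unmentioned_tags(prompt, ref_images):
    """Return the ref_images tags that do not appear as words in the prompt."""
    words = set(re.findall(r"[A-Za-z0-9_]+", prompt))
    return [
        item["tag"]
        for item in ref_images
        if item.get("tag") and item["tag"] not in words
    ]


prompt = "keep the char consistent, walking through the style scene"
assets = [{"tag": "char"}, {"tag": "style"}, {"tag": "logo"}]
print(unmentioned_tags(prompt, assets))  # ['logo']
```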