POST /v1/tasks
Seedance Series
curl --request POST \
  --url https://www.qingbo.dev/v1/tasks \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "<string>",
  "action": "<string>",
  "prompt": "<string>",
  "aspect_ratio": "<string>",
  "resolution": "<string>",
  "duration": 123,
  "image_urls": [
    "<string>"
  ],
  "first_frame_image": "<string>",
  "last_frame_image": "<string>",
  "video_urls": [
    "<string>"
  ],
  "audio_urls": [
    "<string>"
  ],
  "callback_url": "<string>",
  "callback_events": [
    "<string>"
  ]
}
'
{
  "task_id": "task-wave1775285160b950328499",
  "model": "doubao-seedance-2.0",
  "action": "reference",
  "status": "queued",
  "created_at": 1775285160040,
  "progress": 0
}

The ByteDance Doubao Seedance video generation series covers the full range: flagship fast (Pro Fast) → flagship high-quality (Pro Quality) → audio-enabled 1.5 → multimodal 2.0.
| Generation | Positioning |
| --- | --- |
| 1.0 Pro Fast | Flagship fast tier, balanced quality and speed; about 3x faster than Pro Quality |
| 1.0 Pro Quality | Flagship high-quality tier, 1080P multi-shot storytelling, suited for final delivery |
| 1.5 Pro | New-generation audio-enabled video, joint audio-video generation, multilingual dialogue + lip sync |
| 2.0 | Multimodal flagship, unified architecture for text + images (≤9) + videos (≤3) + audio (≤3); @image1/@video2/@audio3 reference syntax |
| 2.0 Face | 2.0 with enhanced face/identity reference, suited for digital-human shorts and lip-sync content |
| 2.0 Fast / Fast Face | 2.0 fast tier, no 1080P, trades quality for faster output and lower cost |

Pricing

Billed by resolution × duration, in $/sec.
| Model | 480P | 720P | 1080P |
| --- | --- | --- | --- |
| doubao-seedance-2.0-fast-face | $0.085 | $0.18275 | n/a |
| doubao-seedance-2.0-fast | $0.06205 | $0.13345 | n/a |
| doubao-seedance-2.0-face | $0.1054 | $0.22695 | $0.53125 |
| doubao-seedance-2.0 | $0.077095 | $0.16592 | $0.374 |
| doubao-seedance-1.5-pro | $0.021675 | $0.04675 | $0.11475 |
| doubao-seedance-1.0-pro-quality | $0.021675 | $0.04675 | $0.1105 |
| doubao-seedance-1.0-pro-fast | $0.00935 | $0.02125 | $0.0442 |
2.0 series video-reference subprice (when video_urls is present and triggers reference_video / reference, the table below replaces the base price above, in $/sec):
| Model | 480P | 720P | 1080P |
| --- | --- | --- | --- |
| doubao-seedance-2.0-fast-face | $0.051 | $0.10965 | n/a |
| doubao-seedance-2.0-fast | $0.036975 | $0.0799 | n/a |
| doubao-seedance-2.0-face | $0.06375 | $0.13685 | $0.31875 |
| doubao-seedance-2.0 | $0.04675 | $0.1003 | $0.22695 |
Video reference is cheaper than pure generation: a request with video_urls has a motion-rhythm reference to follow, so its subprice is well below the base price. For example, 2.0 at 720P costs $0.16592/sec at the base rate versus $0.1003/sec with video reference, about 60% of the base. Do not read the subprice as a markup.
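The billing rule above (per-second rate × duration, with the video-reference table replacing the base table when video_urls is present) can be sketched as a small estimator. This is a minimal sketch rather than the service's billing code: the function name `estimate_cost` and the `has_video_reference` flag are illustrative, only two models are copied from the tables, and any rounding applied by the actual invoice is not modeled.

```python
# Per-second rates in USD, copied from the two pricing tables above.
# Only two models are included to keep the sketch short.
BASE_RATES = {
    "doubao-seedance-2.0": {"480p": 0.077095, "720p": 0.16592, "1080p": 0.374},
    "doubao-seedance-2.0-fast": {"480p": 0.06205, "720p": 0.13345},
}
VIDEO_REFERENCE_RATES = {
    "doubao-seedance-2.0": {"480p": 0.04675, "720p": 0.1003, "1080p": 0.22695},
    "doubao-seedance-2.0-fast": {"480p": 0.036975, "720p": 0.0799},
}


def estimate_cost(model, resolution, duration_sec, has_video_reference=False):
    """Return the estimated charge in USD: per-second rate times duration."""
    table = VIDEO_REFERENCE_RATES if has_video_reference else BASE_RATES
    try:
        rate = table[model][resolution]
    except KeyError:
        raise ValueError(f"no rate listed for {model} at {resolution}") from None
    return rate * duration_sec


# A 5-second clip on doubao-seedance-2.0 at 720p, without and with video reference.
print(round(estimate_cost("doubao-seedance-2.0", "720p", 5), 4))  # 0.8296
print(round(estimate_cost("doubao-seedance-2.0", "720p", 5, has_video_reference=True), 4))  # 0.5015
```

Asking for a resolution a model does not offer, such as 1080p on doubao-seedance-2.0-fast, raises ValueError instead of returning a price.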

Mode Routing

The whole series shares a field-routing convention: the backend determines the action from the media fields you pass in, so you usually don’t need to set action explicitly.
| Fields passed | Routed mode | Notes |
| --- | --- | --- |
| prompt only | generate (T2V) | Text-to-video; aspect controlled by aspect_ratio |
| + first_frame_image | image2video (I2V) | First-frame driven; follows the first frame’s aspect |
| + first_frame_image + last_frame_image | first_last_frame | Interpolation constrained by first + last frame |
| + image_urls | reference | Multi-image character / style consistency |
| + video_urls (2.0) | reference_video | Video clip reference (2.0 only) |
| + audio_urls (2.0) | reference_audio | Audio-driven (2.0 only) |
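To predict which mode a request will be routed to before sending it, the routing table can be mirrored on the client. The sketch below is a client-side approximation, not the backend's logic: the helper name `infer_action` is hypothetical, and the order in which fields are checked when several media fields are present at once is an assumption, since the table only describes one field group at a time.

```python
def infer_action(payload):
    """Guess the routed mode from which media fields are present in the payload."""
    # Assumed precedence when several media fields are set: video, audio,
    # reference images, then first/last frame. The table does not specify this.
    if payload.get("video_urls"):
        return "reference_video"
    if payload.get("audio_urls"):
        return "reference_audio"
    if payload.get("image_urls"):
        return "reference"
    if payload.get("first_frame_image") and payload.get("last_frame_image"):
        return "first_last_frame"
    if payload.get("first_frame_image"):
        return "image2video"
    return "generate"


print(infer_action({"prompt": "a cat on a skateboard"}))  # generate
print(infer_action({"prompt": "zoom in", "first_frame_image": "https://example.com/a.png"}))  # image2video
```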
2.0 multi-asset reference syntax: inside prompt you can use placeholders such as @image1 / @video2 / @audio3 to reference image_urls[0] / video_urls[1] / audio_urls[2]. Placeholder numbers are 1-based while the arrays are 0-based, so @imageN points to image_urls[N-1].
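Because the placeholder numbers and the array indices differ by one, it is easy to reference an asset that was never supplied. The following is a minimal sketch of a local check, assuming placeholders always take the form @image, @video, or @audio followed by a number; the helper name `resolve_placeholders` and the example.com URLs are illustrative only.

```python
import re

# Matches @image1, @video2, @audio3 and captures the asset kind and its number.
PLACEHOLDER = re.compile(r"@(image|video|audio)(\d+)")


def resolve_placeholders(prompt, image_urls=(), video_urls=(), audio_urls=()):
    """Map each placeholder found in the prompt to the URL it refers to."""
    assets = {"image": image_urls, "video": video_urls, "audio": audio_urls}
    resolved = {}
    for kind, number in PLACEHOLDER.findall(prompt):
        index = int(number) - 1  # placeholders are 1-based, arrays are 0-based
        urls = assets[kind]
        if not 0 <= index < len(urls):
            raise ValueError(f"@{kind}{number} has no matching entry in {kind}_urls")
        resolved[f"@{kind}{number}"] = urls[index]
    return resolved


mapping = resolve_placeholders(
    "The character from @image1 repeats the motion in @video2",
    image_urls=["https://example.com/hero.png"],
    video_urls=["https://example.com/walk.mp4", "https://example.com/dance.mp4"],
)
print(mapping["@video2"])  # https://example.com/dance.mp4
```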

Request Examples

curl -X POST https://www.qingbo.dev/v1/tasks \
  -H "Authorization: Bearer $WAVE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "doubao-seedance-1.0-pro-fast",
    "prompt": "A Shiba Inu in a spacesuit walking on the moon, cinematic lighting",
    "duration": 5,
    "resolution": "720p",
    "aspect_ratio": "16:9"
  }'
{
  "task_id": "task-wave1775285160b950328499",
  "model": "doubao-seedance-2.0",
  "action": "reference",
  "status": "queued",
  "created_at": 1775285160040,
  "progress": 0
}
After submission, poll status with GET /v1/tasks/{task_id}. See Task System for details.
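The submit-then-poll flow can be sketched in Python using only the standard library. The endpoint, the WAVE_API_KEY environment variable, and the task_id / status fields come from the examples above, and the terminal states completed / failed come from the callback_events description. The helper names, the 5-second interval, and the 10-minute timeout are assumptions, and error handling for non-2xx responses is omitted.

```python
import json
import os
import time
import urllib.request

BASE_URL = "https://www.qingbo.dev/v1/tasks"
TERMINAL_STATES = {"completed", "failed"}


def call_api(method, url, body=None):
    """Send one JSON request with the bearer token and return the decoded response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(url, data=data, method=method)
    request.add_header("Authorization", f"Bearer {os.environ['WAVE_API_KEY']}")
    request.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_task(task_id, fetch=None, interval_sec=5.0, timeout_sec=600.0):
    """Poll the task until it reaches a terminal state or the timeout expires."""
    # `fetch` is injectable so the loop can be exercised without network access.
    fetch = fetch or (lambda tid: call_api("GET", f"{BASE_URL}/{tid}"))
    deadline = time.monotonic() + timeout_sec
    while True:
        task = fetch(task_id)
        if task.get("status") in TERMINAL_STATES:
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} is still {task.get('status')!r}")
        time.sleep(interval_sec)


def submit_and_wait(payload):
    """Create a task, then block until it completes or fails."""
    created = call_api("POST", BASE_URL, payload)
    return wait_for_task(created["task_id"])
```

Calling `submit_and_wait` with the same JSON body as the curl example returns the final task object. For long-running jobs, callback_url avoids holding a polling loop open.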

Available Models

| Model ID | Series | Supported actions | Resolution | Duration (sec) |
| --- | --- | --- | --- | --- |
| doubao-seedance-2.0-fast-face | 2.0 | Same as 2.0 | 480p / 720p | 4-15 |
| doubao-seedance-2.0-fast | 2.0 | Same as 2.0 | 480p / 720p | 4-15 |
| doubao-seedance-2.0-face | 2.0 | Same as 2.0 | 480p / 720p / 1080p | 4-15 |
| doubao-seedance-2.0 | 2.0 | generate / image2video / first_last_frame / reference / reference_video / reference_audio | 480p / 720p / 1080p | 4-15 |
| doubao-seedance-1.5-pro | 1.5 | generate / image2video / first_last_frame | 480p / 720p / 1080p | 4-12 |
| doubao-seedance-1.0-pro-quality | 1.0 Pro | generate / image2video / first_last_frame / reference | 480p / 720p / 1080p | 2-12 |
| doubao-seedance-1.0-pro-fast | 1.0 Pro | generate / image2video / reference | 480p / 720p / 1080p | 2-12 |
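The resolution and duration columns lend themselves to a pre-flight check that catches an invalid combination before a task is submitted. This is a sketch under the assumption that the table is the complete set of constraints: the helper name `check_request` is hypothetical, supported actions are not checked, and the defaults (720p, 5 seconds) are taken from the parameter reference.

```python
ALL_RESOLUTIONS = frozenset({"480p", "720p", "1080p"})
NO_1080P = frozenset({"480p", "720p"})

# (supported resolutions, minimum seconds, maximum seconds) per model,
# copied from the Available Models table.
MODEL_LIMITS = {
    "doubao-seedance-2.0-fast-face": (NO_1080P, 4, 15),
    "doubao-seedance-2.0-fast": (NO_1080P, 4, 15),
    "doubao-seedance-2.0-face": (ALL_RESOLUTIONS, 4, 15),
    "doubao-seedance-2.0": (ALL_RESOLUTIONS, 4, 15),
    "doubao-seedance-1.5-pro": (ALL_RESOLUTIONS, 4, 12),
    "doubao-seedance-1.0-pro-quality": (ALL_RESOLUTIONS, 2, 12),
    "doubao-seedance-1.0-pro-fast": (ALL_RESOLUTIONS, 2, 12),
}


def check_request(model, resolution="720p", duration=5):
    """Return a list of problems; an empty list means the request fits the table."""
    if model not in MODEL_LIMITS:
        return [f"unknown model: {model}"]
    resolutions, min_sec, max_sec = MODEL_LIMITS[model]
    problems = []
    if resolution not in resolutions:
        problems.append(f"{model} does not support {resolution}")
    if not min_sec <= duration <= max_sec:
        problems.append(f"duration must be {min_sec}-{max_sec} seconds for {model}")
    return problems


# Both the resolution and the duration are out of range for the fast tier.
print(check_request("doubao-seedance-2.0-fast", "1080p", 20))
```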

Common Parameters

model (string, required)
Model ID; see the “Available Models” table above.

action (string, default: "generate")
Operation type. You usually do not need to pass it explicitly, because the backend routes automatically based on the media fields. Valid values:
  • generate: text-to-video (T2V)
  • image2video: image-to-video (I2V); requires first_frame_image
  • first_last_frame: first/last-frame constraint; requires first_frame_image + last_frame_image
  • reference: reference-image driven; requires image_urls
  • reference_video: video clip reference (2.0 series only); requires video_urls
  • reference_audio: audio-driven (2.0 series only); requires audio_urls

prompt (string)
Video description text. Required for T2V; optional as guidance for other modes. The 2.0 series supports @image1 / @video2 / @audio3 reference syntax.

aspect_ratio (string, default: "16:9")
Frame aspect ratio. Valid values:
  • 16:9: landscape widescreen
  • 9:16: portrait
  • 1:1: square
  • 4:3: landscape
  • 3:4: portrait
  • 21:9: ultrawide
  • adaptive: follows the reference asset (2.0 series only)

resolution (string, default: "720p")
Output resolution; valid values vary by model:
  • 480p
  • 720p
  • 1080p (not supported by Lite-i2v / 2.0-fast / 2.0-fast-face)

duration (integer, default: 5)
Video duration in seconds. Range varies by series:
  • 1.0 series: 2-12
  • 1.5 Pro: 4-12
  • 2.0 series: 4-15

image_urls (string[])
Array of reference-image URLs; triggers reference mode. 1.0 / 1.5 support 1-9 images; 2.0 supports up to 9.

first_frame_image (string)
First-frame image URL; triggers image2video or first_last_frame mode.

last_frame_image (string)
Last-frame image URL; combined with first_frame_image, it triggers first_last_frame mode.

video_urls (string[])
Array of reference-video URLs (2.0 series only), up to 3 clips; triggers reference_video mode. Billing uses the “video-reference subprice” table above in place of the base price (significantly lower than pure generation).

audio_urls (string[])
Array of reference-audio URLs (2.0 series only), up to 3 tracks; triggers reference_audio mode.

callback_url (string)
Webhook callback URL, invoked when the task reaches a terminal state. See Callback Mechanism.

callback_events (string[])
Subscribed callback event types; defaults to terminal states only (completed / failed).

Model-Specific Parameters

2.0 Fast / Fast Face: parameters are identical to 2.0 / 2.0 Face (generate_audio / tools / image_with_roles / return_last_frame). Differences:
  • Only 480P / 720P, no 1080P
  • Fast tier; slight quality trade-off for faster output and lower cost
  • Fast Face variant retains face-consistency capability
  • Duration 4-15 seconds
Typical use cases:
  • Bulk production of digital-human short videos
  • Social-media lip-sync content
  • Rapid asset iteration

Resource Limits

| Item | Limit |
| --- | --- |
| Reference images (image_urls) | Up to 9, ≤ 30MB each, JPG / PNG / WEBP |
| First/last frame (first_frame_image / last_frame_image) | ≤ 30MB each, JPG / PNG / WEBP |
| Reference videos (video_urls, 2.0 only) | Up to 3 clips, MP4 / MOV, ≤ 100MB per clip |
| Reference audio (audio_urls, 2.0 only) | Up to 3 tracks, WAV / MP3, ≤ 15MB per track |
| Output | MP4, link valid for 24 hours |
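The count and format limits above can be checked locally before a request is sent. The sketch below is a heuristic, not the service's validation: the helper name `check_assets` is hypothetical, it infers the format from the file extension in the URL path (which fails for URLs without an extension), it accepts .jpeg alongside .jpg as an assumption, and it does not check file sizes because that would require fetching each asset.

```python
import os
from urllib.parse import urlparse

# (maximum number of items, accepted extensions) per field, from the limits table.
ASSET_LIMITS = {
    "image_urls": (9, {".jpg", ".jpeg", ".png", ".webp"}),
    "video_urls": (3, {".mp4", ".mov"}),
    "audio_urls": (3, {".wav", ".mp3"}),
}


def check_assets(payload):
    """Return a list of problems with the asset arrays; empty means none found."""
    problems = []
    for field, (max_count, extensions) in ASSET_LIMITS.items():
        urls = payload.get(field) or []
        if len(urls) > max_count:
            problems.append(f"{field}: {len(urls)} items exceeds the limit of {max_count}")
        for url in urls:
            # Use only the URL path so query strings do not hide the extension.
            extension = os.path.splitext(urlparse(url).path)[1].lower()
            if extension not in extensions:
                problems.append(f"{field}: unsupported format in {url}")
    return problems


print(check_assets({"video_urls": ["https://example.com/clip.avi"]}))
```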