Video Generation
Seedance Series
Doubao Seedance — full lineup of multimodal video generation: 1.0 Lite / Pro / 1.5 / 2.0
ByteDance's Doubao Seedance video generation series covers the full range: flagship fast (Pro Fast) → flagship high-quality (Pro Quality) → audio-enabled 1.5 → multimodal 2.0.
| Generation | Positioning |
|---|---|
| 1.0 Pro Fast | Flagship fast tier, balanced quality and speed; about 3x faster than Pro Quality |
| 1.0 Pro Quality | Flagship high-quality tier, 1080P multi-shot storytelling, suited for final delivery |
| 1.5 Pro | New-generation audio-enabled video, joint audio-video generation, multilingual dialogue + lip sync |
| 2.0 | Multimodal flagship, unified architecture for text + images (≤9) + videos (≤3) + audio (≤3); @image1/@video2/@audio3 reference syntax |
| 2.0 Face | 2.0 with enhanced face/identity reference, suited for digital-human shorts and lip-sync content |
| 2.0 Fast / Fast Face | 2.0 fast tier, no 1080P, trades quality for faster output and lower cost |
Pricing
Billed by resolution × duration; prices are in $/sec.
| Model | 480P | 720P | 1080P |
|---|---|---|---|
| doubao-seedance-2.0-fast-face | $0.085 | $0.18275 | — |
| doubao-seedance-2.0-fast | $0.06205 | $0.13345 | — |
| doubao-seedance-2.0-face | $0.1054 | $0.22695 | $0.53125 |
| doubao-seedance-2.0 | $0.077095 | $0.16592 | $0.374 |
| doubao-seedance-1.5-pro | $0.021675 | $0.04675 | $0.11475 |
| doubao-seedance-1.0-pro-quality | $0.021675 | $0.04675 | $0.1105 |
| doubao-seedance-1.0-pro-fast | $0.00935 | $0.02125 | $0.0442 |
Video-reference subprice (applies when video_urls is present and triggers reference_video / reference; the table below replaces the base price above, in $/sec):
| Model | 480P | 720P | 1080P |
|---|---|---|---|
| doubao-seedance-2.0-fast-face | $0.051 | $0.10965 | — |
| doubao-seedance-2.0-fast | $0.036975 | $0.0799 | — |
| doubao-seedance-2.0-face | $0.06375 | $0.13685 | $0.31875 |
| doubao-seedance-2.0 | $0.04675 | $0.1003 | $0.22695 |
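As a rough illustration of how the two tables combine, the sketch below multiplies the per-second rate by the clip duration and switches to the video-reference table when `video_urls` is used. The function name `estimate_cost` and its arguments are hypothetical helpers, not part of the API, and the sketch assumes there is no rounding or additional fee on top of rate × duration.

```python
from decimal import Decimal

# Rates in $/sec, copied from the two pricing tables above.
# A missing resolution key means the model does not list that tier.
BASE_PRICE = {
    "doubao-seedance-2.0-fast-face": {"480p": "0.085", "720p": "0.18275"},
    "doubao-seedance-2.0-fast": {"480p": "0.06205", "720p": "0.13345"},
    "doubao-seedance-2.0-face": {"480p": "0.1054", "720p": "0.22695", "1080p": "0.53125"},
    "doubao-seedance-2.0": {"480p": "0.077095", "720p": "0.16592", "1080p": "0.374"},
    "doubao-seedance-1.5-pro": {"480p": "0.021675", "720p": "0.04675", "1080p": "0.11475"},
    "doubao-seedance-1.0-pro-quality": {"480p": "0.021675", "720p": "0.04675", "1080p": "0.1105"},
    "doubao-seedance-1.0-pro-fast": {"480p": "0.00935", "720p": "0.02125", "1080p": "0.0442"},
}
VIDEO_REF_PRICE = {
    "doubao-seedance-2.0-fast-face": {"480p": "0.051", "720p": "0.10965"},
    "doubao-seedance-2.0-fast": {"480p": "0.036975", "720p": "0.0799"},
    "doubao-seedance-2.0-face": {"480p": "0.06375", "720p": "0.13685", "1080p": "0.31875"},
    "doubao-seedance-2.0": {"480p": "0.04675", "720p": "0.1003", "1080p": "0.22695"},
}


def estimate_cost(model, resolution, duration_sec, uses_video_reference=False):
    """Hypothetical helper: estimated charge in dollars for an integer duration.

    Assumes the charge is simply rate x duration with no rounding.
    """
    table = VIDEO_REF_PRICE if uses_video_reference else BASE_PRICE
    rates = table.get(model)
    if rates is None:
        raise ValueError(f"no price listed for model {model!r}")
    rate = rates.get(resolution.lower())
    if rate is None:
        raise ValueError(f"{model} has no {resolution} price")
    return Decimal(rate) * duration_sec
```

For a 5-second 720P clip on doubao-seedance-2.0, this gives 5 × $0.16592 at the base rate and 5 × $0.1003 with a video reference.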
Video reference is actually cheaper: generations with video_urls are easier because they have a motion-rhythm reference, so the subprice is significantly lower than the pure-generation base price (e.g. 2.0 720P base $0.16592 vs video reference $0.1003, about 60% of the base). Do not interpret it as a markup.
Mode Routing
The whole series shares a field-routing convention: the backend automatically determines the action from the media fields you pass in, so you usually don't need to set action explicitly.
| Fields passed | Routed mode | Notes |
|---|---|---|
| prompt only | generate (T2V) | Text-to-video; aspect controlled by aspect_ratio |
| + first_frame_image | image2video (I2V) | First-frame driven; follows the first frame's aspect |
| + first_frame_image + last_frame_image | first_last_frame | Interpolation constrained by first + last frame |
| + image_urls | reference | Multi-image character / style consistency |
| + video_urls (2.0) | reference_video | Video clip reference (2.0 only) |
| + audio_urls (2.0) | reference_audio | Audio-driven (2.0 only) |
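The routing table can be mirrored on the client side to predict which mode a payload will land in. The sketch below is only an approximation of the backend's behaviour: the function `infer_action` is hypothetical, and the precedence it applies when several media fields are present at once (video, then audio, then reference images, then frames) is an assumption that the table does not specify.

```python
def infer_action(payload: dict) -> str:
    """Hypothetical client-side guess at the routed mode for a request payload."""
    # An explicitly passed action is assumed to take priority over routing.
    if payload.get("action"):
        return payload["action"]
    # Precedence between media fields below is an assumption, not documented.
    if payload.get("video_urls"):
        return "reference_video"
    if payload.get("audio_urls"):
        return "reference_audio"
    if payload.get("image_urls"):
        return "reference"
    if payload.get("first_frame_image") and payload.get("last_frame_image"):
        return "first_last_frame"
    if payload.get("first_frame_image"):
        return "image2video"
    # Only a prompt was supplied: plain text-to-video.
    return "generate"
```

For example, `infer_action({"prompt": "a cat", "first_frame_image": "https://example.com/a.png"})` returns `"image2video"`.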
Request Examples
Query the task result with GET /v1/tasks/{task_id}. See Task System for details.
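A polling loop around `GET /v1/tasks/{task_id}` might look like the sketch below. It makes several assumptions that this page does not confirm: that the task response is a JSON object with a `status` field, and that `completed` and `failed` (the terminal states named for callbacks) are the values to stop on. The HTTP call itself is injected as `fetch_task`, a hypothetical callable, so the sketch stays independent of the base URL and authentication scheme.

```python
import time

# Terminal states as named in the callback description; the "status" field
# name is an assumption about the task response shape.
TERMINAL_STATES = {"completed", "failed"}


def wait_for_task(task_id, fetch_task, interval_sec=5.0, timeout_sec=600.0,
                  sleep=time.sleep):
    """Poll until the task reaches a terminal state or the timeout expires.

    fetch_task(task_id) must return the parsed JSON body of
    GET /v1/tasks/{task_id} as a dict.
    """
    deadline = time.monotonic() + timeout_sec
    while True:
        task = fetch_task(task_id)
        if task.get("status") in TERMINAL_STATES:
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still pending after {timeout_sec}s")
        sleep(interval_sec)
```

The interval and timeout defaults here are placeholders; the Task System page is the place to check the recommended polling cadence.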
Available Models
| Model ID | Series | Supported actions | Resolution | Duration (sec) |
|---|---|---|---|---|
| doubao-seedance-2.0-fast-face | 2.0 | Same as 2.0 | 480p / 720p | 4-15 |
| doubao-seedance-2.0-fast | 2.0 | Same as 2.0 | 480p / 720p | 4-15 |
| doubao-seedance-2.0-face | 2.0 | Same as 2.0 | 480p / 720p / 1080p | 4-15 |
| doubao-seedance-2.0 | 2.0 | generate / image2video / first_last_frame / reference / reference_video / reference_audio | 480p / 720p / 1080p | 4-15 |
| doubao-seedance-1.5-pro | 1.5 | generate / image2video / first_last_frame | 480p / 720p / 1080p | 4-12 |
| doubao-seedance-1.0-pro-quality | 1.0 Pro | generate / image2video / first_last_frame / reference | 480p / 720p / 1080p | 2-12 |
| doubao-seedance-1.0-pro-fast | 1.0 Pro | generate / image2video / reference | 480p / 720p / 1080p | 2-12 |
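The table above can be turned into a small pre-flight check so that unsupported combinations are caught before a request is submitted. This is a minimal sketch: `MODELS` and `check_request` are hypothetical names, the data is transcribed from the table, and the duration ranges are assumed to be inclusive.

```python
from typing import List

ACTIONS_2_0 = ("generate", "image2video", "first_last_frame",
               "reference", "reference_video", "reference_audio")

# Transcribed from the "Available Models" table; durations are (min, max) seconds.
MODELS = {
    "doubao-seedance-2.0-fast-face": {"actions": ACTIONS_2_0, "resolutions": ("480p", "720p"), "duration": (4, 15)},
    "doubao-seedance-2.0-fast": {"actions": ACTIONS_2_0, "resolutions": ("480p", "720p"), "duration": (4, 15)},
    "doubao-seedance-2.0-face": {"actions": ACTIONS_2_0, "resolutions": ("480p", "720p", "1080p"), "duration": (4, 15)},
    "doubao-seedance-2.0": {"actions": ACTIONS_2_0, "resolutions": ("480p", "720p", "1080p"), "duration": (4, 15)},
    "doubao-seedance-1.5-pro": {"actions": ("generate", "image2video", "first_last_frame"),
                                "resolutions": ("480p", "720p", "1080p"), "duration": (4, 12)},
    "doubao-seedance-1.0-pro-quality": {"actions": ("generate", "image2video", "first_last_frame", "reference"),
                                        "resolutions": ("480p", "720p", "1080p"), "duration": (2, 12)},
    "doubao-seedance-1.0-pro-fast": {"actions": ("generate", "image2video", "reference"),
                                     "resolutions": ("480p", "720p", "1080p"), "duration": (2, 12)},
}


def check_request(model: str, action: str, resolution: str, duration: int) -> List[str]:
    """Return a list of human-readable problems; an empty list means no conflict found."""
    spec = MODELS.get(model)
    if spec is None:
        return [f"unknown model: {model}"]
    problems = []
    if action not in spec["actions"]:
        problems.append(f"{model} does not support action {action}")
    if resolution not in spec["resolutions"]:
        problems.append(f"{model} does not support resolution {resolution}")
    low, high = spec["duration"]
    if not low <= duration <= high:  # inclusive range is an assumption
        problems.append(f"duration {duration}s is outside {low}-{high}s")
    return problems
```

For instance, asking doubao-seedance-2.0-fast for 1080p produces a single resolution problem, while a 2-second reference request on doubao-seedance-1.5-pro produces two.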
Common Parameters
Model ID; see the “Available Models” table above
Operation type; usually no need to pass explicitly — the backend routes automatically based on media fields. Valid values:
- generate: text-to-video (T2V)
- image2video: image-to-video (I2V); requires first_frame_image
- first_last_frame: first/last-frame constraint; requires first_frame_image + last_frame_image
- reference: reference-image driven; requires image_urls
- reference_video: video clip reference (2.0 series only); requires video_urls
- reference_audio: audio-driven (2.0 series only); requires audio_urls
Video description text. Required for T2V; optional as guidance for other modes. The 2.0 series supports @image1 / @video2 / @audio3 reference syntax.
Frame aspect ratio. Valid values:
- 16:9 (landscape widescreen)
- 9:16 (portrait)
- 1:1 (square)
- 4:3 (landscape)
- 3:4 (portrait)
- 21:9 (ultrawide)
- adaptive (2.0 series only; follows the reference asset)
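The `@image1 / @video2 / @audio3` syntax mentioned for the prompt can be sanity-checked before submission. The sketch below assumes, without confirmation from this page, that the number after each tag is a 1-based position in the matching array (`image_urls`, `video_urls`, `audio_urls`); the helper name `dangling_references` is hypothetical.

```python
import re
from typing import List

REF_PATTERN = re.compile(r"@(image|video|audio)(\d+)")
FIELD_FOR_KIND = {"image": "image_urls", "video": "video_urls", "audio": "audio_urls"}


def dangling_references(payload: dict) -> List[str]:
    """Return prompt references that point past the end of their asset array.

    Assumption: @imageN refers to the N-th entry (1-based) of image_urls,
    and likewise for @videoN and @audioN.
    """
    missing = []
    for kind, number in REF_PATTERN.findall(payload.get("prompt", "")):
        assets = payload.get(FIELD_FOR_KIND[kind]) or []
        if not 1 <= int(number) <= len(assets):
            missing.append(f"@{kind}{number}")
    return missing
```

With one image and one video supplied, a prompt that mentions `@image1` and `@video2` would report `@video2` as dangling under this reading of the syntax.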
Output resolution; valid values vary by model:
- 480p
- 720p
- 1080p (not supported by Lite-i2v / 2.0-fast / 2.0-fast-face)
Video duration in seconds. Range varies by series:
- 1.0 series: 2-12
- 1.5 Pro: 4-12
- 2.0 series: 4-15
Array of reference-image URLs; triggers reference mode. 1.0 / 1.5 support 1-9 images; 2.0 supports up to 9
First-frame image URL; triggers image2video or first_last_frame mode
Last-frame image URL; combined with first_frame_image triggers first_last_frame mode.
Array of reference-video URLs (2.0 series only), up to 3 clips; triggers reference_video mode. Billing uses the “video-reference subprice” table above in place of the base price (significantly lower than pure generation).
Array of reference-audio URLs (2.0 series only), up to 3 tracks; triggers reference_audio mode
Webhook callback URL, invoked when the task reaches a terminal state. See Callback Mechanism
Subscribed callback event types; defaults to terminal states only (completed / failed).
Model-Specific Parameters
2.0 Fast / 2.0 Fast Face
Parameters identical to 2.0 / 2.0 Face (generate_audio / tools / image_with_roles / return_last_frame).
Differences:
- Only 480P / 720P, no 1080P
- Fast tier; slight quality trade-off for faster output and lower cost
- Fast Face variant retains face-consistency capability
- Duration 4-15 seconds
Typical uses:
- Bulk production of digital-human short videos
- Social-media lip-sync content
- Rapid asset iteration
Resource Limits
| Item | Limit |
|---|---|
| Reference images (image_urls) | Up to 9, ≤ 30MB each, JPG / PNG / WEBP |
| First/last frame (first_frame_image / last_frame_image) | ≤ 30MB each, JPG / PNG / WEBP |
| Reference videos (video_urls, 2.0 only) | Up to 3 clips, MP4 / MOV, ≤ 100MB per clip |
| Reference audio (audio_urls, 2.0 only) | Up to 3 tracks, WAV / MP3, ≤ 15MB per track |
| Output | MP4, link valid for 24 hours |
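Some of these limits can be checked locally before a request is sent. The sketch below covers only item counts and file formats, and it guesses the format from the URL's file extension, which is an assumption (a URL need not end in an extension). It also treats `.jpeg` as JPG, and it does not check file sizes, since those cannot be read from a URL alone. `check_assets` is a hypothetical helper name.

```python
import posixpath
from typing import List
from urllib.parse import urlparse

# Counts and formats transcribed from the "Resource Limits" table.
LIMITS = {
    "image_urls": {"max_items": 9, "extensions": {".jpg", ".jpeg", ".png", ".webp"}},
    "video_urls": {"max_items": 3, "extensions": {".mp4", ".mov"}},
    "audio_urls": {"max_items": 3, "extensions": {".wav", ".mp3"}},
}
FRAME_FIELDS = ("first_frame_image", "last_frame_image")


def _extension(url: str) -> str:
    """Lower-cased file extension of the URL path, ignoring query strings."""
    return posixpath.splitext(urlparse(url).path)[1].lower()


def check_assets(payload: dict) -> List[str]:
    """Return a list of count/format problems; an empty list means none found."""
    problems = []
    for field, limit in LIMITS.items():
        urls = payload.get(field) or []
        if len(urls) > limit["max_items"]:
            problems.append(f"{field}: {len(urls)} items, limit is {limit['max_items']}")
        for url in urls:
            if _extension(url) not in limit["extensions"]:
                problems.append(f"{field}: unsupported format in {url}")
    # First/last frames share the image format rules.
    for field in FRAME_FIELDS:
        url = payload.get(field)
        if url and _extension(url) not in LIMITS["image_urls"]["extensions"]:
            problems.append(f"{field}: unsupported format in {url}")
    return problems
```

Because output links are valid for only 24 hours, a client would also want to download the resulting MP4 promptly rather than store the link.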
Related Docs
- Task System Reference — task state machine / polling cadence / async push
- Request and Response Format — common error codes / headers / rate limits
- Authentication — API key application and usage