Remember when AI video creation was limited to text prompts and first/last frame controls? With Seedance 2.0, those days are over. This is a model that truly understands your creative intent — and it does so by accepting four types of input: images, videos, audio, and text.
You can lock down a visual style with a reference image, dictate camera movement and character action with a reference video, set the rhythm and atmosphere with a few seconds of audio — all while guiding the generation with your text prompt. This is what it feels like to actually direct AI video.
In this guide, you'll learn everything you need to master Seedance 2.0 on HeyMarmot.
What You'll Learn
- Understand Seedance 2.0's input specifications and limits
- Use multimodal inputs (image + video + audio + text) together
- Apply reference capabilities for consistency, motion control, and creative effects
- Master advanced techniques like video extension, video editing, and music sync
Input Specifications
Before you start creating, here's what Seedance 2.0 accepts:
Image Input
| Parameter | Details |
|---|---|
| Formats | JPEG, PNG, WebP, BMP, TIFF, GIF |
| Max count | Up to 9 images |
| Max size | 30 MB per image |
Video Input
| Parameter | Details |
|---|---|
| Formats | MP4, MOV |
| Max count | Up to 3 videos |
| Total duration | 2–15 seconds |
| Max size | 50 MB per video |
| Resolution range | 480p (640x640) to 720p (834x1112) |
Text Input
Use text prompts to describe the scene, action, camera movement, style, and any specific details you want in the generated video. Seedance 2.0 has strong prompt-following capability and understands complex, multi-part descriptions.
Audio Input
Upload short audio clips to influence the rhythm, mood, and sound design of the generated video. The model can match visual pacing to audio beats.
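If you're organizing reference assets in a script before uploading, a quick pre-flight check against the limits above can save a failed upload. Here's a minimal sketch — the limits come from the spec tables above, but the function itself is illustrative and not part of any official Seedance or HeyMarmot API (file sizes are passed in rather than read from disk, to keep the example self-contained):

```python
import os

# Limits from the Seedance 2.0 spec tables above
IMAGE_FORMATS = {".jpeg", ".jpg", ".png", ".webp", ".bmp", ".tiff", ".gif"}
VIDEO_FORMATS = {".mp4", ".mov"}
MAX_IMAGES, MAX_IMAGE_MB = 9, 30
MAX_VIDEOS, MAX_VIDEO_MB = 3, 50

def check_references(image_paths, video_paths, image_sizes_mb, video_sizes_mb):
    """Return a list of problems with a proposed reference set (empty = OK)."""
    problems = []
    if len(image_paths) > MAX_IMAGES:
        problems.append(f"too many images: {len(image_paths)} > {MAX_IMAGES}")
    if len(video_paths) > MAX_VIDEOS:
        problems.append(f"too many videos: {len(video_paths)} > {MAX_VIDEOS}")
    for path, size in zip(image_paths, image_sizes_mb):
        ext = os.path.splitext(path)[1].lower()
        if ext not in IMAGE_FORMATS:
            problems.append(f"{path}: unsupported image format {ext}")
        if size > MAX_IMAGE_MB:
            problems.append(f"{path}: {size} MB exceeds {MAX_IMAGE_MB} MB limit")
    for path, size in zip(video_paths, video_sizes_mb):
        ext = os.path.splitext(path)[1].lower()
        if ext not in VIDEO_FORMATS:
            problems.append(f"{path}: unsupported video format {ext}")
        if size > MAX_VIDEO_MB:
            problems.append(f"{path}: {size} MB exceeds {MAX_VIDEO_MB} MB limit")
    return problems
```

Note that this doesn't check video duration or resolution — those depend on reading the file, which is outside the scope of this sketch.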
How Multimodal Input Works
The real power of Seedance 2.0 is in combining these inputs. Here's how each modality contributes:
Reference Images — Lock Down the Look
Upload reference images to control visual style, character appearance, scene composition, and object details. The model closely reproduces the look and feel of your reference.
Use cases:
- Set a consistent character design across multiple shots
- Define the visual style (anime, photorealistic, oil painting, etc.)
- Provide product images for commercial video generation
- Establish scene composition and framing
Example prompt with image references:
Use @image1 as the first frame. First-person perspective,
reference @video1 for camera movement, upper scene references
@image2, left scene references @image3, right scene references
@image4.
Reference Videos — Control Motion and Camera
Upload reference videos to dictate camera language, action choreography, creative transitions, and complex motion patterns. The model replicates the movement dynamics from your reference.
Use cases:
- Replicate specific camera movements (dolly, tracking, crane)
- Match dance choreography or fight sequences
- Copy creative transitions and visual effects
- Maintain consistent pacing across shots
Example prompt with video reference:
Replace the woman in @video1 with a Peking opera performer
on an elaborate stage. Reference @video1's camera movement
and transition effects, match the character's actions with
the lens movement, ultimate stage aesthetics, enhanced
visual impact.
Combining Multiple References
You can freely combine image and video references in a single generation. Use the @image1, @image2, @video1 notation to reference specific uploads in your prompt.
Example — Product showcase:
Create a commercial-grade camera showcase of the bag in
@image2. The bag's side view references @image1, surface
material references @image3. Show all details of the bag.
Background music should be grand and atmospheric.
Example — Multi-reference scene:
Use @image1 as the starting frame. First-person perspective,
reference @video1 for camera movement effects. Upper scene
references @image2, left scene references @image3, right
scene references @image4.
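Because the @imageN / @videoN tags are plain text inside your prompt, a typo like @image5 when you only uploaded four images can quietly confuse the model. If you build prompts in a script, a small helper can cross-check the tags against your upload counts. This is a hypothetical convenience, not a HeyMarmot feature:

```python
import re

def find_bad_references(prompt, num_images, num_videos):
    """Return any @imageN / @videoN tags that point past the uploaded assets."""
    bad = []
    for kind, count in (("image", num_images), ("video", num_videos)):
        for match in re.finditer(rf"@{kind}(\d+)", prompt):
            if not (1 <= int(match.group(1)) <= count):
                bad.append(match.group(0))
    return bad
```

For example, with two images and one video uploaded, `find_bad_references("@image3 and @video2", 2, 1)` flags both tags as out of range.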
Core Capabilities
1. Enhanced Base Quality
Seedance 2.0 delivers a significant leap in fundamental video quality:
- Stability — Reduced flickering, jitter, and temporal inconsistency
- Smoothness — More natural motion with better frame-to-frame coherence
- Realism — Improved textures, lighting, and physics simulation
2. Character and Scene Consistency
One of the hardest problems in AI video — maintaining consistent characters and scenes across shots — is dramatically improved. When you provide reference images, Seedance 2.0 preserves:
- Facial features and expressions
- Clothing and accessories
- Scene elements and background details
- Color palette and visual style
3. Camera Movement Replication
Provide a reference video and Seedance 2.0 can precisely replicate complex camera work:
- Tracking shots — Follow subjects with smooth horizontal movement
- Push-pull shots — Dolly in and out with natural acceleration
- Rotation shots — Focused spins and orbital camera movements
- Aerial movements — Crane-like ascending and descending shots
- Horror-style movements — Unsettling, tension-building camera work
- Car chase cinematography — Dynamic vehicle-following shots
4. Creative Template and Effects Replication
Upload a video with creative effects, and the model can reproduce the visual treatment while applying it to entirely new content:
- Split-screen transitions
- Speed ramps and slow motion
- Visual effects and particle systems
- Branded intro/outro sequences
- After Effects-style motion graphics
5. Creative Storytelling and Scene Completion
Seedance 2.0 doesn't just follow instructions — given partial context, it fills in plausible creative details on its own:
- Complete scenes with logical narrative progression
- Add expressive character reactions and emotions
- Generate contextually appropriate background action
- Fill in creative details that weren't explicitly prompted
Example prompt:
A painting character looks around nervously, eyes darting
left and right, then reaches out of the picture frame,
quickly grabs a cola, and takes a sip.
6. Video Extension
Extend existing videos with seamless continuity. The model maintains visual consistency while generating new content that naturally follows the original clip. This enables:
- Extending a 5-second clip to 15 seconds
- Adding new scenes that flow naturally from the original
- Building longer narratives from short clips
7. Improved Audio Generation
Seedance 2.0 generates more accurate and realistic audio:
- Voice accuracy — Better tonal matching for character dialogue
- Sound realism — More convincing environmental sounds
- Music sync — Generated visuals can align to audio beats
8. Long-Take Coherence
The model excels at generating smooth, uninterrupted single-take sequences — maintaining spatial and temporal consistency throughout extended shots without cuts.
9. Video Editing
Edit existing videos with natural-looking modifications:
- Character replacement — Swap characters while maintaining motion
- Object addition/removal — Add or remove elements seamlessly
- Style transfer — Change the visual treatment of existing footage
10. Music-Synchronized Generation
Provide an audio track and Seedance 2.0 will generate visuals that match the rhythm and beats:
- Cut transitions aligned to musical beats
- Camera movements synced to tempo changes
- Visual intensity matching audio dynamics
11. Emotion and Expression
Characters generated by Seedance 2.0 display more nuanced and believable emotional performances — from subtle facial micro-expressions to dramatic physical reactions.
Prompting Tips for Seedance 2.0
The Reference Formula
When using multimodal inputs, structure your prompt like this:
[Reference assignments] + [Scene description] + [Camera/Motion] + [Style/Mood]
Example:
@image1 as the first frame, reference @video1 for camera
movement. A woman elegantly hangs laundry, finishes one
piece, reaches into the basket for another, and shakes
it out with force. Fixed camera angle.
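Because the formula is just ordered text segments, you can assemble prompts programmatically when generating many variants. A minimal sketch — the function name and structure are made up for illustration, not part of any official tooling:

```python
def build_prompt(references, scene, camera, style=""):
    """Assemble a prompt following the reference formula:
    [Reference assignments] + [Scene description] + [Camera/Motion] + [Style/Mood]."""
    parts = [references, scene, camera, style]
    return " ".join(p.strip() for p in parts if p.strip())

prompt = build_prompt(
    "@image1 as the first frame, reference @video1 for camera movement.",
    "A woman elegantly hangs laundry, reaching into the basket for another piece.",
    "Fixed camera angle.",
)
```

Swapping out just the scene segment then lets you iterate on the action while keeping references and camera direction fixed.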
Tips for Better Results
- Be specific about camera movement — Instead of "camera moves," say "camera slowly tracks right following the subject"
- Reference specific elements — Use @image1, @video1 notation to point to exact references
- Describe timing — Include pacing details like "0-2 seconds: ..., 3-6 seconds: ..."
- Layer your references — Use images for visual style and videos for motion/camera
- Include audio direction — Describe the mood and style of background sound
Timestamp Prompting
For precise control over multi-segment videos, use timestamp notation:
0-2 seconds: Quick four-frame flash cuts, four different
bow styles in sequence, close-up on satin texture and
brand text.
3-6 seconds: Close-up of magnetic clasp clicking shut,
then gently pulling apart, showcasing smooth texture and
convenience.
7-12 seconds: Quick cuts between wearing scenarios —
burgundy on coat collar, pink in ponytail, purple on bag
strap, leopard print on suit collar.
13-15 seconds: All four bows displayed side by side with
brand name.
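If you plan your segments in a script or spreadsheet first, producing this notation is mechanical. A small sketch, assuming segments are (start, end, description) tuples — again illustrative, not an official format requirement beyond what the example above shows:

```python
def format_timeline(segments):
    """segments: list of (start_sec, end_sec, description) tuples.
    Returns timestamp-notation lines like '0-2 seconds: ...'."""
    return "\n".join(f"{start}-{end} seconds: {desc}" for start, end, desc in segments)

timeline = format_timeline([
    (0, 2, "Quick flash cuts across four bow styles, close-up on satin texture."),
    (3, 6, "Magnetic clasp clicks shut, then gently pulls apart."),
])
```

Keeping segments as data also makes it easy to reorder or retime them without hand-editing the prompt text.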
Getting Started on HeyMarmot
Seedance 2.0 is available now on HeyMarmot. Here's how to get started:
- Go to HeyMarmot and select Seedance 2.0 from the model dropdown
- Upload references — Add images and/or videos as reference material
- Write your prompt — Use the @image1, @video1 notation to reference your uploads
- Generate — Hit create and let the model work its magic
Starter prompts to try:
Fixed camera, a girl elegantly hanging laundry, finishes
one piece and reaches into the basket for another, shakes
it out with force.
Camera slowly pulls back (revealing the full street scene)
and follows the woman as she walks. Wind blows her skirt
hem. She walks along a 19th century European street.
Camera follows the man in black as he runs frantically.
A group of people chase behind him. Side-tracking shot.
The character panics and knocks over roadside obstacles.
Seedance 2.0 vs Seedance 2.0 Fast
Two variants are available:
| | Seedance 2.0 | Seedance 2.0 Fast |
|---|---|---|
| Quality | Maximum quality | Slightly reduced |
| Speed | Standard | Faster generation |
| Best for | Final output, high-quality work | Iteration, testing, quick drafts |
When the servers are busy, Seedance 2.0 Fast is a great alternative for faster turnaround while still delivering impressive results.
Conclusion
Seedance 2.0 marks a paradigm shift in AI video generation — from simple text-to-video to true multimodal creative direction. By combining image, video, audio, and text inputs, you have unprecedented control over every aspect of your generated video.
The model's ability to replicate camera movements, maintain character consistency, extend videos seamlessly, and sync to music makes it a serious tool for content creators, marketers, and filmmakers alike.
Start experimenting on HeyMarmot — the best way to learn is by doing. Upload some references, write a detailed prompt, and see what Seedance 2.0 can create for you.
