On April 8, 2026, an anonymous model called HappyHorse 1.0 appeared at the top of the Artificial Analysis Text-to-Video and Image-to-Video leaderboards — dethroning ByteDance's Seedance 2.0 seemingly overnight. Within 48 hours, the AI community confirmed the team behind it: the Future Life Lab at Taotian Group (Alibaba's e-commerce arm), led by Zhang Di.
Here's everything you need to know about HappyHorse 1.0 — the architecture, the benchmarks, and what it means for creators.
What Is HappyHorse 1.0?
HappyHorse 1.0 is an open-source AI video generation model with approximately 15 billion parameters. It supports three core modes:
- Text-to-Video — describe a scene and get a finished video clip
- Image-to-Video — animate a still image into motion
- Audio-Video Generation — produce video with synchronized dialogue, ambient sound, and lip-sync in a single forward pass
The model outputs native 1080p HD video at 5–10 seconds per clip, with support for multiple aspect ratios (16:9, 9:16, 4:3, 21:9, 1:1).
Architecture: A Unified Transformer
HappyHorse 1.0 is built on a single-stream 40-layer Transformer with no cross-attention — an approach consistent with the Transfusion architecture. The design uses a "sandwich" layout:
- First 4 layers: modality-specific input projections
- Middle 32 layers: shared parameters across all modalities (video, audio, text)
- Last 4 layers: modality-specific output projections
This unified design is what enables HappyHorse to generate video and audio together in a single pass — dialogue aligns naturally to mouth shapes at the phoneme level, footsteps land on the right frames, and ambient noise responds to camera cuts, all without post-production.
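The sandwich layout described above can be sketched in a few lines of Python. The layer counts (4 + 32 + 4) come from the description; everything else — function and layer names, the list-of-labels representation — is illustrative and not taken from any released code.

```python
# Illustrative sketch of the 40-layer "sandwich" layout: per-modality input
# projections, a shared middle stack, and per-modality output projections.
# Names and structure are assumptions, not the actual implementation.

MODALITIES = ("video", "audio", "text")

def build_sandwich_layout(n_input=4, n_shared=32, n_output=4):
    """Return, per modality, the ordered list of layer labels it passes through."""
    layout = {}
    for m in MODALITIES:
        layout[m] = (
            [f"{m}_in_{i}" for i in range(n_input)]      # modality-specific input projections
            + [f"shared_{i}" for i in range(n_shared)]   # parameters shared by all modalities
            + [f"{m}_out_{i}" for i in range(n_output)]  # modality-specific output projections
        )
    return layout

layout = build_sandwich_layout()
assert all(len(stack) == 40 for stack in layout.values())
# the middle 32 layers are the same shared stack for every modality
assert layout["video"][4:36] == layout["audio"][4:36] == layout["text"][4:36]
```

The key property the sketch captures: only the thin outer layers differ per modality, so video, audio, and text tokens all flow through the same 32-layer core — which is what lets attention relate them directly.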
Performance Benchmarks
HappyHorse 1.0 dominates the Artificial Analysis leaderboard:
| Category | HappyHorse 1.0 Elo | Seedance 2.0 Elo | Gap |
|---|---|---|---|
| Text-to-Video (no audio) | 1333–1357 | 1273 | +60 |
| Image-to-Video | 1391–1406 | — | New record |
Speed
Generation takes only 8 denoising steps, thanks to DMD-2 distillation training — no classifier-free guidance needed. Combined with MagiCompiler (a full-graph compilation system), reported inference speeds are:
- ~2 seconds for a 5-second clip at 256p
- ~38 seconds for 1080p on an H100
That makes HappyHorse one of the fastest AI video models available today, averaging roughly 10 seconds per generation at standard settings.
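A few-step sampler without classifier-free guidance can be sketched as below. The point is the cost model: one forward pass per step, eight steps total, with no second guidance pass per step. The linear schedule and the toy denoiser are placeholders — the actual DMD-2-distilled sampler and its noise schedule have not been published in this form.

```python
import numpy as np

def sample(denoiser, shape, steps=8, seed=0):
    """Few-step sampling sketch: one denoiser call per step, no CFG pass."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)                      # start from pure noise
    for t in np.linspace(1.0, 0.0, steps, endpoint=False):
        x = denoiser(x, t)                              # single forward pass per step
    return x

# toy stand-in denoiser that just shrinks the latent toward zero
latents = sample(lambda x, t: x * t, shape=(4, 4), steps=8)
```

With classifier-free guidance, each step would need two forward passes (conditional and unconditional); the distilled model's single-pass steps are a large part of the reported speedup.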
Built-In Audio-Video Sync
This is arguably HappyHorse's biggest differentiator. Most AI video models generate silent video — you need a separate model or manual work for audio. HappyHorse 1.0 denoises video tokens and audio tokens together in the same sequence within a single Transformer.
The result:
- Lip-sync aligned at the phoneme level
- Ambient sound that responds to scene changes
- Multi-language voiceover support: English, Mandarin, Cantonese, Japanese, Korean, German, and French
- Industry-leading low word error rate (WER) on generated speech
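The joint audio-video denoising idea can be illustrated with the sketch below: both token streams are placed in one sequence, processed by a single model call so attention can cross modalities, then split back. The tagging scheme and the stand-in denoiser are assumptions for illustration, not the model's actual token format.

```python
# Illustrative sketch of joint AV denoising in one sequence / one pass.
# Token layout and names are hypothetical.

def joint_denoise_step(video_tokens, audio_tokens, denoiser):
    # concatenate both modalities into one tagged sequence
    seq = [("video", t) for t in video_tokens] + [("audio", t) for t in audio_tokens]
    seq = denoiser(seq)                      # single forward pass over both modalities
    video = [t for m, t in seq if m == "video"]
    audio = [t for m, t in seq if m == "audio"]
    return video, audio

# toy denoiser: halves every token value, regardless of modality
v, a = joint_denoise_step([2.0, 4.0], [8.0],
                          lambda s: [(m, t / 2) for m, t in s])
```

Because every denoising step sees both streams at once, audio tokens can condition on the video tokens being generated in the same step — which is why lip movements and sound stay aligned without a separate sync stage.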
HappyHorse 1.0 vs Seedance 2.0
| Feature | HappyHorse 1.0 | Seedance 2.0 |
|---|---|---|
| Parameters | ~15B | ~13B (estimated) |
| Max Resolution | 1080p native | 1080p |
| Audio Generation | Built-in (single pass) | Separate audio model |
| Lip Sync | 7 languages, phoneme-level | Available but separate |
| Open Source | Yes (full weights + code) | No |
| Speed (1080p, H100) | ~38s | ~45s |
| Leaderboard Rank | #1 | #2 |
Both are excellent models. Seedance 2.0 excels at multimodal input (combining reference images, videos, and audio as creative inputs), while HappyHorse 1.0 leads in raw output quality and integrated audio generation.
Open Source: Weights, Code, and Commercial License
HappyHorse 1.0 is fully open source with a commercial-friendly license. The release includes:
- Base model weights (~15B parameters)
- Distilled model (faster inference)
- Super-resolution module (upscale to 1080p)
- Full inference code
This means you can self-host, fine-tune, or integrate HappyHorse into your own pipeline — a major advantage for teams that need customization or want to avoid API costs at scale.
How to Use HappyHorse 1.0 on HeyMarmot
HappyHorse 1.0 is coming soon to HeyMarmot AI. Once available, you'll be able to:
1. Open a conversation in the HeyMarmot workspace
2. Select HappyHorse 1.0 from the model dropdown
3. Type your prompt or upload a reference image
4. Get HD video with audio in seconds — no separate audio step needed
No credit card required to start. HeyMarmot gives you 500 free credits on signup, so you can try HappyHorse 1.0 the moment it launches.
Who Should Use HappyHorse 1.0?
- Short-form content creators — fast generation + built-in audio means fewer tools in your workflow
- Multilingual teams — native lip-sync in 7 languages is hard to beat
- Developers and researchers — full open-source access for fine-tuning and self-hosting
- Anyone who wants the best quality — it's currently #1 on the global leaderboard
The Bottom Line
HappyHorse 1.0 is a genuine leap forward in AI video generation. The unified audio-video architecture eliminates the most painful step in AI video workflows (adding sound), and the open-source release means the community can push it further. It's fast, it's high quality, and it's free to use.
Stay tuned on HeyMarmot — we'll announce HappyHorse 1.0 availability as soon as it's ready. In the meantime, you can start creating with Seedance 2.0, Veo 3.1, and other top models today.
Start creating on HeyMarmot — 500 free credits, no credit card
