Mochi 1 — implementation by Spacelike AI
AsymmDiT video diffusion · 10 B parameters · Genmo AI
Mochi 1 is Genmo AI's open-source video generation model, released October 2024 under the Apache 2.0 license. The 10 B parameter AsymmDiT (Asymmetric Diffusion Transformer) backbone — one of the largest video diffusion transformers released openly at the time — allocates roughly 4× more parameters to visual reasoning than to text, reflecting the signal imbalance between the two modalities in video generation.
The companion Mochi-VAE compresses video 8×8 spatially and 6× temporally. The model generates 5.4-second 480p clips at 30 fps from a single T5-XXL-encoded prompt.
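To make the compression factors concrete, here is a rough sketch of the latent grid they imply for one native-resolution clip. The helper name `latent_grid` is hypothetical, and the rounding rule is an assumption: the page does not say how the first frame or any remainder is handled, so the released model's exact latent frame count may differ.

```python
import math

def latent_grid(width, height, seconds, fps, spatial=8, temporal=6):
    """Return (latent_frames, latent_height, latent_width) for one clip.

    Assumption: partial blocks are rounded up (padded), not dropped.
    """
    frames = round(seconds * fps)  # 5.4 s at 30 fps is 162 frames
    return (
        math.ceil(frames / temporal),
        math.ceil(height / spatial),
        math.ceil(width / spatial),
    )

t, h, w = latent_grid(width=848, height=480, seconds=5.4, fps=30)
print(t, h, w)  # 27 60 106

# Each latent position stands in for 8 * 8 * 6 = 384 pixel positions.
print(8 * 8 * 6)  # 384
```

Under these assumptions, the transformer attends over a grid of 27 × 60 × 106 latent positions rather than 162 full-resolution frames.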
Specification
- Architecture: AsymmDiT · 4:1 vision-to-text parameter ratio
- Parameters: 10 B (backbone)
- Training objective: Flow matching
- Native resolution: 848 × 480 · 30 fps · 5.4 s
- Text encoder: T5-XXL (frozen)
- Sampler shown: Flow-match Euler · 64 steps · cfg 4.5
- License: Apache 2.0
- Release: October 22, 2024
- Checkpoint: genmo/mochi-1-preview
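The sampler setting listed above (flow-match Euler, 64 steps, cfg 4.5) can be illustrated with a toy one-dimensional sketch. This is not the Mochi 1 sampler. The closed-form `toy_velocity` stands in for the learned network, and the time convention (t = 1 is pure noise, t = 0 is data) and the uniform step schedule are assumptions. All function names are hypothetical.

```python
def guided_velocity(v_cond, v_uncond, scale):
    """Classifier-free guidance: extrapolate from the unconditional
    prediction toward the conditional one."""
    return v_uncond + scale * (v_cond - v_uncond)

def toy_velocity(x, t, target):
    """Velocity of the straight path x_t = (1 - t) * target + t * noise.

    dx/dt = noise - target, which equals (x_t - target) / t.
    A real model would predict this with a neural network.
    """
    return (x - target) / t

def euler_sample(x, steps=64, cfg=4.5, cond_target=1.0, uncond_target=0.0):
    """Integrate from t = 1 (noise) down to t = 0 with fixed Euler steps."""
    dt = 1.0 / steps
    for i in range(steps):
        t = 1.0 - i * dt  # current time, never reaches 0 inside the loop
        v = guided_velocity(
            toy_velocity(x, t, cond_target),
            toy_velocity(x, t, uncond_target),
            cfg,
        )
        x = x - dt * v  # step toward the data end of the path
    return x

sample = euler_sample(x=0.3)
print(sample)
```

In this toy setting the result lands close to `uncond_target + cfg * (cond_target - uncond_target)`, which is 4.5 here. It shows how a guidance scale above 1 pushes samples past the conditional prediction rather than merely toward it.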
Client: Tenstorrent Inc. — quality improvement.
implementation · Hugging Face · Vendor announcement
Live sample on the Spacelike AI home page.
Sample images on this page are licensed under CC BY 4.0 — reuse with attribution to Spacelike AI and a link back to spacelike.ai.