Mochi 1 — implementation by Spacelike AI

AsymmDiT video diffusion · 10 B parameters · Genmo AI

Mochi 1 final-sample output generated by Spacelike AI.

Mochi 1 is Genmo AI's open-source video generation model, released in October 2024 under the Apache 2.0 license. Its 10 B parameter AsymmDiT (Asymmetric Diffusion Transformer) backbone, one of the largest video diffusion transformers released openly at the time, allocates roughly 4× more parameters to the visual stream than to the text stream. The split reflects how much more information the video tokens must carry than the prompt tokens.
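One way to see where a ratio like 4:1 can come from: a transformer block's weight count grows roughly with the square of its hidden width, so a stream twice as wide carries about four times the parameters. The sketch below assumes visual and text widths of 3072 and 1536 (these values are not stated on this page; treat them as assumptions) and a simplified block of 12·d² weights. Real AsymmDiT blocks differ in detail, so read the result as a sanity check, not an exact count.

```python
"""Rough check of a ~4x visual-to-text parameter ratio from hidden widths.

Assumptions (not from this page): per-stream widths of 3072 (visual) and
1536 (text), and a plain transformer block whose weights scale as 12 * d^2
(4 d^2 for the Q, K, V and output projections + 8 d^2 for a 4x MLP).
"""


def block_params(d_model: int, mlp_ratio: int = 4) -> int:
    """Approximate weight count of one transformer block (biases and norms ignored)."""
    attention = 4 * d_model * d_model            # Q, K, V and output projections
    mlp = 2 * d_model * (mlp_ratio * d_model)    # up- and down-projection
    return attention + mlp


VISUAL_DIM = 3072  # assumed visual-stream width (hypothetical for this sketch)
TEXT_DIM = 1536    # assumed text-stream width (hypothetical for this sketch)

ratio = block_params(VISUAL_DIM) / block_params(TEXT_DIM)
print(f"visual/text parameter ratio per block: {ratio:.1f}")
```

Because both counts are proportional to d², the ratio is simply (3072 / 1536)² = 4 under these assumptions.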

The companion Mochi-VAE compresses video 8×8 spatially and 6× temporally. The model generates 5.4-second 480p clips at 30 fps, conditioned on a prompt encoded by a single T5-XXL encoder.
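The compression factors translate directly into the size of the latent the diffusion transformer works on. The sketch below applies the 8×8 spatial and 6× temporal factors from the text to the native 848 × 480 resolution. Three details are assumptions, not stated on this page: a 12-channel latent, a causal VAE that encodes the first frame on its own (so frame counts take the form 6k + 1), and a 2×2 patchify step before the transformer.

```python
"""Latent-shape arithmetic for a VAE with 8x8 spatial and 6x temporal compression."""

SPATIAL = 8           # from the text: 8x8 spatial compression
TEMPORAL = 6          # from the text: 6x temporal compression
LATENT_CHANNELS = 12  # assumption: 12-channel latent
PATCH = 2             # assumption: 2x2 patchify before the transformer


def latent_shape(frames: int, height: int, width: int) -> tuple[int, int, int, int]:
    """Return (channels, latent_frames, latent_height, latent_width).

    Assumes a causal VAE that encodes the first frame separately, so the
    frame count must be of the form 6k + 1.
    """
    if (frames - 1) % TEMPORAL:
        raise ValueError(f"frames must be 6k + 1, got {frames}")
    if height % SPATIAL or width % SPATIAL:
        raise ValueError("height and width must be multiples of 8")
    return (
        LATENT_CHANNELS,
        (frames - 1) // TEMPORAL + 1,
        height // SPATIAL,
        width // SPATIAL,
    )


def visual_tokens(frames: int, height: int, width: int) -> int:
    """Number of transformer tokens after 2x2 patchification of the latent."""
    _, t, h, w = latent_shape(frames, height, width)
    return t * (h // PATCH) * (w // PATCH)


n_frames = 163  # assumption: nearest 6k + 1 count to 5.4 s x 30 fps = 162
print(latent_shape(n_frames, 480, 848))   # (12, 28, 60, 106)
print(visual_tokens(n_frames, 480, 848))  # 44520
print(round(n_frames / 30, 2))            # 5.43 (seconds)
```

Under these assumptions a full 163-frame clip becomes a 28 × 60 × 106 latent, and the transformer attends over about 44.5 k visual tokens. This is why the temporal compression matters as much as the spatial one.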

Specification

Architecture: AsymmDiT · 4:1 vision-to-text parameter ratio
Parameters: 10 B (backbone)
Training objective: Flow matching
Native resolution: 848 × 480 · 30 fps · 5.4 s
Text encoder: T5-XXL (frozen)
Sampler shown: Flow-match Euler · 64 steps · cfg 4.5
License: Apache 2.0
Release: October 22, 2024
Checkpoint: genmo/mochi-1-preview
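The training objective and the sampler listed above fit together as follows. Flow matching trains the network to predict a velocity along a straight path from data to noise, and the Euler sampler integrates that velocity back from noise to data, blending conditional and unconditional predictions with classifier-free guidance (cfg). The sketch below is a toy under stated assumptions, not Mochi's code: scalar "latents", an analytic velocity for data concentrated at a single point in place of the AsymmDiT network, and a uniform sigma schedule (the production schedule may differ). The defaults of 64 steps and cfg 4.5 mirror the sampler settings shown.

```python
"""Toy flow-matching Euler sampler with classifier-free guidance (CFG).

Simplifications (assumptions, not Mochi's code): scalar latents, an analytic
velocity field for a point-mass data distribution instead of a trained
network, and a uniform sigma schedule. Convention used here:
x_sigma = (1 - sigma) * x_data + sigma * noise, so sigma = 1 is pure noise.
"""
import random


def flow_matching_target(x_data: float, noise: float, sigma: float) -> tuple[float, float]:
    """Training pair: the noisy input and the velocity the model regresses onto."""
    x_sigma = (1.0 - sigma) * x_data + sigma * noise
    velocity = noise - x_data  # d x_sigma / d sigma along the straight path
    return x_sigma, velocity


def toy_velocity(x: float, sigma: float, target: float) -> float:
    """Exact velocity when all data sits at `target` (stands in for the network)."""
    return (x - target) / sigma


def sample(cond_target: float, steps: int = 64, cfg: float = 4.5, seed: int = 0) -> float:
    """Integrate from sigma = 1 (noise) to sigma = 0 (data) with Euler steps."""
    rng = random.Random(seed)
    x = rng.gauss(0.0, 1.0)                                # pure noise at sigma = 1
    sigmas = [1.0 - i / steps for i in range(steps + 1)]   # uniform 1.0 -> 0.0
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        v_cond = toy_velocity(x, sigma, cond_target)       # "with prompt"
        v_uncond = toy_velocity(x, sigma, 0.0)             # "empty prompt" data at 0
        v = v_uncond + cfg * (v_cond - v_uncond)           # classifier-free guidance
        x = x + (sigma_next - sigma) * v                   # Euler step toward sigma = 0
    return x


print(sample(1.0, cfg=1.0))  # recovers the conditional target, ~1.0
print(sample(1.0))           # guidance extrapolates past it, ~4.5
```

In this point-mass toy, cfg = 1 lands exactly on the conditional target, while cfg 4.5 pushes the sample 4.5× as far from the unconditional one. That is the extrapolation guidance performs. With a trained network, the same update strengthens prompt adherence rather than simply scaling the output.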

Client: Tenstorrent Inc. — quality improvement.

implementation · Hugging Face · Vendor announcement

Live sample on the Spacelike AI home page.

Sample images on this page are licensed under CC BY 4.0 — reuse with attribution to Spacelike AI and a link back to spacelike.ai.
