Qwen-Image — implementation by Spacelike AI
Multimodal Diffusion Transformer · 20 B parameters · Alibaba Qwen Team
Qwen-Image is a 20 B-parameter Multimodal Diffusion Transformer (MMDiT) released by the Alibaba Qwen team in August 2025. Its headline capability is accurate in-image text rendering, covering complex CJK scripts (Chinese, Japanese, Korean) as well as English, an area where earlier diffusion models were largely unreliable.
The model conditions on Qwen2.5-VL for visual-semantic control and on a parallel VAE encoder for appearance control, so a single model handles both text-to-image generation and image editing. The weights are open under the Apache 2.0 license.
Specification
- Architecture: MMDiT · dual-path (Qwen2.5-VL + VAE) conditioning
- Parameters: 20 B (backbone)
- Training objective: Flow matching
- Native resolution: 1024 × 1024 · up to 1664 × 928 landscape
- Text encoder: Qwen2.5-VL (7B)
- Sampler shown: Flow-match Euler · 50 steps · cfg 4.0
- License: Apache 2.0
- Release: August 2025
- Checkpoint: Qwen/Qwen-Image
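To make the sampler line concrete, here is a minimal sketch of flow-match Euler sampling with classifier-free guidance, using the same step count and guidance scale (50 steps, cfg 4.0). It is a toy illustration, not the Qwen-Image or Spacelike AI code: the velocity field is a hypothetical stand-in for the 20 B backbone, the function names are invented for this example, and the guidance formula is the generic one, which may differ in detail from what the actual pipeline applies.

```python
import random


def velocity(x, t, target):
    """Toy stand-in for the model: the exact flow-matching velocity
    when every data sample equals `target` (x_t = (1 - t) * target + t * noise)."""
    return [(xi - gi) / t for xi, gi in zip(x, target)]


def guided_velocity(x, t, cond_target, uncond_target, cfg_scale):
    """Generic classifier-free guidance: v_uncond + scale * (v_cond - v_uncond)."""
    v_cond = velocity(x, t, cond_target)
    v_uncond = velocity(x, t, uncond_target)
    return [vu + cfg_scale * (vc - vu) for vc, vu in zip(v_cond, v_uncond)]


def sample(cond_target, uncond_target, steps=50, cfg_scale=4.0, seed=0):
    """Integrate dx/dt = v(x, t) from t = 1 (pure noise) to t = 0 (data)
    with fixed-size Euler steps."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in cond_target]
    # Uniform time grid 1.0, ..., 0.0; the model is never evaluated at t = 0.
    ts = [1.0 - i / steps for i in range(steps + 1)]
    for t, t_next in zip(ts[:-1], ts[1:]):
        v = guided_velocity(x, t, cond_target, uncond_target, cfg_scale)
        x = [xi + (t_next - t) * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    print(sample([1.0, -2.0], [0.0, 0.0], cfg_scale=1.0))
    print(sample([1.0, -2.0], [0.0, 0.0], cfg_scale=4.0))
```

In this toy setting a scale of 1.0 lands on the conditional target (up to floating-point error), while 4.0 extrapolates past it, away from the unconditional target. That is the usual trade-off guidance makes: stronger prompt adherence at the cost of pushing samples further from the unguided prediction.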
Client: Tenstorrent Inc. — implementation · performance optimization.
Links: Implementation · Hugging Face · Vendor announcement
Live sample on the Spacelike AI home page.
Sample images on this page are licensed under CC BY 4.0 — reuse with attribution to Spacelike AI and a link back to spacelike.ai.