Qwen-Image — implementation by Spacelike AI

Multimodal Diffusion Transformer · 20B parameters · Alibaba Qwen Team

Final Qwen-Image sample, generated by Spacelike AI.

Qwen-Image is a 20B-parameter Multimodal Diffusion Transformer (MMDiT) released by the Alibaba Qwen team in August 2025. Its headline capability is accurate text rendering inside images. This covers complex non-Latin scripts such as Chinese, Japanese and Korean as well as English, an area where earlier diffusion models largely failed.

The model is conditioned through two parallel paths. Qwen2.5-VL supplies visual-semantic control, and a VAE encoder supplies appearance control. Together they let one model handle both text-to-image generation and image editing. The weights are openly released under the Apache 2.0 license.
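To make the dual-path idea more concrete, here is a rough sketch of how the conditioning inputs might translate into sequence length for the transformer's joint attention. It illustrates the idea under stated assumptions and is not Qwen-Image's code. The 8× VAE downsampling, the 2 × 2 patch size, the 256-token prompt, and the choice to append the VAE-encoded reference image as a second block of image tokens are all assumptions made for the example. `image_tokens` and `joint_sequence_length` are hypothetical helpers.

```python
# Hypothetical sketch: estimate the joint-attention sequence length of an
# MMDiT-style model. Assumptions (not taken from the Qwen-Image release):
#   - the VAE downsamples each spatial side by 8,
#   - the transformer packs latents into 2 x 2 patches,
#   - text tokens from the Qwen2.5-VL encoder are joined with the image tokens.

VAE_DOWNSAMPLE = 8  # assumed spatial compression of the VAE
PATCH = 2           # assumed patch size of the transformer


def image_tokens(width: int, height: int) -> int:
    """Number of latent patch tokens for one image of the given pixel size."""
    stride = VAE_DOWNSAMPLE * PATCH
    if width % stride or height % stride:
        raise ValueError(f"width and height must be multiples of {stride}")
    return (width // stride) * (height // stride)


def joint_sequence_length(width: int, height: int, text_tokens: int,
                          reference_image: bool = False) -> int:
    """Tokens seen by joint attention: text + noisy latent (+ VAE reference)."""
    n = text_tokens + image_tokens(width, height)
    if reference_image:
        # Appearance path (editing): the VAE-encoded input image is assumed
        # to be appended as a second block of image tokens at the same size.
        n += image_tokens(width, height)
    return n


if __name__ == "__main__":
    # 256 text tokens is a hypothetical prompt length.
    print(joint_sequence_length(1024, 1024, text_tokens=256))
    print(joint_sequence_length(1664, 928, text_tokens=256))
    print(joint_sequence_length(1024, 1024, 256, reference_image=True))
```

Under these assumptions a 1024 × 1024 output yields 4,096 image tokens (4,352 with the prompt), and the editing path roughly doubles the image-token count. Because attention cost grows quadratically with sequence length, that comes to close to a 4× increase in attention work.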

Specification

Architecture: MMDiT · dual-path (Qwen2.5-VL + VAE) conditioning
Parameters: 20B (backbone)
Training objective: Flow matching
Native resolution: 1024 × 1024 · up to 1664 × 928 landscape
Text encoder: Qwen2.5-VL (7B)
Sampler shown: Flow-match Euler · 50 steps · CFG 4.0
License: Apache 2.0
Release: August 2025
Checkpoint: Qwen/Qwen-Image
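The sampler settings above (flow-match Euler, 50 steps, CFG 4.0) can be illustrated with a minimal sketch. It assumes the common rectified-flow convention x_t = (1 - t)·x0 + t·noise, with the model predicting the velocity noise - x0, and it integrates from t = 1 (noise) to t = 0 (data) on a uniform time grid. The uniform grid and the textbook classifier-free guidance combination are simplifications. Production flow-matching schedulers often warp the time grid, and the reference pipeline's guidance may differ in detail. `ideal_velocity` is a hypothetical toy model that stands in for the 20B transformer.

```python
# Minimal sketch of a flow-matching Euler sampler with classifier-free guidance.
# Assumed convention: x_t = (1 - t) * x0 + t * noise, t goes from 1 (noise)
# to 0 (data), and the model predicts the velocity v = dx/dt = noise - x0.
from typing import Callable, List

Vector = List[float]
Velocity = Callable[[Vector, float], Vector]


def cfg_velocity(v_cond: Vector, v_uncond: Vector, scale: float) -> Vector:
    """Textbook classifier-free guidance: v_u + s * (v_c - v_u)."""
    return [u + scale * (c - u) for c, u in zip(v_cond, v_uncond)]


def euler_sample(noise: Vector, cond_model: Velocity, uncond_model: Velocity,
                 steps: int = 50, cfg_scale: float = 4.0) -> Vector:
    """Integrate dx/dt = v from t = 1 down to t = 0 with explicit Euler steps."""
    x = list(noise)
    for i in range(steps):
        t = 1.0 - i / steps            # current time on a uniform grid
        t_next = 1.0 - (i + 1) / steps
        v = cfg_velocity(cond_model(x, t), uncond_model(x, t), cfg_scale)
        dt = t_next - t                # negative: stepping toward the data end
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def ideal_velocity(target: Vector) -> Velocity:
    """Hypothetical toy model: exact velocity for a dataset of one point."""
    # On the straight path toward `target`, noise - x0 equals (x_t - x0) / t.
    return lambda x, t: [(xi - ti) / t for xi, ti in zip(x, target)]


if __name__ == "__main__":
    noise = [0.3, -1.2]
    cond = ideal_velocity([1.0, 2.0])    # stands in for the prompted prediction
    uncond = ideal_velocity([0.0, 0.0])  # stands in for the empty-prompt one
    print(euler_sample(noise, cond, uncond, steps=50, cfg_scale=4.0))
```

With these toy models the sample lands at u + s·(c - u), approximately [4.0, 8.0]. This shows how a guidance scale above 1 extrapolates beyond the conditional prediction, and with a scale of 1.0 the sampler reduces to the purely conditional result.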

Client: Tenstorrent Inc. · scope: implementation and performance optimization.

Links: implementation · Hugging Face · vendor announcement

A live sample is available on the Spacelike AI home page.

Sample images on this page are licensed under CC BY 4.0 — reuse with attribution to Spacelike AI and a link back to spacelike.ai.
