HiDream-I1-Full - AI Image Models Tool
Overview
HiDream-I1-Full is an open-source text-to-image foundation model with roughly 17 billion parameters, designed for high-quality, fast image synthesis across styles (photorealistic, concept art, cartoon, painting). The model implements a sparse Diffusion Transformer (DiT) design and is released under an MIT license. Hugging Face and the project repository list the full model, distilled variants (Dev, Fast), a Gradio demo, and ready-made inference scripts for on-prem or cloud deployment. ([huggingface.co](https://huggingface.co/HiDream-ai/HiDream-I1-Full))
According to the project's technical report and model card, HiDream-I1 emphasizes prompt-following and compositional accuracy: it reports top-tier scores on the HPSv2.1 (human preference), GenEval, and DPG-Bench benchmarks while offering fast inference (the three released variants trade off speed and fidelity). The authors also provide HiDream-E1, an instruction-based image-editing model built on the same backbone. Community responses praise its prompt adherence and artist-style recognition, while some users note that public demos or heavily quantized builds can show degraded image fidelity compared with locally run full/fp16 builds. ([arxiv.org](https://arxiv.org/abs/2505.22705?utm_source=openai))
Model Statistics
- Downloads (last month): 13,002
- Likes: 983
- Pipeline: text-to-image
- License: MIT
Model Details
Architecture and components: HiDream-I1-Full uses a sparse diffusion transformer (DiT) backbone with a dual-stream decoupled design and a dynamic Mixture-of-Experts (MoE) stage that enables efficient cross-modal interaction between text and image tokens. The project provides three variants to support different inference budgets: Full (highest quality), Dev (distilled, medium latency), and Fast (distilled, low latency). The paper describes latent-space generation using a pre-trained VAE and flow-matching techniques to reduce computation while preserving fidelity. ([arxiv.org](https://arxiv.org/abs/2505.22705?utm_source=openai))
Model inputs and auxiliary components: The released model uses the FLUX.1 [schnell] VAE for latent encoding and integrates hybrid text encoders (examples include google/t5-v1_1-xxl and meta-llama/Meta-Llama-3.1-8B-Instruct in the distributed inference pipelines). Implementation notes recommend FlashAttention and CUDA 12.4 for best performance; the project also publishes a Diffusers-compatible HiDreamImagePipeline and example scripts for inference and Gradio demos. Parameter count (≈17B), license (MIT), and download/usage metadata are recorded on the model card. ([huggingface.co](https://huggingface.co/HiDream-ai/HiDream-I1-Full))
Deployment and usage: The repository includes inference.py and gradio_demo.py, plus a Diffusers integration example showing how to load tokenizers and Llama-based text encoders, move the pipeline to CUDA, and run prompt-based image generation with configurable image size, guidance scale, and a fixed random seed. The authors note that automatic download of some third-party encoder weights (e.g., Llama 3.1 8B) requires agreeing to those models' licenses. ([github.com](https://github.com/HiDream-ai/HiDream-I1))
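Because deployment means moving a roughly 17B-parameter model plus a Llama 3.1 8B text encoder onto a GPU, a back-of-the-envelope estimate of weight memory can help when choosing hardware or a precision. The sketch below is an illustration only: it uses the rounded parameter counts quoted above, assumes the ≈17B figure does not already include the separately loaded Llama encoder (the source does not say), and omits the T5 encoder and the VAE because no sizes are given for them. Activations and framework overhead are not counted, so real usage will be higher.

```python
# Bytes per parameter for common weight formats (standard storage sizes).
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_gib(num_params: float, dtype: str) -> float:
    """Approximate weight storage in GiB; ignores activations and overhead."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


# Rounded parameter counts taken from the text above (assumptions, not measurements).
components = {
    "HiDream-I1-Full (~17B)": 17e9,
    "Llama 3.1 8B text encoder (~8B)": 8e9,
}

for dtype in ("bf16", "int8", "int4"):
    total = sum(weight_gib(n, dtype) for n in components.values())
    print(f"{dtype}: ~{total:.1f} GiB of weights")
```

Under these assumptions the bf16 weights of the two listed components alone come to several tens of GiB, which is consistent with the community interest in quantized builds noted in the overview.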
Key Features
- Sparse Diffusion Transformer (DiT) with dynamic MoE for efficiency and compositionality.
- Three variants: Full (high quality), Dev (balanced), Fast (low-latency).
- High human-preference alignment (top HPSv2.1 scores reported).
- Strong prompt-following and compositional reasoning (GenEval and DPG-Bench results).
- Diffusers pipeline and Gradio demo with ready-made inference scripts.
- Open-source MIT license and commercial-friendly usage statements.
- Integrates FLUX.1 VAE and hybrid text encoders (T5 / Llama components).
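To make the "sparse MoE" idea in the first feature more concrete, here is a generic top-k expert-routing sketch in plain Python. It is not HiDream-I1's implementation: the gating scheme, the number of experts, the value of k, and the toy scaling "experts" are all hypothetical, chosen only to show how a router can activate a few experts per token instead of all of them.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(gate_scores, k):
    """Return [(expert_index, weight)] for the k highest-scoring experts.

    Weights are softmax probabilities renormalised over the chosen experts.
    """
    probs = softmax(gate_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_forward(token, gate_weights, experts, k=2):
    """Mix the outputs of the k selected experts for one token vector."""
    # One gate score per expert: dot product of that expert's gate row with the token.
    gate_scores = [sum(w * x for w, x in zip(row, token)) for row in gate_weights]
    out = [0.0] * len(token)
    for idx, weight in route_top_k(gate_scores, k):
        expert_out = experts[idx](token)  # only the selected experts are evaluated
        out = [o + weight * e for o, e in zip(out, expert_out)]
    return out


# Toy setup (hypothetical): 4 experts, each a simple element-wise scaling.
experts = [lambda v, s=s: [s * x for x in v] for s in (0.5, 1.0, 2.0, 3.0)]
rng = random.Random(0)
gate_weights = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
token = [0.2, -0.1, 0.4]

print(moe_forward(token, gate_weights, experts, k=2))
```

The efficiency argument is visible in `moe_forward`: with k=2 of 4 experts, only half of the expert computations run for each token, while the gate still decides per token which ones those are.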
Example Usage
Example (python):
```python
import torch
from transformers import PreTrainedTokenizerFast, LlamaForCausalLM
from diffusers import HiDreamImagePipeline

# Load text encoder (example uses Llama 3.1 instruct model)
tokenizer = PreTrainedTokenizerFast.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")
text_encoder = LlamaForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B-Instruct",
    output_hidden_states=True,
    output_attentions=True,
    torch_dtype=torch.bfloat16,
)

# Load the HiDream pipeline (Full model)
pipe = HiDreamImagePipeline.from_pretrained(
    "HiDream-ai/HiDream-I1-Full",
    tokenizer_4=tokenizer,
    text_encoder_4=text_encoder,
    torch_dtype=torch.bfloat16,
)
pipe = pipe.to("cuda")

# Generate an image
image = pipe(
    "A futuristic cityscape at sunset, cinematic lighting, photorealistic",
    height=1024,
    width=1024,
    guidance_scale=5.0,
    num_inference_steps=50,
    generator=torch.Generator("cuda").manual_seed(0),
).images[0]
image.save("hidream_output.png")
```
Example adapted from the project's README and Diffusers integration examples. ([github.com](https://github.com/HiDream-ai/HiDream-I1))
Benchmarks
- DPG-Bench (Overall): 85.89 (Source: [huggingface.co](https://huggingface.co/HiDream-ai/HiDream-I1-Full))
- GenEval (Overall): 0.83 (Source: [huggingface.co](https://huggingface.co/HiDream-ai/HiDream-I1-Full))
- HPSv2.1 (Averaged human preference score): 33.82 (Source: [huggingface.co](https://huggingface.co/HiDream-ai/HiDream-I1-Full))
- Hugging Face: Downloads (last month): 13,002 (Source: [huggingface.co](https://huggingface.co/HiDream-ai/HiDream-I1-Full))
- Hugging Face: Likes: 983 (Source: [huggingface.co](https://huggingface.co/HiDream-ai/HiDream-I1-Full))
Key Information
- Category: Image Models
- Type: AI Image Models Tool