HiDream-I1-Full - AI Image Models Tool

Overview

HiDream-I1-Full is an open-source text-to-image foundation model with roughly 17 billion parameters, designed for high-quality, fast image synthesis across styles (photorealistic, concept art, cartoon, painting). The model implements a sparse Diffusion Transformer (DiT) design and is released under the MIT license; Hugging Face and the project repository list the full model, distilled variants (Dev, Fast), a Gradio demo, and ready-made inference scripts for on-prem or cloud deployment. ([huggingface.co](https://huggingface.co/HiDream-ai/HiDream-I1-Full))

According to the project's technical report and model card, HiDream-I1 emphasizes prompt following and compositional accuracy: it reports top-tier scores on the HPSv2.1 (human preference), GenEval, and DPG-Bench benchmarks while offering fast inference, with the three released variants trading off speed against fidelity. The authors also provide HiDream-E1, an instruction-based image-editing model built on the same backbone. Community responses praise its prompt adherence and artist-style recognition, while some users note that public demos or heavily quantized builds can show degraded image fidelity compared with locally run full/fp16 builds. ([arxiv.org](https://arxiv.org/abs/2505.22705))

Model Statistics

  • Downloads: 13,002
  • Likes: 983
  • Pipeline: text-to-image

License: MIT

Model Details

Architecture and components: HiDream-I1-Full uses a sparse Diffusion Transformer (DiT) backbone with a dual-stream decoupled design and a dynamic Mixture-of-Experts (MoE) stage that enables efficient cross-modal interaction between text and image tokens. The project provides three variants, Full (highest quality), Dev (distilled, medium latency), and Fast (distilled, low latency), to support different inference budgets. The paper describes generation in the latent space of a pre-trained VAE, using flow-matching techniques to reduce computation while preserving fidelity. ([arxiv.org](https://arxiv.org/abs/2505.22705))

Model inputs and auxiliary components: The released model uses the FLUX.1 [schnell] VAE for latent encoding and integrates hybrid text encoders (the distributed inference pipelines use, for example, google/t5-v1_1-xxl and meta-llama/Meta-Llama-3.1-8B-Instruct). Implementation notes recommend FlashAttention and CUDA 12.4 for best performance; the project also publishes a Diffusers-compatible HiDreamImagePipeline and example scripts for inference and Gradio demos. The parameter count (≈17B), license (MIT), and download/usage metadata are recorded on the model card. ([huggingface.co](https://huggingface.co/HiDream-ai/HiDream-I1-Full))

Deployment and usage: The repository includes inference.py and gradio_demo.py, plus a Diffusers integration example showing how to load tokenizers and Llama-based text encoders, move the pipeline to CUDA, and run prompt-based image generation at configurable sizes, guidance scales, and fixed random seeds. The authors note that automatic download of some third-party encoder weights (e.g., Llama 3.1 8B) requires agreeing to those models' licenses. ([github.com](https://github.com/HiDream-ai/HiDream-I1))
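
In practice that means accepting the relevant licenses on Hugging Face and authenticating before the pipeline can pull the gated encoder weights. A minimal sketch using the huggingface_hub API (the HF_TOKEN environment variable is an assumption about your setup):

import os
from huggingface_hub import login

# Authenticate so gated third-party weights (e.g., Llama 3.1 8B Instruct)
# can be downloaded. Assumes the corresponding licenses have already been
# accepted on huggingface.co and a token is exported as HF_TOKEN.
login(token=os.environ["HF_TOKEN"])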

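Stepping back to the architecture: to make the dual-stream-plus-dynamic-MoE idea concrete, here is a minimal, hypothetical PyTorch sketch of token-level top-k expert routing. The class name, sizes, and routing details are illustrative assumptions, not HiDream's implementation; it only demonstrates the general technique of activating a subset of experts per token.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoEFFN(nn.Module):
    # Illustrative token-level Mixture-of-Experts feed-forward block.
    # Each token is dispatched to its top-k experts by a learned gate, so
    # only a fraction of the parameters is active per token.
    def __init__(self, dim=64, hidden=256, num_experts=4, top_k=2):
        super().__init__()
        self.gate = nn.Linear(dim, num_experts)          # learned router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
            for _ in range(num_experts)
        )
        self.top_k = top_k

    def forward(self, x):  # x: (batch, tokens, dim)
        probs = F.softmax(self.gate(x), dim=-1)               # routing probabilities
        weights, idx = probs.topk(self.top_k, dim=-1)         # top-k experts per token
        weights = weights / weights.sum(-1, keepdim=True)     # renormalize kept mass
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., k] == e                       # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out

print(SparseMoEFFN()(torch.randn(2, 16, 64)).shape)  # torch.Size([2, 16, 64])

Routing of this kind is how a roughly 17B-parameter model can keep per-token compute closer to that of a much smaller dense network.
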
Key Features

  • Sparse Diffusion Transformer (DiT) with dynamic MoE for efficiency and compositionality.
  • Three variants: Full (high quality), Dev (balanced), Fast (low-latency); recommended sampling settings differ by variant (see the sketch after this list).
  • High human-preference alignment (top HPSv2.1 scores reported).
  • Strong prompt-following and compositional reasoning (GenEval and DPG-Bench results).
  • Diffusers pipeline and Gradio demo with ready-made inference scripts.
  • Open-source MIT license and commercial-friendly usage statements.
  • Integrates FLUX.1 VAE and hybrid text encoders (T5 / Llama components).
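
The distilled variants are intended to run with fewer steps and, per the project README, without classifier-free guidance. The values below are taken from the README at the time of writing and may change, so treat this lookup as a hedged reference rather than authoritative configuration:

# Recommended sampling settings per variant, per the project README
# (subject to change; verify against the current model cards).
VARIANT_SETTINGS = {
    "HiDream-ai/HiDream-I1-Full": {"num_inference_steps": 50, "guidance_scale": 5.0},
    "HiDream-ai/HiDream-I1-Dev":  {"num_inference_steps": 28, "guidance_scale": 0.0},
    "HiDream-ai/HiDream-I1-Fast": {"num_inference_steps": 16, "guidance_scale": 0.0},
}

def sampling_kwargs(repo_id: str) -> dict:
    # Return a copy of the recommended pipeline kwargs for a variant.
    return dict(VARIANT_SETTINGS[repo_id])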

Example Usage

Example (python):

import torch
from transformers import PreTrainedTokenizerFast, LlamaForCausalLM
from diffusers import HiDreamImagePipeline

# Load text encoder (example uses Llama 3.1 instruct model)
tokenizer = PreTrainedTokenizerFast.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")
text_encoder = LlamaForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B-Instruct",
    output_hidden_states=True,
    output_attentions=True,
    torch_dtype=torch.bfloat16,
)

# Load the HiDream pipeline (Full model)
pipe = HiDreamImagePipeline.from_pretrained(
    "HiDream-ai/HiDream-I1-Full",
    tokenizer_4=tokenizer,
    text_encoder_4=text_encoder,
    torch_dtype=torch.bfloat16,
)
pipe = pipe.to("cuda")

# Generate an image
image = pipe(
    "A futuristic cityscape at sunset, cinematic lighting, photorealistic",
    height=1024,
    width=1024,
    guidance_scale=5.0,
    num_inference_steps=50,
    generator=torch.Generator("cuda").manual_seed(0),
).images[0]
image.save("hidream_output.png")

# Example adapted from the project's README and Diffusers integration examples. ([github.com](https://github.com/HiDream-ai/HiDream-I1))
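
If the full bf16 pipeline does not fit in GPU memory, Diffusers' standard offloading hooks also work with this pipeline; they are generic Diffusers features rather than HiDream-specific guidance, so benchmark the speed/VRAM trade-off on your own hardware:

# Optional: lower peak VRAM with standard Diffusers offloading. Call this
# instead of pipe.to("cuda"); submodules are moved to the GPU only while
# they execute. enable_sequential_cpu_offload() is slower but more aggressive.
pipe.enable_model_cpu_offload()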

Benchmarks

  • DPG-Bench (Overall): 85.89 ([huggingface.co](https://huggingface.co/HiDream-ai/HiDream-I1-Full))
  • GenEval (Overall): 0.83 ([huggingface.co](https://huggingface.co/HiDream-ai/HiDream-I1-Full))
  • HPSv2.1 (averaged human preference score): 33.82 ([huggingface.co](https://huggingface.co/HiDream-ai/HiDream-I1-Full))
  • Hugging Face downloads (last month): 13,002 ([huggingface.co](https://huggingface.co/HiDream-ai/HiDream-I1-Full))
  • Hugging Face likes: 983 ([huggingface.co](https://huggingface.co/HiDream-ai/HiDream-I1-Full))

Last Refreshed: 2026-01-09

Key Information

  • Category: Image Models
  • Type: AI Image Models Tool