Stable Diffusion XL Base 1.0 - AI Image Models Tool
Overview
Stable Diffusion XL (SDXL) Base 1.0 is Stability AI's flagship open-weight text-to-image latent diffusion model, designed for higher-fidelity, larger-format generation than earlier Stable Diffusion releases. The release uses an "ensemble of experts" workflow (base + optional refiner): the base model produces latents at the target resolution, and an optional refiner model performs a final denoising/refinement pass for improved detail and texture. The model was trained across multiple aspect ratios with a native 1024×1024 resolution, and it accepts both standard text-to-image prompts and img2img/SDEdit-style conditioning for image-guided edits. ([huggingface.co](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0))

SDXL Base 1.0 ships under Stability AI's CreativeML Open RAIL++-M license, with the weights published on the Hugging Face Hub for research and integration. The architecture uses a larger U-Net backbone and two fixed text encoders (OpenCLIP-ViT/G and CLIP-ViT/L) to improve prompt understanding and compositionality; the refiner is a specialized denoiser trained to improve the low-noise latents produced by the base. SDXL is commonly used standalone or as a two-stage pipeline (base → refiner) to trade off speed against final fidelity. ([huggingface.co](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0))
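Because the base checkpoint also accepts image conditioning, a minimal img2img (SDEdit-style) sketch using Diffusers might look like the following; the input file name and strength value are illustrative assumptions rather than values from the model card.

from diffusers import StableDiffusionXLImg2ImgPipeline
from diffusers.utils import load_image
import torch

# Load the SDXL base weights in an img2img pipeline (fp16 on CUDA)
pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16",
).to("cuda")

# "input.png" is a placeholder source image; lower strength preserves more of it
init_image = load_image("input.png").resize((1024, 1024))
image = pipe(prompt="a watercolor painting of the same scene", image=init_image, strength=0.5).images[0]
image.save("sdxl_img2img.png")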
Model Statistics
- Downloads: 1,730,124
- Likes: 7,297
- Pipeline: text-to-image
- License: openrail++
Model Details
Technical overview: SDXL 1.0 is a latent diffusion model that implements an "ensemble of experts" two-stage approach (base model + refiner) and uses a much larger U-Net backbone than previous Stable Diffusion versions. The base model generates noisy latents at the desired output size; the refiner continues denoising with a model specialized for low-noise refinement. Training and inference design choices include multi-aspect-ratio training, crop-coordinate conditioning to reduce off-frame subjects, and dual text encoders for richer prompt conditioning (OpenCLIP-ViT/G and CLIP-ViT/L). These architectural and training changes are documented in Stability AI's SDXL technical report and repository. ([gigazine.net](https://gigazine.net/gsc_news/en/20230705-stable-diffusion-xl-report/))

Practical notes and capabilities: SDXL natively targets 1024×1024 generation, supports img2img/SDEdit workflows, and is compatible with Hugging Face Diffusers (diffusers >= 0.19.0 recommended). Typical inference setups use fp16 weights and GPU acceleration; the community and Stability AI docs also note options such as CPU offload, torch.compile performance gains, and Optimum/OpenVINO or ONNX variants for alternative runtimes. The model card and README list known limitations (difficulty with fine legible text, compositional edge cases, imperfect faces, lossy autoencoding) and recommend responsible use under the Open RAIL++ license. ([huggingface.co](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0))

Community feedback: Early community reports praised improved composition and text handling but flagged that the refiner can sometimes introduce artifacts or change skin tones and lighting in undesirable ways; users often tune the denoising_end/denoising_start handoff between the two stages to get preferred output. ([reddit.com](https://www.reddit.com/r/StableDiffusion/comments/15aolyj))
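As a rough sketch of the memory and speed options mentioned above (enable_model_cpu_offload and torch.compile are general Diffusers/PyTorch features rather than SDXL-specific APIs; the prompt and step count are illustrative):

from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16",
)

# Option 1: offload submodules to CPU when VRAM is tight (slower, but fits smaller GPUs)
pipe.enable_model_cpu_offload()

# Option 2 (alternative to offload; keeps the whole pipeline on the GPU):
# pipe.to("cuda")
# pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)  # PyTorch >= 2.0

image = pipe("an astronaut riding a horse on mars", num_inference_steps=30).images[0]
image.save("sdxl_offload.png")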
Key Features
- Two-stage ensemble: base model for latents, optional refiner for final denoising.
- Dual text encoders (OpenCLIP-ViT/G + CLIP-ViT/L) for richer prompt conditioning (a prompt_2 sketch follows this list).
- Native 1024×1024-capable generation and multi-aspect-ratio training.
- Supports text-to-image and img2img/SDEdit-style image-conditioned workflows.
- OpenRAIL++-M license with weights published on Hugging Face for research use.
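Because both encoders are exposed through the Diffusers pipeline, a second prompt can be routed to the OpenCLIP-ViT/G encoder via the prompt_2 argument (prompt feeds CLIP-ViT/L; if prompt_2 is omitted, the same text goes to both encoders). The subject/style split below is purely an illustrative assumption:

from diffusers import StableDiffusionXLPipeline
import torch

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16",
).to("cuda")

# prompt -> text_encoder (CLIP-ViT/L), prompt_2 -> text_encoder_2 (OpenCLIP-ViT/G)
image = pipe(
    prompt="a red fox in a foggy forest",            # subject terms (illustrative split)
    prompt_2="cinematic lighting, 35mm film grain",  # style terms (illustrative split)
    num_inference_steps=30,
).images[0]
image.save("sdxl_dual_prompt.png")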
Example Usage
Example (python):
from diffusers import DiffusionPipeline
import torch
# Basic SDXL base inference (fp16 recommended on CUDA)
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16",
)
pipe.to("cuda")
prompt = "A cinematic photograph of a red fox in a foggy forest, dramatic light"
image = pipe(prompt=prompt, num_inference_steps=30).images[0]
image.save("sdxl_base_output.png")
# Two-stage: generate latents with base, refine with refiner (pattern from repo/readme)
# (Assumes you have both base and refiner weights downloaded)
# from diffusers import DiffusionPipeline
# base = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, variant="fp16", use_safetensors=True)
# base.to("cuda")
# refiner = DiffusionPipeline.from_pretrained(
#     "stabilityai/stable-diffusion-xl-refiner-1.0",
#     text_encoder_2=base.text_encoder_2,
#     vae=base.vae,
#     torch_dtype=torch.float16,
#     use_safetensors=True,
#     variant="fp16",
# )
# refiner.to("cuda")
# latent = base(prompt=prompt, num_inference_steps=40, denoising_end=0.8, output_type="latent").images
# final = refiner(prompt=prompt, num_inference_steps=40, denoising_start=0.8, image=latent).images[0]
# final.save("sdxl_refined.png")
# Notes: use diffusers >= 0.19.0, install transformers, accelerate, safetensors and invisible_watermark as recommended in the model README.
# Sources: Hugging Face model README and community usage examples. ([huggingface.co](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0))
Benchmarks
- Hugging Face total downloads (provided stat): 1,730,124 (Source: https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0)
- Hugging Face likes (provided stat): 7,297 (Source: https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0)
- Downloads last month (hub-reported figure on the model page): 2,209,046 (Source: https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0)
- Human evaluation (relative preference): SDXL (base + refiner) preferred over SDXL 0.9, SD 1.5, and SD 2.1 in the reported evaluation charts (Source: https://github.com/Stability-AI/generative-models/blob/main/assets/sdxl_report.pdf)
- Cloud availability: available as a foundation image model in AWS SageMaker JumpStart (Source: https://aws.amazon.com/about-aws/whats-new/2023/07/sdxl-1-0-foundation-model-stability-ai-amazon-sagemaker-jumpstart/)
Key Information
- Category: Image Models
- Type: AI Image Models Tool