Stable Diffusion v1.5 - AI Image Models Tool
Overview
Stable Diffusion v1.5 is a widely used latent diffusion text-to-image model that generates high-quality, photorealistic 512×512 images from text prompts. It was produced by resuming from the v1.2 checkpoint and then fine-tuning for 595,000 additional steps on the LAION "laion-aesthetics v2 5+" subset; the fine-tuning included a 10% dropout of text-conditioning to improve classifier-free guidance sampling. ([huggingface.co](https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5)) The model ships as an inference-friendly checkpoint (an EMA/pruned safetensors variant is commonly distributed for lower VRAM use) and can be run through Hugging Face's Diffusers pipeline as well as popular front-ends such as ComfyUI, AUTOMATIC1111, SD.Next, and InvokeAI. A typical example prompt from the docs is "a photo of an astronaut riding a horse on Mars." ([huggingface.co](https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5))
Model Statistics
- Downloads: 1,608,961
- Likes: 965
- Pipeline: text-to-image
- License: creativeml-openrail-m
Model Details
Architecture and components: Stable Diffusion v1.5 is a latent diffusion model (LDM) that combines a VAE autoencoder, a UNet-based denoiser operating in latent space, and a CLIP ViT-L/14 text encoder for cross-attention conditioning. Images are encoded to latents (downsample factor 8, 4 latent channels) and the UNet is trained to predict noise in that latent space; the non-pooled CLIP outputs are fed into the UNet via cross-attention. ([huggingface.co](https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5))

Training and dataset: v1.5 was initialized from v1.2 and fine-tuned for 595k steps at 512×512 resolution on laion-aesthetics v2 5+, using AdamW with gradient accumulation on 32×8 A100 GPUs, as reported in the model card. The team intentionally applied a 10% dropout of text-conditioning (classifier-free guidance dropout) during fine-tuning to strengthen guidance sampling at inference. Parameter counts for the full checkpoint are not enumerated in the official model card. ([huggingface.co](https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5))

Capabilities and limits: v1.5 produces high-quality photorealistic and stylized outputs and supports image-to-image and inpainting variants (a separate inpainting checkpoint exists), but it has documented limitations: imperfect photorealism, weak rendering of legible text, compositional challenges, face artifacts, and social/cultural bias inherited from the LAION training subsets. Users are advised to enable the Diffusers Safety Checker and follow the CreativeML OpenRAIL-M license provisions. ([huggingface.co](https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5))
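As a hedged illustration of the latent geometry described above: the downsample factor of 8 and the 4 latent channels come from the model card, but the helper function and its names are assumptions added here for illustration only.

```python
def latent_shape(height: int, width: int, downsample: int = 8, channels: int = 4):
    """Compute the latent-space tensor shape the UNet operates on.

    A 512x512 RGB image is encoded by the VAE into a 4x64x64 latent,
    a large reduction in elements, which is what makes latent diffusion
    much cheaper than denoising in pixel space.
    """
    return (channels, height // downsample, width // downsample)

print(latent_shape(512, 512))  # -> (4, 64, 64)
```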
Key Features
- Latent diffusion (LDM) with VAE + UNet for efficient 512×512 generation.
- Uses CLIP ViT-L/14 text encoder for cross-attention conditioned prompts.
- Fine-tuned with 10% text-conditioning dropout for improved classifier-free guidance.
- Distributed inference checkpoint (EMA/pruned) suitable for lower-VRAM setups (≈4.27 GB).
- Compatible with Diffusers, ComfyUI, AUTOMATIC1111, SD.Next and other UIs.
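The classifier-free guidance mentioned above combines a conditional and an unconditional noise prediction at each denoising step (the 10% conditioning dropout during training is what makes the unconditional branch possible). A minimal sketch of that combination, using the standard CFG formula with toy arrays; the function name and values are illustrative, not part of any library API:

```python
import numpy as np

def cfg_combine(noise_uncond, noise_cond, guidance_scale=7.5):
    """Classifier-free guidance: move the prediction away from the
    unconditional direction, scaled by guidance_scale."""
    return noise_uncond + guidance_scale * (noise_cond - noise_uncond)

# Toy latents: with guidance_scale=1.0 the result is just the conditional prediction
uncond = np.zeros(4)
cond = np.ones(4)
print(cfg_combine(uncond, cond, guidance_scale=1.0))  # -> [1. 1. 1. 1.]
```

Higher `guidance_scale` values (e.g. the 7.5 used in the example below) trade diversity for stronger prompt adherence.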
Example Usage
Example (python):
from diffusers import StableDiffusionPipeline
import torch

# Load the v1.5 checkpoint in half precision to reduce VRAM use
model_id = "sd-legacy/stable-diffusion-v1-5"
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe = pipe.to("cuda")

prompt = "a photo of an astronaut riding a horse on Mars, high detail, cinematic lighting"
# guidance_scale controls classifier-free guidance strength
image = pipe(prompt, guidance_scale=7.5, num_inference_steps=50).images[0]
image.save("astronaut_rides_horse_v1_5.png")
Benchmarks
Hugging Face likes (model page): 965 likes (Source: https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5)
Fine-tuning steps & resolution: 595,000 steps at 512×512 (fine-tuned from v1.2) (Source: https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5)
Pruned (EMA-only) checkpoint file size: v1-5-pruned-emaonly ≈ 4.27 GB (safetensors / ckpt variants) (Source: https://huggingface.co/utnah/ckpt/blob/main/sd-v1-5.ckpt)
Common evaluation protocol (model card): Evaluation used 50 PLMS/PNDM steps on 10,000 COCO2017 prompts at 512×512 (Source: https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5)
Key Information
- Category: Image Models
- Type: AI Image Models Tool