Stable Diffusion 2-1 - AI Image Models Tool
Overview
Stable Diffusion 2.1 is the follow-up release in Stability AI's v2 model line, aimed at improving image fidelity, prompt responsiveness, and high-resolution generation relative to the v1.x models. The v2 line introduced a new text encoder (OpenCLIP) and ships model variants tuned for two common working resolutions: a 512-base model for native 512×512 generation and a 768-v model for native 768×768 generation. Stability AI published v2.1 as a rapid iteration on v2.0, trained with more data and adjusted dataset filtering, to improve prompt expressivity and restore prompt patterns that had worked well in earlier releases. ([stability.ai](https://stability.ai/blog/stablediffusion2-1-release7-dec-2022?utm_source=openai))
The v2.1 weights are distributed as downloadable checkpoints (including v2-1_768-ema-pruned.ckpt at ~5.21 GB) and are intended for use with common toolchains such as Hugging Face Diffusers and popular community UIs (e.g. AUTOMATIC1111). Companion checkpoints for inpainting, depth-guided generation, and a 4x upscaler are published alongside the main checkpoints. The model card documents the fine-tuning timeline (additional training steps applied to the v2 checkpoints) and licensing conditions; Stability AI distributes the public checkpoints under CreativeML Open RAIL-style licensing and has since clarified community licensing and commercial thresholds for enterprise usage. ([huggingface.co](https://huggingface.co/stabilityai/stable-diffusion-2-1/blob/main/v2-1_768-ema-pruned.ckpt?utm_source=openai))
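In practice the two variants are selected by loading different Hugging Face repositories, and each generates best at its native resolution. Below is a minimal sketch, assuming the public stabilityai/stable-diffusion-2-1-base and stabilityai/stable-diffusion-2-1 repos, a CUDA device, and fp16 weights; the prompt and output file name are placeholders.
Example (python):
import torch
from diffusers import StableDiffusionPipeline

# Map each 2.1 variant to its Hugging Face repo and native resolution.
VARIANTS = {
    "base": ("stabilityai/stable-diffusion-2-1-base", 512),  # native 512x512
    "768-v": ("stabilityai/stable-diffusion-2-1", 768),      # native 768x768
}

def load_sd21(variant="768-v"):
    model_id, native_res = VARIANTS[variant]
    pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
    return pipe.to("cuda"), native_res

pipe, res = load_sd21("base")
image = pipe("a watercolor lighthouse at dusk", height=res, width=res).images[0]
image.save("sd2_1_base_512.png")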
Key Features
- OpenCLIP-ViT/H text encoder enabling richer prompt conditioning and semantic range.
- Two main checkpoints: a 512-base model for memory efficiency and a 768-v model for higher native resolution.
- Companion checkpoints: 512-inpainting, depth-guided models, and a 4x upscaler (see the sketch after this list).
- Distributed as downloadable pruned/EMA checkpoints for local use (works with diffusers and UIs).
- Fine-tuned iterations on v2 weights to improve fidelity and reduce undesirable artifacts.
- Model card documents safety considerations, limitations, and recommended usage, including bias and misuse guidance.
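The companion checkpoints live in separate repositories and use their own Diffusers pipelines. The sketch below shows the 4x upscaler, assuming the public stabilityai/stable-diffusion-x4-upscaler repo and a CUDA device; the input file name is a placeholder for any low-resolution image.
Example (python):
import torch
from PIL import Image
from diffusers import StableDiffusionUpscalePipeline

# The x4 upscaler is a separate checkpoint from the main 2.1 weights.
upscaler = StableDiffusionUpscalePipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler",
    torch_dtype=torch.float16,
).to("cuda")

# Placeholder low-resolution input; the upscaler is conditioned on a text prompt as well.
low_res = Image.open("low_res_input.png").convert("RGB").resize((128, 128))
upscaled = upscaler(prompt="a cinematic portrait, ultra-detailed", image=low_res).images[0]
upscaled.save("upscaled_x4.png")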
Example Usage
Example (python):
from diffusers import StableDiffusionPipeline
import torch
# Note: you may need to accept the model license on Hugging Face and provide an access token
model_id = "stabilityai/stable-diffusion-2-1"
pipe = StableDiffusionPipeline.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")
prompt = "A cinematic portrait of a female astronaut, ultra-detailed, soft studio lighting"
image = pipe(prompt, guidance_scale=7.5, num_inference_steps=30).images[0]
# Save result
image.save("sd2_1_example.png")
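For the 768-v checkpoint it is worth requesting the native 768×768 resolution explicitly, and the pipeline's default scheduler can be swapped for a faster multistep solver. The snippet below continues from the pipeline created above; the scheduler choice and step count are reasonable defaults rather than requirements.
Example (python):
from diffusers import DPMSolverMultistepScheduler

# Swap in a multistep solver so fewer denoising steps are needed for comparable quality.
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

# Ask for the 768-v checkpoint's native resolution explicitly.
image = pipe(
    prompt,
    height=768,
    width=768,
    guidance_scale=7.5,
    num_inference_steps=25,
).images[0]
image.save("sd2_1_768_native.png")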
Pricing
The Stable Diffusion 2.1 model weights are published freely under the model card's CreativeML Open RAIL-style license for community use and can be downloaded from Hugging Face. Stability AI has since clarified a "Community License" that permits broad free use for individuals and small businesses up to a revenue threshold; registration or enterprise licensing is required above those limits. There is no per-image price for the open checkpoints, but commercial or large-scale hosted/API usage (DreamStudio or enterprise APIs) may be subject to paid plans or license agreements with Stability AI. For an enterprise or API plan, contact Stability AI for current commercial pricing. ([huggingface.co](https://huggingface.co/stabilityai/stable-diffusion-2-1-base?utm_source=openai))
Benchmarks
- Primary checkpoint file size (v2-1_768-ema-pruned.ckpt): 5.21 GB (Source: https://huggingface.co/stabilityai/stable-diffusion-2-1/blob/main/v2-1_768-ema-pruned.ckpt)
- Default native resolution by variant: 512×512 (v2-1-base) and 768×768 (v2-1) (Source: https://huggingface.co/stabilityai/stable-diffusion-2-1-base)
- Recent monthly downloads (Hugging Face model card stat): 255,832 in the last month, as reported on the model card (Source: https://huggingface.co/stabilityai/stable-diffusion-2-1-base)
- Text encoder: OpenCLIP-ViT/H, used for richer textual conditioning (Source: https://huggingface.co/stabilityai/stable-diffusion-2-1)
- Reported additional fine-tuning steps: ~220k extra steps for v2-1-base; a staged 55k + 155k steps for v2-1 (Sources: https://huggingface.co/stabilityai/stable-diffusion-2-1-base, https://huggingface.co/patrickvonplaten/v2-1)
Key Information
- Category: Image Models
- Type: AI Image Models Tool