Stable Diffusion 3.5 Large - AI Image Models Tool

Overview

Stable Diffusion 3.5 Large is the high-quality, professional-oriented member of Stability AI’s Stable Diffusion 3.5 family. Released as part of the SD3.5 rollout in October 2024, the Large variant is an 8-billion-parameter Multimodal Diffusion Transformer (MMDiT) tuned for stronger prompt adherence, better typography and text rendering, and higher-fidelity outputs at resolutions up to roughly 1 megapixel. The release emphasizes practical deployability as well as image quality: Stability AI published the weights for self-hosting on Hugging Face and provided compatibility notes for Diffusers, ComfyUI and other community toolchains (Hugging Face model card; Stability AI announcement).

Stable Diffusion 3.5 Large is intended as a flexible base for creative production, research, and product integration. The model conditions on three fixed pretrained text encoders (two CLIP variants, including OpenCLIP, plus T5-XXL), uses QK-normalization to stabilize training, and is accompanied by a distilled "turbo" variant (Large Turbo) that trades some quality for very low-step inference. It is distributed under the Stability AI Community License, which permits free use by individuals and by organisations below the stated revenue threshold; enterprise licensing is available from Stability AI.
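To make the stability claim concrete, here is a minimal NumPy sketch of single-head attention with an optional RMS-style normalization of queries and keys before the dot product. It is a toy under stated assumptions: the learnable gain, multi-head reshaping, and the exact normalization variant used inside SD3.5's MMDiT blocks are omitted, and the function names are hypothetical. It shows that once q and k are normalized, every attention logit is bounded by sqrt(d), no matter how large the raw activations grow.

```python
import numpy as np


def rms_norm(x, eps=1e-6):
    """RMS-normalize the last axis (learnable gain omitted in this sketch)."""
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)


def attention(q, k, v, qk_norm=True):
    """Single-head scaled dot-product attention; returns (output, raw logits)."""
    d = q.shape[-1]
    if qk_norm:
        # After RMS-normalization each vector has norm <= sqrt(d), so by
        # Cauchy-Schwarz every logit below is at most sqrt(d) in magnitude.
        q, k = rms_norm(q), rms_norm(k)
    logits = q @ k.T / np.sqrt(d)
    # Subtract the row max before exponentiating for a numerically stable softmax.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    weights = np.exp(shifted)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, logits


rng = np.random.default_rng(0)
# Deliberately huge query/key activations, as can appear when training drifts.
q = rng.normal(size=(4, 64)) * 1000.0
k = rng.normal(size=(4, 64)) * 1000.0
v = rng.normal(size=(4, 64))

_, logits_plain = attention(q, k, v, qk_norm=False)
_, logits_norm = attention(q, k, v, qk_norm=True)
# Without QK-norm the logits explode; with it they stay within sqrt(64) = 8.
print(np.abs(logits_plain).max() > 1000.0)
print(np.abs(logits_norm).max() <= 8.0 + 1e-6)
```

Bounded logits keep the softmax from collapsing onto a single token, which is one common explanation for why QK-normalization makes large transformers easier to train and fine-tune.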

Model Statistics

  • Downloads (last month): 86,061
  • Likes: 3,303
  • Pipeline: text-to-image

License: other (Stability AI Community License)

Model Details

Architecture and encoders: SD3.5 Large uses a Multimodal Diffusion Transformer (MMDiT) backbone rather than the U-Net latent architecture of earlier Stable Diffusion generations. Generation is conditioned on three fixed, pretrained text encoders (OpenCLIP-ViT/G, CLIP-ViT/L and T5-XXL); the T5 encoder in particular allows longer prompts and improves prompt understanding (Hugging Face model card).

Training and stability: SD3.5 applies Query-Key (QK) normalization inside its transformer blocks, which stabilizes training and simplifies fine-tuning. The Large variant has approximately 8B parameters (Hugging Face release notes and public coverage list Large as the 8B configuration).

Speed and distilled variants: Stability AI also provides Large Turbo, a timestep-distilled variant trained with Adversarial Diffusion Distillation (ADD) that produces high-quality samples in very few steps (4-step sampling is reported), which suits draft and interactive workflows.

Capabilities: text-to-image, image-to-image, and inpainting/outpainting workflows are available through the Diffusers and ComfyUI integrations. The model supports quantized loading (bitsandbytes, including 4-bit nf4) and model CPU offload to reduce VRAM requirements. For full technical notes and example usage, see the Hugging Face model card and the Diffusers integration guide (Hugging Face; Hugging Face Diffusers blog).
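How three encoders become one conditioning signal can be sketched with array shapes alone. The NumPy sketch below follows how Diffusers' StableDiffusion3Pipeline prepares its prompt embeddings, as we understand it: the two CLIP token sequences are concatenated along the feature axis, zero-padded to the T5 width, then stacked with the T5 tokens along the sequence axis, while the two pooled CLIP vectors are concatenated into one global vector. The hidden sizes (768, 1280, 4096), token counts (77 CLIP, 256 T5) and the function name are assumptions for illustration; zeros stand in for real encoder outputs.

```python
import numpy as np

# Assumed widths: CLIP-ViT/L = 768, OpenCLIP-ViT/G = 1280, T5-XXL = 4096.
CLIP_L_DIM, CLIP_G_DIM, T5_DIM = 768, 1280, 4096
# Assumed token counts: 77 for the CLIP tokenizers, 256 for T5.
CLIP_TOKENS, T5_TOKENS = 77, 256


def combine_text_embeddings(clip_l, clip_g, t5, pooled_l, pooled_g):
    """Hypothetical helper: merge per-token and pooled outputs of the three encoders."""
    # Join the two CLIP token sequences along the feature axis: (B, 77, 2048).
    clip = np.concatenate([clip_l, clip_g], axis=-1)
    # Zero-pad the CLIP features up to the T5 width: (B, 77, 4096).
    pad = t5.shape[-1] - clip.shape[-1]
    clip = np.pad(clip, ((0, 0), (0, 0), (0, pad)))
    # Stack CLIP and T5 tokens into one context sequence: (B, 77 + 256, 4096).
    context = np.concatenate([clip, t5], axis=-2)
    # Concatenate the pooled CLIP vectors into one global vector: (B, 2048).
    pooled = np.concatenate([pooled_l, pooled_g], axis=-1)
    return context, pooled


batch = 2
context, pooled = combine_text_embeddings(
    np.zeros((batch, CLIP_TOKENS, CLIP_L_DIM)),
    np.zeros((batch, CLIP_TOKENS, CLIP_G_DIM)),
    np.zeros((batch, T5_TOKENS, T5_DIM)),
    np.zeros((batch, CLIP_L_DIM)),
    np.zeros((batch, CLIP_G_DIM)),
)
print(context.shape, pooled.shape)
```

In this sketch the token sequence is what the transformer attends to jointly with the image tokens, and the pooled vector acts as global conditioning alongside the timestep; the real pipeline also handles tokenization, penultimate-layer selection, and negative prompts, which are left out here.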

Key Features

  • MMDiT architecture (Multimodal Diffusion Transformer) for improved text-image alignment.
  • Three fixed text encoders (OpenCLIP-ViT/G, CLIP-ViT/L, T5-XXL) for richer conditioning.
  • QK‑normalization inside transformer blocks to improve training stability and fine-tuning.
  • Large (~8B) variant tuned for high-fidelity outputs at around 1 megapixel resolution.
  • Large Turbo (ADD-distilled) variant enabling high-quality sampling in about 4 steps.
  • Compatible with Diffusers, ComfyUI, AUTOMATIC1111 toolchains and supports quantized loading.
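Because the model is tuned for outputs of roughly 1 megapixel, a small helper can pick a width and height near that budget for any aspect ratio. This is a hypothetical sketch: the 1024 × 1024 pixel target and snapping both sides to multiples of 64 are assumptions chosen as a conservative convention, not requirements stated in the model card.

```python
import math

TARGET_PIXELS = 1024 * 1024  # assumed reading of "about 1 megapixel"
MULTIPLE = 64  # assumed conservative alignment for width and height


def megapixel_size(aspect_w, aspect_h, target=TARGET_PIXELS, multiple=MULTIPLE):
    """Hypothetical helper: return (width, height) near `target` pixels for an aspect ratio."""
    ratio = aspect_w / aspect_h
    # Solve width * height = target subject to width / height = ratio.
    width = math.sqrt(target * ratio)
    height = math.sqrt(target / ratio)

    def snap(x):
        # Round to the nearest multiple, but never below one multiple.
        return max(multiple, round(x / multiple) * multiple)

    return snap(width), snap(height)


for aspect in [(1, 1), (16, 9), (2, 3)]:
    print(aspect, megapixel_size(*aspect))
```

The resulting pair can be passed to the pipeline call via its `width` and `height` arguments; snapping means the pixel count lands near, not exactly on, the 1 MP target for non-square ratios.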

Example Usage

Example (python):

import torch
from diffusers import StableDiffusion3Pipeline

# Minimal example: load SD3.5 Large with bfloat16 on CUDA
pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large",
    torch_dtype=torch.bfloat16,
)
pipe = pipe.to("cuda")

prompt = "A photorealistic studio portrait of a golden retriever wearing a vintage aviator helmet"
image = pipe(
    prompt,
    num_inference_steps=28,
    guidance_scale=4.0,
).images[0]
image.save("sd35_large_result.png")

# Notes:
# - For constrained VRAM, load the transformer submodule quantized (bitsandbytes / nf4) and call
#   pipe.enable_model_cpu_offload() instead of pipe.to("cuda"), as shown in the official Hugging Face model README.
# - For low-latency drafts, consider the "stabilityai/stable-diffusion-3.5-large-turbo" variant (4-step sampling).
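Switching between the Large and Large Turbo checkpoints also means switching sampling settings. The sketch below keeps those settings in one lookup table keyed by Hugging Face model ID. The helper name is hypothetical; the Large values are the ones used in the example above, and the Turbo values (4 steps, guidance_scale=0.0) are those shown in the Turbo model card's example, which we assume still apply. A guidance_scale of 1.0 or lower disables classifier-free guidance in Diffusers' SD3 pipeline, as we understand it, which is typical for ADD-distilled models.

```python
# Hypothetical lookup table of sampling kwargs for StableDiffusion3Pipeline calls.
SAMPLING_SETTINGS = {
    # Values used in the Large example above.
    "stabilityai/stable-diffusion-3.5-large": {
        "num_inference_steps": 28,
        "guidance_scale": 4.0,
    },
    # Assumed from the Turbo model card example: few steps, guidance disabled.
    "stabilityai/stable-diffusion-3.5-large-turbo": {
        "num_inference_steps": 4,
        "guidance_scale": 0.0,
    },
}


def sampling_settings(model_id):
    """Return a copy of the sampling kwargs for `model_id` (KeyError if unknown)."""
    return dict(SAMPLING_SETTINGS[model_id])


print(sampling_settings("stabilityai/stable-diffusion-3.5-large-turbo"))
```

With a loaded pipeline, the call becomes `pipe(prompt, **sampling_settings(model_id)).images[0]`. Returning a copy lets callers tweak a value for one run without mutating the shared defaults.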

Benchmarks

Hugging Face downloads (last month): 86,061 (Source: https://huggingface.co/stabilityai/stable-diffusion-3.5-large)

Hugging Face likes: ~3.3k on the model card (Source: https://huggingface.co/stabilityai/stable-diffusion-3.5-large)

Model size (Large): Approx. 8 billion parameters (Large configuration) (Source: https://huggingface.co/blog/sd3-5)
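The parameter count explains why quantization and CPU offload matter. A back-of-the-envelope estimate is memory = parameters × bits per parameter / 8 bytes. The sketch below applies that formula to the ~8B figure above; it is an assumption-laden lower bound that covers the transformer weights only, ignoring the text encoders, the VAE, activations, and the per-block scaling metadata that 4-bit formats such as nf4 store.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Approximate storage for the weights alone, in GiB (2**30 bytes)."""
    return num_params * bits_per_param / 8 / 2**30


LARGE_TRANSFORMER_PARAMS = 8e9  # "approx. 8 billion parameters", per the model size above

for label, bits in [("fp32", 32), ("bf16", 16), ("nf4 (4-bit)", 4)]:
    print(f"{label:>12}: {weight_memory_gib(LARGE_TRANSFORMER_PARAMS, bits):5.1f} GiB")
```

Under these assumptions, bf16 weights alone need roughly 15 GiB and 4-bit weights roughly 3.7 GiB, which is why the nf4 plus CPU-offload route is the usual suggestion for consumer GPUs.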

Few-step (Turbo) sampling: Large Turbo is reported to sample in 4 steps via Adversarial Diffusion Distillation (Source: https://huggingface.co/stabilityai/stable-diffusion-3.5-large-turbo)

Research evaluation example (OBS-Diff): reported SD3.5 Large baseline of FID 31.59 and CLIP score 0.3156 in a dataset-specific research evaluation (Source: https://www.aimodels.fyi/papers/arxiv/obs-diff-accurate-pruning-diffusion-models-one)
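For reading the research numbers above, the standard definitions are useful. Fréchet Inception Distance (FID) compares the mean and covariance of Inception features extracted from real (r) and generated (g) images, so lower is better; CLIP score is commonly the cosine similarity between CLIP embeddings of an image I and its prompt T, averaged over the evaluation set, so higher is better. The exact protocol used in the OBS-Diff evaluation (dataset, sample count, feature extractor, any scaling of the CLIP score) is not stated here, so these values should not be compared directly with figures from other papers.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left(\Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2}\right)

\mathrm{CLIP}(I, T) = \frac{E_I(I)^{\top} E_T(T)}{\lVert E_I(I) \rVert_2 \, \lVert E_T(T) \rVert_2}
```

Here \mu and \Sigma are the feature mean and covariance for each image set, and E_I, E_T are the CLIP image and text encoders.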

Last Refreshed: 2026-01-09

Key Information

  • Category: Image Models
  • Type: AI Image Models Tool