Stable Diffusion 3.5 Medium - AI Image Models Tool

Overview

Stable Diffusion 3.5 Medium is a consumer-focused text-to-image model from Stability AI. It uses a Multimodal Diffusion Transformer architecture (MMDiT‑X) to improve image quality, typography, and prompt understanding while remaining efficient enough for many consumer GPUs. The model targets multi-resolution generation (it was trained progressively from low to high resolution) and was designed to balance image fidelity against lower VRAM use and latency than Stable Diffusion 3.5 Large. ([huggingface.co](https://huggingface.co/stabilityai/stable-diffusion-3.5-medium))

Stable Diffusion 3.5 Medium is distributed under Stability AI’s Community License, which is free for research and for organizations below certain revenue thresholds; enterprise licensing is available for larger commercial deployments. It supports local use through ComfyUI and programmatic use through Hugging Face Diffusers and the Stability AI API. The upstream model card and README include practical notes on token limits, recommended samplers, and compatibility with quantized and low‑VRAM workflows. ([huggingface.co](https://huggingface.co/stabilityai/stable-diffusion-3.5-medium))

Model Statistics

  • Downloads: 106,502
  • Likes: 879
  • Pipeline: text-to-image

License: other (Stability AI Community License)

Model Details

Architecture and training: SD 3.5 Medium implements MMDiT‑X, an improved Multimodal Diffusion Transformer that adds dual-attention blocks in its early transformer layers and uses QK‑normalization to stabilize training and fine‑tuning. Training ran through progressive mixed-resolution stages (256 → 512 → 768 → 1024 → 1440 pixels), with random-crop augmentation on the positional embeddings to improve coherence across resolutions. The card lists three fixed, pretrained text encoders: two CLIP models (OpenCLIP-ViT/G and CLIP-ViT/L) and T5-XXL, with the T5 context length varied across training stages to strengthen prompt comprehension. ([huggingface.co](https://huggingface.co/stabilityai/stable-diffusion-3.5-medium))

Capabilities and usage notes: Medium is intended as a lower-resource alternative to SD 3.5 Large that preserves improved typography, better prompt adherence, and, for many prompts, more consistent rendering of faces and hands. The model accepts longer prompts, but the card warns that artifacts can appear when the T5 token count exceeds about 256, and it recommends Skip Layer Guidance for better structure and anatomy coherence. Common community UIs and toolchains (ComfyUI, Diffusers) support local inference, and community channels document practical workflows and troubleshooting. ([huggingface.co](https://huggingface.co/stabilityai/stable-diffusion-3.5-medium))

Compatibility and efficiency: Community and partner tooling provides reduced-precision and quantized builds (FP8, NF4, bfloat16) that lower VRAM use and speed up generation, and hardware vendors such as NVIDIA have published optimization tooling for the SD 3.5 family. The Medium variant is commonly reported at about 2.5–2.6 billion parameters, substantially smaller than the 8B-parameter Large model and easier to run on consumer GPUs. For commercial or enterprise use beyond the Community License terms, Stability AI asks organizations to contact it about an enterprise license. ([blogs.nvidia.com](https://blogs.nvidia.com/blog/rtx-ai-garage-gtc-paris-tensorrt-rtx-nim-microservices/))
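To illustrate what QK‑normalization does, the sketch below normalizes one query vector and one key vector before taking their scaled dot product. It is a toy, single-head example in plain Python, not Stability AI's implementation: the choice of RMSNorm without a learned scale, the vector values, and the helper names `rms_norm` and `attention_logit` are assumptions made for illustration.

```python
import math


def rms_norm(vec, eps=1e-6):
    """Scale a vector to unit root-mean-square (RMSNorm without a learned gain; an assumption)."""
    rms = math.sqrt(sum(x * x for x in vec) / len(vec) + eps)
    return [x / rms for x in vec]


def attention_logit(q, k, qk_norm=True):
    """Scaled dot-product attention logit between one query and one key."""
    if qk_norm:
        # QK-normalization: normalize query and key before the dot product.
        q, k = rms_norm(q), rms_norm(k)
    return sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q))


# Hypothetical query/key pair whose activations have drifted to a large scale.
q = [30.0, -40.0, 50.0, 10.0]
k = [25.0, -35.0, 45.0, 5.0]

raw = attention_logit(q, k, qk_norm=False)
normed = attention_logit(q, k, qk_norm=True)
print(f"without QK-norm: {raw:.1f}")
print(f"with QK-norm:    {normed:.3f}")
```

Because each normalized vector has a root-mean-square of about 1, the logit stays within roughly the square root of the head dimension however large the raw activations grow, which is the property that keeps attention logits from blowing up during training and fine-tuning.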

Key Features

  • MMDiT‑X transformer architecture with dual‑attention early layers.
  • QK‑normalization for improved training stability and customization.
  • Designed for multi‑resolution outputs (training stages up to 1440 resolution).
  • Smaller footprint (~2.5–2.6B params) for lower VRAM and faster inference.
  • Improved typography and complex prompt understanding versus prior Medium models.
  • Compatible with Diffusers, ComfyUI, and Stability AI API endpoints.
  • Practical quantization and reduced-precision support (NF4, FP8, bfloat16) from community tools.
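The progressive, multi-resolution training schedule is easier to appreciate in terms of sequence length. The sketch below counts image tokens per training stage for square images, assuming an 8× VAE downsampling factor and a 2×2 transformer patch size (values used across the SD3 family, but not stated on this page); the constants and the helper `image_tokens` are illustrative.

```python
# Assumptions, not taken from this page: 8x VAE downsampling and 2x2 patches.
VAE_DOWNSAMPLE = 8
PATCH_SIZE = 2

# Square side lengths (pixels) of the reported training stages.
TRAINING_STAGES = [256, 512, 768, 1024, 1440]


def image_tokens(side_px: int) -> int:
    """Number of image tokens the transformer sees for a square image."""
    latent_side = side_px // VAE_DOWNSAMPLE       # pixels -> latent grid
    patches_per_side = latent_side // PATCH_SIZE  # latent grid -> patch grid
    return patches_per_side * patches_per_side


for side in TRAINING_STAGES:
    print(f"{side:>5} px -> {image_tokens(side):>5} image tokens")
```

Under these assumptions the 1440-pixel stage uses about 32 times as many image tokens as the 256-pixel stage, and full self-attention cost grows with the square of that count, which is one reason to spend early training at low resolution.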

Example Usage

Example (python):

from diffusers import StableDiffusion3Pipeline
import torch

model_id = "stabilityai/stable-diffusion-3.5-medium"

# Example: load pipeline (adjust torch_dtype and device for your hardware)
pipe = StableDiffusion3Pipeline.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # or torch.float16 on supported GPUs
)
pipe.to("cuda")

prompt = (
    "A cinematic photograph of a golden retriever wearing a leather jacket, film grain, 35mm"
)

result = pipe(
    prompt=prompt,
    num_inference_steps=28,
    guidance_scale=4.5,
    max_sequence_length=512,  # T5 token budget; the card warns of artifacts above ~256 tokens
)

image = result.images[0]
image.save("sd35_medium_example.png")

# Notes: the model card recommends Skip Layer Guidance for better structure and
# anatomy coherence, and warns about artifacts when T5 prompts exceed ~256 tokens.
# See the model README for quantized / low-VRAM examples and ComfyUI workflows.
# Source: Stability AI model card (Hugging Face).

Pricing

Stable Diffusion 3.5 Medium is distributed under Stability AI’s Community License (free for research and for organizations/individuals under specified revenue thresholds; enterprise licensing available for larger commercial deployments). For commercial API or enterprise pricing, contact Stability AI or consult Stability’s licensing pages. (See the Hugging Face model card and Stability AI license pages for full terms.)

Benchmarks

Hugging Face downloads (model repo): 106,502 downloads (Source: https://huggingface.co/stabilityai/stable-diffusion-3.5-medium)

Hugging Face likes (model repo): 879 likes (Source: https://huggingface.co/stabilityai/stable-diffusion-3.5-medium)

Reported parameter count (community / docs): ≈2.5–2.6 billion parameters (Source: https://blog.comfy.org/sd3-5-comfyui and https://dataconomy.com/2024/11/01/stable-diffusion-3-5-medium-is-launched/)

Reported VRAM requirement (consumer GPUs): ≈9.9 GB VRAM (model body, excluding text encoders) reported in community tests (Source: https://dataconomy.com/2024/11/01/stable-diffusion-3-5-medium-is-launched/)
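The reported parameter count translates into a rough, weights-only memory estimate per precision. The sketch assumes 2.5 billion parameters (the low end of the reported range) and ideal bytes per parameter; it ignores quantization scale overhead, activations, the VAE, and the text encoders, so it is a lower bound rather than a prediction of measured VRAM such as the ~9.9 GB community figure above. The names `BYTES_PER_PARAM` and `weight_gib` are hypothetical.

```python
PARAMS = 2.5e9  # assumption: low end of the commonly reported ~2.5-2.6B range

# Ideal storage cost per parameter; real NF4/FP8 formats add small scale overheads.
BYTES_PER_PARAM = {
    "fp32": 4.0,
    "bf16/fp16": 2.0,
    "fp8": 1.0,
    "nf4": 0.5,
}


def weight_gib(params: float, bytes_per_param: float) -> float:
    """Memory for the transformer weights alone, in GiB."""
    return params * bytes_per_param / 2**30


for name, nbytes in BYTES_PER_PARAM.items():
    print(f"{name:>9}: {weight_gib(PARAMS, nbytes):5.2f} GiB")
```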

Max mixed-resolution training stage (reported): Training progressed up to 1440 resolution (mixed-scale stages used) (Source: https://huggingface.co/stabilityai/stable-diffusion-3.5-medium)

Last Refreshed: 2026-01-08

Key Information

  • Category: Image Models
  • Type: AI Image Models Tool