Best AI Image Model Tools
Explore 51 AI image model tools to find the perfect solution.
Image Models (51 tools)
Recraft V3
A text-to-image generative AI model capable of rendering long passages of text within generated images, available via Replicate’s API.
CodeFormer
A robust face restoration algorithm designed to repair old photos or improve AI-generated faces, delivering improved image quality.
GFPGAN
A practical AI tool for face restoration, capable of enhancing and restoring old and AI-generated faces, available for self-hosting via Docker.
FLUX.1-dev
High-quality image generation model with ComfyUI and Diffusers support, available under a non-commercial license.
OmniGen
OmniGen is a unified image generation model that can generate a wide range of images from multi-modal prompts, simplifying the image generation process without the need for additional network modules or preprocessing steps. It supports various tasks such as text-to-image generation, identity-preserving generation, image editing, and more.
FLUX1.1 [pro]
A new text-to-image AI model capable of generating images six times faster than its predecessor, with higher quality, better prompt adherence, and more diversity in outputs. It includes a prompt upsampling feature that utilizes a language model to enhance prompts for improved image generation.
Stable Diffusion 3.5 Medium
A Multimodal Diffusion Transformer text-to-image generative model by Stability AI that offers improved image quality, typography, complex prompt understanding, and resource efficiency. It supports local or programmatic use via diffusers, ComfyUI, and API endpoints.
Stable Diffusion 3 Medium
A multimodal diffusion transformer model that generates images from textual descriptions with improvements in image quality, typography, and resource-efficiency for creative applications.
Anything V4.0
An AI image generation model known for incorporating components from AbyssOrangeMix2 to deliver versatile image synthesis across styles.
Stable Diffusion
A high-resolution image synthesis model that enables users to generate images from textual descriptions, supporting creative and design applications.
Ideogram-V2
Ideogram-V2 is an advanced image generation model that excels in inpainting, prompt comprehension, and text rendering. It is designed to transform ideas into captivating designs, realistic images, innovative logos, and posters. The model is accessible via an API on Replicate and offers unique features for creative image editing.
Stable Diffusion 2-1
An iteration of Stability AI’s text-to-image model, delivering high-quality image generation from text prompts.
Shuttle 3 Diffusion
Shuttle 3 Diffusion is a text-to-image diffusion model that generates detailed and diverse images from textual prompts in just 4 steps. It offers enhanced image quality, improved typography, and resource efficiency, and can be integrated via API, Diffusers, or ComfyUI.
Recraft V3 SVG
A text-to-image model focused on generating high-quality SVG images, including logotypes and icons, with controlled text placement.
Stable Diffusion 3.5 Large
Stable Diffusion 3.5 Large is a Multimodal Diffusion Transformer text-to-image generative model developed by Stability AI. It generates images from text prompts with enhanced image quality, typography, and resource-efficiency. The model supports integration with Diffusers, ComfyUI, and other programmatic interfaces, and is available under the Stability Community License.
xinsir/controlnet-union-sdxl-1.0
A ControlNet++ model for text-to-image generation and advanced image editing. Built on Stable Diffusion XL, it supports over 10 control conditions and advanced features such as tile deblurring, tile variation, super resolution, inpainting, and outpainting. The model is designed for high-resolution, multi-condition image generation and editing.
Playground v2.5 – 1024px Aesthetic Model
A diffusion-based text-to-image generative model that produces highly aesthetic images at a resolution of 1024x1024 across various aspect ratios. It outperforms several state-of-the-art models in aesthetic quality and is accessible via an API on Replicate, with integration support for Hugging Face Diffusers.
Hunyuan3D 2.0
A diffusion-based model for generating high-resolution textured 3D assets, featuring a two-stage pipeline with a shape generation component (Hunyuan3D-DiT) and a texture synthesis component (Hunyuan3D-Paint). It supports both image-to-3D and text-to-3D workflows, and includes a user-friendly production platform (Hunyuan3D-Studio) for mesh manipulation and animation.
FLUX.1 Redux
An adapter for FLUX.1 base models that generates slight variations of a given image, enabling creative refinements and flexible high-resolution outputs.
FLUX.1
The official inference repository for FLUX.1 models, offering text-to-image generation and inpainting, maintained in collaboration with the models’ authors.
Anything V5
A text-to-image diffusion model from the Anything series designed for anime-style image generation. The model is available in multiple variants (e.g., V5-Prt) and is optimized for precise prompt-based outputs. It leverages Stable Diffusion pipelines and is hosted on Hugging Face with detailed versioning and usage instructions.
prunaai/hidream-l1-dev
An optimized version of the hidream-l1-dev model built with the Pruna AI optimization toolkit. It runs on Nvidia A100 GPUs, is available via an API on Replicate, supports rapid predictions (around 15 seconds per run), and has been executed over 28.5K times.
Stable Diffusion v1.5
A latent diffusion-based text-to-image generation model that produces photorealistic images from text prompts. It builds upon the Stable Diffusion v1.2 weights and is fine-tuned for improved classifier-free guidance. It can be used via the Diffusers library, ComfyUI, and other interfaces.
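The classifier-free guidance mentioned above combines two noise predictions at each denoising step: one conditioned on the prompt and one unconditional. A minimal NumPy sketch of the combination rule (the arrays and guidance scale here are illustrative stand-ins, not actual model outputs):

```python
import numpy as np

def cfg_combine(noise_uncond, noise_cond, guidance_scale=7.5):
    """Classifier-free guidance: move the prediction from the
    unconditional estimate toward the conditional one, scaled."""
    return noise_uncond + guidance_scale * (noise_cond - noise_uncond)

# Toy latents standing in for U-Net outputs at one denoising step.
uncond = np.zeros((4, 8, 8))
cond = np.ones((4, 8, 8))
guided = cfg_combine(uncond, cond, guidance_scale=7.5)
print(guided.mean())  # 7.5
```

A higher guidance scale pushes samples closer to the prompt at the cost of diversity, which is why most interfaces expose it as a tunable parameter.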
Hunyuan3D-2.0
An AI application that generates high-resolution 3D models from images or text descriptions, enabling creative 3D content creation through AI.
Stability AI – Generative Models
Open-source generative image/video model implementations and sampling scripts (e.g., SV3D/SV4D).
Stable Diffusion XL Base 1.0
A diffusion-based text-to-image generative model developed by Stability AI. This model uses a latent diffusion approach with dual fixed text encoders, and can be used standalone or combined with a refinement model for enhanced high-resolution outputs. It supports both direct image generation and img2img workflows leveraging SDEdit.
HiDream-I1
An open-source image generative model with 17B parameters, delivering state-of-the-art image generation quality, accompanied by a dedicated Hugging Face Space for experimentation.
Shakker-Labs/AWPortraitCN2
A text-to-image model focused on generating portraits with Eastern aesthetics. The updated version expands character depiction across various age groups and themes including cuisine, architecture, traditional ethnic costumes, and diverse environments. It is based on the stable-diffusion/flux framework and released under a non-commercial license.
FLUX.1 Kontext
An AI tool that blends two images into a single cohesive output, guided by text prompts.
FLUX.1 Kontext
FLUX.1 Kontext is a new image editing model from Black Forest Labs that leverages text prompts for precise image modifications, including color swaps, background edits, text replacements, style transfers, and aspect ratio changes. It features multiple variants (Pro, Max, and an upcoming Dev) along with a conversational interface (Kontext Chat) to simplify the editing process.
Flux1.1 Pro – Ultra
Flux1.1 Pro – Ultra is an advanced text-to-image diffusion model by Black Forest Labs available on Replicate. It offers ultra mode for generating high-resolution images (up to 4 megapixels) at impressive speeds (around 10 seconds per sample) and a raw mode that produces images with a more natural, candid aesthetic.
Flux-uncensored
Flux-uncensored is a text-to-image diffusion model hosted on Hugging Face by enhanceaiteam. It leverages the stable-diffusion pipeline, LoRA, and the fluxpipeline to generate images from text prompts. The model is marked as 'Not-For-All-Audiences', indicating that it might produce sensitive content.
FLUX.1 Fill [dev]
FLUX.1 Fill [dev] is a 12-billion parameter rectified flow transformer developed by Black Forest Labs designed for text-guided inpainting. It fills specific areas in an existing image based on a textual description, enabling creative image editing workflows. It comes with a non-commercial license and integrates seamlessly with diffusers.
FLUX.1
FLUX.1 is a state-of-the-art text-to-image model family developed by Black Forest Labs. It excels in prompt adherence, visual detail, and diverse output quality. Available via Replicate's API, FLUX.1 comes in three variants (pro, dev, schnell) with different pricing models.
Ideogram 3.0
Ideogram 3.0 is a text-to-image generation model available on Replicate that offers three variants (Turbo, Balanced, and Quality) covering fast iteration, balanced output, and high-fidelity results. It delivers improved realism, enhanced text rendering, precise layout generation, and advanced style transfer, making it well suited to graphic design, marketing, and creative visual content.
Recraft V3
A text-to-image generation model specialized in creating images with long text and diverse styles, ensuring precise control over content layout.
IP-Adapter
IP-Adapter is a lightweight image prompt adapter developed by Tencent AI Lab that enables pre-trained text-to-image diffusion models to incorporate image prompts along with text prompts for multimodal image generation. With only 22M parameters, it offers comparable or improved performance compared to fine-tuned models and supports integration with various controllable generation tools.
Realistic Vision V6.0 B1 noVAE
Realistic Vision V6.0 "New Vision" is a beta diffusion-based text-to-image model focused on realism and photorealism. It is released on Hugging Face and provides detailed guidelines on resolutions, generation parameters, and recommended workflows (including using a VAE for quality improvements).
Juggernaut-XL v8
Juggernaut-XL v8 is a fine-tuned text-to-image diffusion model built on Stable Diffusion XL, designed for photo-realistic art generation. It is part of the RunDiffusion suite and is intended for creative visual content generation, though it cannot be used behind API services. Business inquiries and commercial licensing are available via email.
FLUX.1 Kontext [dev]
FLUX.1 Kontext [dev] is a state-of-the-art, open-weight text-based image editing model developed by Black Forest Labs. It enables detailed image edits using text prompts, such as style transfer, object modifications, text replacement, background swapping, and preserving character consistency. The model offers clear instructions on best prompting practices and is available under a non-commercial license with commercial use options via Replicate.
StoodioAI Fashion Model
A custom-trained model for generating unique fashion designs, available via API on the Replicate platform.
Recraft V3 SVG
A text-to-image generative model that produces high-quality SVG (vector) images including logos, icons, and branded designs. It offers precise control over text and image placement, supports a variety of styles, and allows brand style customization by uploading reference images.
Flux Schnell
A fast text-to-image generation model optimized for local development and personal use, developed by Black Forest Labs. It provides an API for rapid text-to-image synthesis, making it ideal for personal projects and local experimentation.
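Several entries here are reached through Replicate's API, which takes a prediction as a model identifier plus an input payload. A minimal sketch of assembling such a payload for a Flux Schnell call; the exact parameter names accepted by a given model version are an assumption and should be checked against its model page:

```python
def build_input(prompt, aspect_ratio="1:1", num_outputs=1):
    """Assemble an input payload for a flux-schnell prediction.
    Field names follow Replicate's common conventions (assumed here)."""
    return {
        "prompt": prompt,
        "aspect_ratio": aspect_ratio,
        "num_outputs": num_outputs,
    }

payload = build_input("a lighthouse at dusk, watercolor")
print(sorted(payload))  # ['aspect_ratio', 'num_outputs', 'prompt']

# With the client installed and REPLICATE_API_TOKEN set, the call would be:
# import replicate
# outputs = replicate.run("black-forest-labs/flux-schnell", input=payload)
```

Keeping payload construction separate from the network call makes it easy to reuse the same inputs across the pro, dev, and schnell variants.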
Ideogram v2 Inpainting Model
Ideogram v2 is a high-quality inpainting model available via Replicate’s API. It comes in two variants – the best quality version and a faster 'turbo' variant – and is adept at not only inpainting images but also generating new images (including effective text generation) for various creative applications.
FLUX Family of Models (Black Forest Labs)
A suite of API-accessible image generation and editing models that enable users to generate high-resolution images from text prompts, perform advanced inpainting, outpainting, edge-guided editing, and rapid image variation. The collection includes variants optimized for realism (FLUX1.1 Pro Ultra), speed (FLUX.1 Schnell), and prototyping (FLUX.1 Dev), among others.
FLUX.1 Kontext
FLUX.1 Kontext is an advanced image editing model from Black Forest Labs that enables users to modify images through text prompts. It supports various editing tasks such as style transfer, text editing, and character consistency adjustments. It is available in multiple variants (Pro, Max, and an upcoming Dev version) to balance quality and speed.
FLUX.1
FLUX.1 is an innovative text-to-image generative model that uses a novel flow matching technique instead of traditional diffusion. It produces images with a distinctive, fluid aesthetic, achieves faster generation speed, and offers refined control over light, texture, and composition. An optimized variant (FLUX.1 [schnell]) is available for local execution on Replicate.
FLUX.1 Redux [dev]
An open-weight image variation model by Black Forest Labs that generates new image versions while preserving key elements of the original.
Google Gemini 2.5 Flash Image
A state-of-the-art text-to-image generation and editing model from Google, designed for fast, conversational, multi-turn creative workflows. It offers native image creation, multi-image fusion, consistent character and style maintenance, conversational natural language editing, visual reasoning, and embeds SynthID watermarks. The tool is accessible via the Gemini API, Google AI Studio, and Vertex AI.
HiDream-I1-Full
HiDream-I1-Full is an open-source text-to-image generative foundation model with 17B parameters. Built using a sparse diffusion transformer, it delivers state-of-the-art image quality across multiple styles (photorealistic, cartoon, artistic, etc.) and boasts best-in-class prompt following as demonstrated by benchmark evaluations such as HPSv2.1, GenEval, and DPG-Bench. The model is commercially friendly and includes a Gradio demo and detailed inference scripts for easy deployment.
ByteDance Seedream 4
Seedream 4.0 is ByteDance’s unified text-to-image generation and image editing model. It supports high-resolution output (up to 4K), fast inference, natural-language prompt editing, multi-reference input, batch workflows, and versatile style transfer.