ComfyUI-Florence2 - AI Image Tool
Overview
ComfyUI-Florence2 is an open-source integration that brings Microsoft’s Florence-2 vision foundation model into the node-based ComfyUI workflow. The project exposes Florence-2’s promptable vision and vision-language capabilities inside ComfyUI so users can build pipelines for captioning, object detection, segmentation, and Document Visual Question Answering (DocVQA) on scanned documents. Because it plugs into ComfyUI, Florence-2 can be combined with image preprocessing, conditioning, and downstream export nodes to create reproducible, visual pipelines for research and production prototyping. The repository is MIT-licensed and intended for local or self-hosted use; model weights and any required credentials (for example, Hugging Face or Azure access where applicable) must be obtained according to the upstream model’s distribution terms. Typical usage patterns include interactive prompt-driven image analysis, automated annotation workflows for scanned forms, and using Florence-2 outputs as inputs to other ComfyUI nodes for post-processing, visualization, or dataset generation.
GitHub Statistics
- Stars: 1,561
- Forks: 121
- Contributors: 15
- License: MIT
- Primary Language: Python
- Last Updated: 2025-12-15T21:09:40Z
According to the GitHub repository, ComfyUI-Florence2 has 1,561 stars, 121 forks, and 15 contributors, and is distributed under the MIT license. The repository was last updated on 2025-12-15, which suggests ongoing maintenance. Taken together, these figures point to moderate community interest and a small group of active maintainers. Issue tracking and contributions take place on the GitHub project page, and the forks provide a starting point for customization and experimentation.
Installation
Install by cloning the repository into ComfyUI's custom_nodes directory and installing its Python dependencies with pip:
git clone https://github.com/kijai/ComfyUI-Florence2.git
mv ComfyUI-Florence2 <path-to-ComfyUI>/custom_nodes/Florence2
cd <path-to-ComfyUI>
pip install -r custom_nodes/Florence2/requirements.txt
Restart ComfyUI and follow the repo README to configure model weights and credentials.
Key Features
- Prompt-based vision and vision-language inference within ComfyUI pipelines
- Image captioning for descriptive summaries of photos and scanned pages
- Object detection outputs usable for downstream annotation or masking
- Segmentation capabilities for pixel-level object masks
- Document Visual Question Answering (DocVQA) on scanned documents
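To illustrate how detection output might feed a downstream masking step, here is a minimal Python sketch that rasterizes bounding boxes into a binary mask. It assumes the detection result is a dictionary with "bboxes" (lists of [x1, y1, x2, y2] pixel coordinates) and "labels", which mirrors the object detection result format described in the upstream Florence-2 model card; the ComfyUI nodes may expose this data differently. The function name boxes_to_mask, its parameters, and the sample data are hypothetical and are not part of the repository.

```python
from typing import Dict, List, Optional, Set


def boxes_to_mask(
    detections: Dict[str, list],
    width: int,
    height: int,
    keep_labels: Optional[Set[str]] = None,
) -> List[List[int]]:
    """Rasterize [x1, y1, x2, y2] boxes into a height x width mask of 0s and 1s.

    `detections` is ASSUMED to look like {"bboxes": [...], "labels": [...]};
    this shape is an illustration, not the node pack's documented output.
    """
    mask = [[0] * width for _ in range(height)]
    for box, label in zip(detections["bboxes"], detections["labels"]):
        # Optionally keep only boxes whose label is in the allow-list.
        if keep_labels is not None and label not in keep_labels:
            continue
        x1, y1, x2, y2 = box
        # Clamp each edge to the image bounds before filling.
        left = max(0, min(width, int(round(x1))))
        right = max(0, min(width, int(round(x2))))
        top = max(0, min(height, int(round(y1))))
        bottom = max(0, min(height, int(round(y2))))
        for y in range(top, bottom):
            for x in range(left, right):
                mask[y][x] = 1
    return mask


# Hypothetical detections for a 6x4 image (made-up values for illustration).
example = {
    "bboxes": [[1.0, 1.0, 3.0, 2.0], [4.0, 0.0, 6.0, 4.0]],
    "labels": ["cat", "dog"],
}

mask_all = boxes_to_mask(example, width=6, height=4)
mask_cat = boxes_to_mask(example, width=6, height=4, keep_labels={"cat"})
```

A nested-list mask keeps the sketch dependency-free; in a real pipeline the same logic would more likely operate on an image tensor or array, and the mask would then be passed to whichever node consumes masks.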
Community
Community engagement is moderate: the repository has 1,561 stars and 121 forks, and 15 contributors have participated. Issues, feature requests, and discussion are handled on the GitHub repository, which was last updated on 2025-12-15. Because usage typically involves external Florence-2 weights and credentials, community support focuses on integration, installation, and pipeline examples.
Key Information
- Category: Image Tools
- Type: AI Image Tool