ComfyUI-Florence2 - AI Image Tool
Overview
ComfyUI-Florence2 integrates Microsoft’s Florence-2 vision foundation model into ComfyUI, enabling prompt-based vision and vision-language workflows. Supported tasks include captioning, object detection, segmentation, and Document Visual Question Answering (DocVQA) on scanned documents.
Key Features
- Integrates Microsoft Florence-2 into ComfyUI for vision and vision-language workflows
- Prompt-based image captioning
- Object detection outputs usable in ComfyUI pipelines
- Image segmentation support for region-level analysis
- Document Visual QA (DocVQA) on scanned documents
- Open-source GitHub repository with integration code
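Florence-2 selects a task through a special token placed at the start of the prompt. The sketch below shows one way to map the features listed above to such tokens. The caption, detection, and segmentation tokens follow the Florence-2 model card; the `<DocVQA>` token, the dictionary keys, and the `build_prompt` helper are assumptions made for illustration and are not taken from this integration's code.

```python
# Map human-readable task names to Florence-2 task tokens.
# Tokens for captioning, detection, and segmentation follow the model card.
FLORENCE2_TASK_PROMPTS = {
    "caption": "<CAPTION>",
    "detailed_caption": "<DETAILED_CAPTION>",
    "object_detection": "<OD>",
    "referring_expression_segmentation": "<REFERRING_EXPRESSION_SEGMENTATION>",
    # Assumption: DocVQA token used by fine-tuned checkpoints; verify before use.
    "docvqa": "<DocVQA>",
}


def build_prompt(task: str, text_input: str = "") -> str:
    """Return the task token followed by optional free text (hypothetical helper)."""
    try:
        token = FLORENCE2_TASK_PROMPTS[task]
    except KeyError:
        known = ", ".join(sorted(FLORENCE2_TASK_PROMPTS))
        raise ValueError(f"Unknown task {task!r}; expected one of: {known}") from None
    # Tasks that take extra text (a question, a phrase to segment) append it
    # directly after the token.
    return token + text_input


if __name__ == "__main__":
    print(build_prompt("caption"))
    print(build_prompt("docvqa", "What is the invoice total?"))
```

Inside ComfyUI the node interface normally handles this choice, so a mapping like this is mainly useful for understanding what each task option sends to the model.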
Ideal Use Cases
- Generate descriptive captions for image datasets
- Detect and label objects in photos or frames
- Create segmentation masks for image editing or analysis
- Answer questions about scanned documents using DocVQA
- Prototype vision-language pipelines within ComfyUI
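For the detection use case above, it can help to see what a post-processed Florence-2 result looks like. The sketch below assumes the dictionary shape shown on the Florence-2 model card for the `<OD>` task (a `bboxes` list of `[x1, y1, x2, y2]` values and a parallel `labels` list). The ComfyUI nodes may expose detections differently, and both the `detections_to_records` helper and the sample data are hypothetical.

```python
from typing import Any


def detections_to_records(
    result: dict[str, Any], task_token: str = "<OD>"
) -> list[dict[str, Any]]:
    """Flatten an object-detection result into one record per detected object."""
    payload = result.get(task_token, {})
    bboxes = payload.get("bboxes", [])
    labels = payload.get("labels", [])
    if len(bboxes) != len(labels):
        raise ValueError("bboxes and labels must have the same length")

    records = []
    for box, label in zip(bboxes, labels):
        x1, y1, x2, y2 = box
        records.append(
            {
                "label": label,
                "box": (x1, y1, x2, y2),
                # Pixel area, clamped so malformed boxes never go negative.
                "area": max(0.0, x2 - x1) * max(0.0, y2 - y1),
            }
        )
    return records


if __name__ == "__main__":
    # Made-up sample data in the assumed output shape.
    sample = {
        "<OD>": {
            "bboxes": [[10.0, 20.0, 110.0, 220.0], [5.0, 5.0, 15.0, 25.0]],
            "labels": ["person", "cup"],
        }
    }
    for record in detections_to_records(sample):
        print(record)
```

Records in this form are easy to filter by label or area before passing regions on to later steps in a pipeline.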
Getting Started
- Visit the project's GitHub repository to read the README
- Clone or download the repository locally
- Install ComfyUI and any dependencies listed in the README
- Place or configure the Florence-2 model files as the README instructs
- Launch ComfyUI and load the Florence-2 integration nodes
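The steps above can be sketched as a small dry-run helper that only builds the commands and does not run them. It assumes the usual ComfyUI convention of cloning custom node packs into a `custom_nodes` folder, and it assumes the repository ships a `requirements.txt`. The README remains the authority on both points. The `plan_install` function and the repository URL in the example are placeholders.

```python
import sys
from pathlib import Path


def plan_install(comfyui_dir: str, repo_url: str) -> list[list[str]]:
    """Return the commands a manual install would run, without executing them."""
    # Derive the target folder name from the last URL segment.
    repo_name = repo_url.rstrip("/").split("/")[-1]
    if repo_name.endswith(".git"):
        repo_name = repo_name[: -len(".git")]

    # Assumption: custom node packs live under <ComfyUI>/custom_nodes.
    target = Path(comfyui_dir) / "custom_nodes" / repo_name

    return [
        ["git", "clone", repo_url, str(target)],
        # Assumption: dependencies are listed in requirements.txt.
        [sys.executable, "-m", "pip", "install", "-r",
         str(target / "requirements.txt")],
    ]


if __name__ == "__main__":
    # Placeholder URL; use the address given on the project's GitHub page.
    for command in plan_install("ComfyUI", "https://example.com/owner/ComfyUI-Florence2.git"):
        print(" ".join(command))
```

Printing the commands first makes it easy to compare them with the README before running anything. Model file placement is left out because it depends on the README's instructions.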
Pricing
No pricing is listed in the repository; the integration code is published as an open-source GitHub project.
Key Information
- Category: Image Tools
- Type: AI Image Tool