ComfyUI-Florence2 - AI Image Tool

Overview

ComfyUI-Florence2 integrates Microsoft’s Florence-2 vision foundation model into ComfyUI, enabling prompt-based vision and vision-language workflows. Supported tasks include captioning, object detection, segmentation, and Document Visual Question Answering (DocVQA) on scanned documents.

Key Features

  • Integrates Microsoft Florence-2 into ComfyUI for vision and vision-language workflows
  • Prompt-based image captioning
  • Object detection outputs usable in ComfyUI pipelines
  • Image segmentation support for region-level analysis
  • Document Visual QA (DocVQA) on scanned documents
  • Open-source GitHub repository with integration code
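
Florence-2 selects a task through a special prompt token rather than a free-form instruction. The sketch below maps the tasks listed above to the task tokens documented on the Florence-2 model card. The `TASK_PROMPTS` table and the `build_prompt` helper are hypothetical names used for illustration, not part of the ComfyUI-Florence2 node API, and the `<DocVQA>` token is an assumption that should be checked against the repository.

```python
# Task tokens as documented for Florence-2. The dictionary and helper
# names are illustrative and are not taken from the ComfyUI-Florence2 code.
TASK_PROMPTS = {
    "caption": "<CAPTION>",
    "detailed_caption": "<DETAILED_CAPTION>",
    "object_detection": "<OD>",
    "segmentation": "<REFERRING_EXPRESSION_SEGMENTATION>",
    "docvqa": "<DocVQA>",  # assumption: verify against the repository
}

# Tasks whose prompt carries extra text after the task token.
TASKS_WITH_TEXT = {"segmentation", "docvqa"}


def build_prompt(task: str, text: str = "") -> str:
    """Return the prompt string for a task, appending text where the task uses it."""
    if task not in TASK_PROMPTS:
        raise ValueError(f"unknown task: {task!r}")
    token = TASK_PROMPTS[task]
    if task in TASKS_WITH_TEXT:
        if not text:
            raise ValueError(f"task {task!r} needs a text input")
        return token + text
    return token


if __name__ == "__main__":
    print(build_prompt("caption"))
    print(build_prompt("docvqa", "What is the invoice total?"))
```

Keeping the tokens in one table makes it easy to see which tasks need a question or phrase in addition to the image.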

Ideal Use Cases

  • Generate descriptive captions for image datasets
  • Detect and label objects in photos or frames
  • Create segmentation masks for image editing or analysis
  • Answer questions about scanned documents using DocVQA
  • Prototype vision-language pipelines within ComfyUI
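
For the detection use case, Florence-2's post-processed `<OD>` result is a dictionary with parallel `bboxes` and `labels` lists, as shown on the model card. The sketch below flattens that structure into per-object records. `Detection` and `parse_detections` are hypothetical helpers, the sample values are made up for illustration, and the data the ComfyUI nodes actually emit may be shaped differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One detected object with a pixel-space box (x1, y1) to (x2, y2)."""
    label: str
    x1: float
    y1: float
    x2: float
    y2: float

    @property
    def area(self) -> float:
        # Clamp at zero so a degenerate box never yields a negative area.
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


def parse_detections(result: dict, task: str = "<OD>") -> list[Detection]:
    """Pair each label with its box from a Florence-2 style result dictionary."""
    payload = result.get(task, {})
    bboxes = payload.get("bboxes", [])
    labels = payload.get("labels", [])
    if len(bboxes) != len(labels):
        raise ValueError("bboxes and labels must have the same length")
    return [Detection(label, *map(float, box)) for label, box in zip(labels, bboxes)]


if __name__ == "__main__":
    # Made-up sample in the documented {"bboxes": [...], "labels": [...]} shape.
    sample = {"<OD>": {"bboxes": [[10, 20, 110, 220], [0, 0, 50, 50]],
                       "labels": ["car", "wheel"]}}
    for det in parse_detections(sample):
        print(det.label, det.area)
```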

Getting Started

  • Visit the project's GitHub repository to read the README
  • Install ComfyUI if it is not already set up
  • Clone or download the repository into ComfyUI's custom_nodes folder
  • Install any dependencies listed in the README
  • Place or configure Florence-2 model files as instructed
  • Launch ComfyUI and load the Florence-2 integration nodes
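
The setup steps above can be sanity-checked before launching ComfyUI. This is a minimal sketch that assumes the conventional layout in which custom node packs sit in a `custom_nodes` folder under the ComfyUI root. The folder name `ComfyUI-Florence2` and the helper `check_install` are assumptions, and model file locations are left to the README.

```python
from pathlib import Path


def check_install(comfy_root: str, node_dir: str = "ComfyUI-Florence2") -> list[str]:
    """Return a list of problems found; an empty list means the layout looks fine.

    node_dir is an assumed folder name; pass the actual name of the cloned repository.
    """
    root = Path(comfy_root)
    if not root.is_dir():
        return [f"ComfyUI root not found: {root}"]

    problems = []
    custom_nodes = root / "custom_nodes"
    if not custom_nodes.is_dir():
        problems.append(f"missing folder: {custom_nodes}")
    elif not (custom_nodes / node_dir).is_dir():
        problems.append(f"node pack not found: {custom_nodes / node_dir}")
    return problems


if __name__ == "__main__":
    # Hypothetical path; replace with the real ComfyUI location.
    for problem in check_install("./ComfyUI"):
        print(problem)
```

An empty result only confirms the folder layout. It does not verify that dependencies or model files are in place.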

Pricing

Not disclosed in the repository.

Key Information

  • Category: Image Tools
  • Type: AI Image Tool