Firecrawl - AI RAG and Search Tool
Overview
Firecrawl is a web-data API built specifically for AI workflows. It crawls and scrapes entire websites or individual pages and returns LLM-ready outputs such as cleaned Markdown, structured JSON, raw HTML, screenshots, extracted links, and page metadata. Firecrawl is designed to handle modern web complexity, including JavaScript-rendered pages, proxy routing, and anti-bot protections, so it can produce high-quality, machine-consumable output from dynamic sites that traditional scrapers often fail to capture. The project provides a set of REST endpoints (crawl, scrape, map, search, extract), SDKs for Python and Node, and integrations with popular retrieval and RAG frameworks (LangChain, LlamaIndex, Dify, Langflow). A hosted API is available alongside the option to self-host the service from the open-source codebase. According to the GitHub repository (https://github.com/firecrawl/firecrawl), the focus is on delivering structured, downstream-ready artifacts for retrieval-augmented generation and search, reducing preprocessing work for LLM pipelines and vector-DB ingestion.
GitHub Statistics
- Stars: 73,956
- Forks: 5,703
- Contributors: 124
- License: AGPL-3.0
- Primary Language: TypeScript
- Last Updated: 2026-01-09T13:39:14Z
- Latest Release: v2.7.0
Key Features
- Full-site crawling that captures pages, sitemaps, and link graphs for entire domains.
- LLM-ready outputs: cleaned Markdown, structured JSON, raw HTML, screenshots, and metadata.
- Handles dynamic JS-rendered pages with headless rendering, proxy support, and anti-bot handling.
- Dedicated endpoints for crawl, scrape, map, search, and extract to support RAG pipelines.
- Official SDKs (Python, Node) and integrations with LangChain, LlamaIndex, Dify, and Langflow.
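The "LLM-ready outputs" above are meant to feed directly into RAG ingestion. As a hedged sketch (this helper is not part of Firecrawl or its SDKs), returned Markdown can be split into passage-sized chunks on blank lines before embedding into a vector DB:

```python
def chunk_markdown(markdown: str, max_chars: int = 800) -> list[str]:
    """Split Markdown into passage-sized chunks for embedding.

    Splits on blank lines so headings and paragraphs stay intact,
    then greedily packs consecutive blocks up to `max_chars`.
    A single block longer than `max_chars` is kept whole.
    """
    blocks = [b.strip() for b in markdown.split("\n\n") if b.strip()]
    chunks: list[str] = []
    current = ""
    for block in blocks:
        candidate = f"{current}\n\n{block}" if current else block
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = block
    if current:
        chunks.append(current)
    return chunks
```

Greedy packing on paragraph boundaries is a simple baseline; production pipelines often add overlap between chunks or split on headings instead.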
Code Examples
Python
```python
import requests

# Replace BASE_URL and API_KEY with your values
BASE_URL = "https://api.firecrawl.example"  # placeholder
API_KEY = "YOUR_API_KEY"

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}
payload = {
    "url": "https://example.com",
    "depth": 1,
    "render_js": True,
}

resp = requests.post(f"{BASE_URL}/crawl", json=payload, headers=headers)
resp.raise_for_status()
result = resp.json()

# Typical returned fields (from project description): markdown, html, screenshot, links, metadata
print("Markdown snippet:\n", result.get("markdown", "(none)"))
print("Found links:", result.get("links", []))
```
Curl
```bash
curl -X POST "https://api.firecrawl.example/crawl" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "depth": 1, "render_js": true}'
```
The response is JSON with fields like `markdown`, `html`, `screenshot`, `links`, and `metadata`.
Javascript
```javascript
// Node 18+ ships a global fetch; node-fetch is only needed on older runtimes.
const fetch = require('node-fetch');

const BASE_URL = 'https://api.firecrawl.example'; // placeholder
const API_KEY = 'YOUR_API_KEY';

async function run() {
  const resp = await fetch(`${BASE_URL}/scrape`, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${API_KEY}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ url: 'https://example.com', render_js: true })
  });
  if (!resp.ok) {
    throw new Error(`Request failed: ${resp.status} ${resp.statusText}`);
  }
  const data = await resp.json();
  console.log('Markdown:', data.markdown);
  console.log('Screenshot URL:', data.screenshot);
}

run().catch(console.error);
```
API Overview
Key Information
- Category: RAG and Search
- Type: AI RAG and Search Tool
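The HTTP examples above issue a single request, but scraping dynamic, anti-bot-protected sites can fail transiently. A small retry wrapper with exponential backoff (a hedged sketch, not part of Firecrawl or its SDKs; `send` stands in for any callable that performs the POST, e.g. a thin wrapper around `requests.post`) is one way to harden the calls:

```python
import time


def post_with_retries(send, payload, max_attempts=3, base_delay=0.5):
    """Call `send(payload)` with exponential backoff between attempts.

    `send` is any callable that performs the HTTP request and raises
    on transient failure; the final attempt's exception propagates.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return send(payload)
        except Exception:
            if attempt == max_attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))
```

Keeping the transport callable injectable makes the wrapper easy to unit-test without network access and lets the same logic cover the crawl, scrape, and search endpoints alike.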