Firecrawl - AI RAG and Search Tool

Overview

Firecrawl is a web-data API built specifically for AI workflows. It crawls entire websites and scrapes individual pages, returning LLM-ready outputs such as cleaned Markdown, structured JSON, raw HTML, screenshots, extracted links, and page metadata. Firecrawl is designed to handle modern web complexity, including JavaScript-rendered pages, proxy routing, and anti-bot protections, so it can produce high-quality, machine-consumable output from dynamic sites that traditional scrapers often fail to capture. The project provides a set of REST endpoints (crawl, scrape, map, search, extract), SDKs for Python and Node, and integrations with popular retrieval and RAG frameworks (LangChain, LlamaIndex, Dify, Langflow). A hosted API is available alongside an option to self-host the service from the open-source codebase. According to the GitHub repository (https://github.com/firecrawl/firecrawl), the focus is on delivering structured, downstream-ready artifacts for retrieval-augmented generation and search, reducing the preprocessing work required for LLM pipelines and vector DB ingestion.
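
To illustrate the downstream step the overview describes, the minimal Python sketch below chunks the Markdown that a scrape or crawl call returns into overlapping passages ready for embedding and vector DB ingestion. The response shape (a "markdown" field) and the chunk sizes are assumptions for illustration, not Firecrawl specifics.

def chunk_markdown(markdown: str, chunk_size: int = 800, overlap: int = 100) -> list[str]:
    """Split scraped Markdown into overlapping character chunks for embedding."""
    chunks = []
    start = 0
    while start < len(markdown):
        end = min(start + chunk_size, len(markdown))
        chunks.append(markdown[start:end])
        if end == len(markdown):
            break
        start = end - overlap
    return chunks

# Hypothetical response shaped like the code examples below (a "markdown" field).
result = {"markdown": "# Example Domain\n\nThis page exists to illustrate chunking of scraped content."}
passages = chunk_markdown(result["markdown"])
print(f"{len(passages)} passage(s) ready for embedding and vector DB ingestion")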

GitHub Statistics

  • Stars: 73,956
  • Forks: 5,703
  • Contributors: 124
  • License: AGPL-3.0
  • Primary Language: TypeScript
  • Last Updated: 2026-01-09T13:39:14Z
  • Latest Release: v2.7.0

Key Features

  • Full-site crawling that captures pages, sitemaps, and link graphs for entire domains.
  • LLM-ready outputs: cleaned Markdown, structured JSON, raw HTML, screenshots, and metadata.
  • Handles dynamic JS-rendered pages with headless rendering, proxy support, and anti-bot handling.
  • Dedicated endpoints for crawl, scrape, map, search, and extract to support RAG pipelines (see the sketch after this list).
  • Official SDKs (Python, Node) and integrations with LangChain, LlamaIndex, Dify, and Langflow.
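
The crawl and scrape endpoints are demonstrated in the code examples below; the sketch here shows how the map and search endpoints listed above might be called over the same REST interface. The base URL is a placeholder and the request and response field names (query, limit, results, links) are assumptions for illustration, so consult the Firecrawl API reference for the exact schema.

import requests

BASE_URL = "https://api.firecrawl.example"  # placeholder, as in the examples below
API_KEY = "YOUR_API_KEY"
headers = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}

# Hypothetical search request: find pages matching a query.
search_resp = requests.post(
    f"{BASE_URL}/search",
    json={"query": "vector databases", "limit": 5},
    headers=headers,
)
search_resp.raise_for_status()
for item in search_resp.json().get("results", []):
    print(item.get("url"), "-", item.get("title"))

# Hypothetical map request: return the discovered URL list / link graph for a domain.
map_resp = requests.post(
    f"{BASE_URL}/map",
    json={"url": "https://example.com"},
    headers=headers,
)
map_resp.raise_for_status()
print("Discovered URLs:", map_resp.json().get("links", []))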

Code Examples

Python

import requests

# Replace BASE_URL and API_KEY with your values
BASE_URL = "https://api.firecrawl.example"  # placeholder
API_KEY = "YOUR_API_KEY"

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}

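# Note: the request fields below (depth, render_js) are illustrative only;
# consult the Firecrawl API reference for the exact request schema.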
payload = {
    "url": "https://example.com",
    "depth": 1,
    "render_js": True
}

resp = requests.post(f"{BASE_URL}/crawl", json=payload, headers=headers)
resp.raise_for_status()
result = resp.json()

# Typical returned fields (from project description): markdown, html, screenshot, links, metadata
print("Markdown snippet:\n", result.get("markdown", "(none)"))
print("Found links:", result.get("links", []))

Curl

curl -X POST "https://api.firecrawl.example/crawl" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "depth": 1, "render_js": true}'

# Response is JSON with fields like `markdown`, `html`, `screenshot`, `links`, and `metadata`.

JavaScript

// Node 18+ provides fetch globally; on older Node versions install a fetch polyfill such as node-fetch.

const BASE_URL = 'https://api.firecrawl.example'; // placeholder
const API_KEY = 'YOUR_API_KEY';

async function run() {
  const resp = await fetch(`${BASE_URL}/scrape`, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${API_KEY}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ url: 'https://example.com', render_js: true })
  });
  const data = await resp.json();
  console.log('Markdown:', data.markdown);
  console.log('Screenshot URL:', data.screenshot);
}

run().catch(console.error);

API Overview

Last Refreshed: 2026-01-09

Key Information

  • Category: RAG and Search
  • Type: AI RAG and Search Tool