Firecrawl - AI RAG and Search Tool
Overview
Firecrawl is a web-data API built specifically for AI workflows. It crawls and scrapes entire websites or individual pages and returns LLM-ready outputs such as cleaned Markdown, structured JSON, raw HTML, screenshots, extracted links, and page metadata. Firecrawl is designed to handle modern web complexity, including JavaScript-rendered pages, proxy routing, and anti-bot protections, so it can produce high-quality, machine-consumable output from dynamic sites that traditional scrapers often fail to capture. The project provides a set of REST endpoints (crawl, scrape, map, search, extract), SDKs for Python and Node, and integrations with popular retrieval and RAG frameworks (LangChain, LlamaIndex, Dify, Langflow). A hosted API is available alongside the option to self-host the service from the open-source codebase. According to the GitHub repository (https://github.com/firecrawl/firecrawl), the focus is on delivering structured, downstream-ready artifacts for retrieval-augmented generation and search, reducing preprocessing work for LLM pipelines and vector-DB ingestion.
GitHub Statistics
- Stars: 73,956
- Forks: 5,703
- Contributors: 124
- License: AGPL-3.0
- Primary Language: TypeScript
- Last Updated: 2026-01-09T13:39:14Z
- Latest Release: v2.7.0
Key Features
- Full-site crawling that captures pages, sitemaps, and link graphs for entire domains.
- LLM-ready outputs: cleaned Markdown, structured JSON, raw HTML, screenshots, and metadata.
- Handles dynamic JS-rendered pages with headless rendering, proxy support, and anti-bot handling.
- Dedicated endpoints for crawl, scrape, map, search, and extract to support RAG pipelines.
- Official SDKs (Python, Node) and integrations with LangChain, LlamaIndex, Dify, and Langflow.
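The "LLM-ready outputs" above are meant to feed directly into RAG ingestion. As a hedged sketch (this helper is not part of Firecrawl or its SDKs), returned Markdown can be split into passage-sized chunks on blank lines before embedding into a vector DB:

```python
def chunk_markdown(markdown: str, max_chars: int = 800) -> list[str]:
    """Split Markdown into passage-sized chunks for embedding.

    Splits on blank lines so headings and paragraphs stay intact,
    then greedily packs consecutive blocks up to `max_chars`.
    A single block longer than `max_chars` is kept whole.
    """
    blocks = [b.strip() for b in markdown.split("\n\n") if b.strip()]
    chunks: list[str] = []
    current = ""
    for block in blocks:
        candidate = f"{current}\n\n{block}" if current else block
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = block
    if current:
        chunks.append(current)
    return chunks
```

Greedy packing on paragraph boundaries is a simple baseline; production pipelines often add overlap between chunks or split on headings instead.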
Code Examples
Python
```python
import requests

# Replace BASE_URL and API_KEY with your values
BASE_URL = "https://api.firecrawl.example"  # placeholder
API_KEY = "YOUR_API_KEY"

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}
payload = {
    "url": "https://example.com",
    "depth": 1,
    "render_js": True,
}

resp = requests.post(f"{BASE_URL}/crawl", json=payload, headers=headers)
resp.raise_for_status()
result = resp.json()

# Typical returned fields (from project description): markdown, html, screenshot, links, metadata
print("Markdown snippet:\n", result.get("markdown", "(none)"))
print("Found links:", result.get("links", []))
```
Curl
```bash
curl -X POST "https://api.firecrawl.example/crawl" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "depth": 1, "render_js": true}'
```
The response is JSON with fields like `markdown`, `html`, `screenshot`, `links`, and `metadata`.
Javascript
```javascript
// Node 18+ ships a global fetch; node-fetch is only needed on older runtimes.
const fetch = require('node-fetch');

const BASE_URL = 'https://api.firecrawl.example'; // placeholder
const API_KEY = 'YOUR_API_KEY';

async function run() {
  const resp = await fetch(`${BASE_URL}/scrape`, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${API_KEY}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ url: 'https://example.com', render_js: true })
  });
  if (!resp.ok) {
    throw new Error(`Request failed: ${resp.status} ${resp.statusText}`);
  }
  const data = await resp.json();
  console.log('Markdown:', data.markdown);
  console.log('Screenshot URL:', data.screenshot);
}

run().catch(console.error);
```
API Overview
Key Information
- Category: RAG and Search
- Type: AI RAG and Search Tool
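The HTTP examples above issue a single request, but scraping dynamic, anti-bot-protected sites can fail transiently. A small retry wrapper with exponential backoff (a hedged sketch, not part of Firecrawl or its SDKs; `send` stands in for any callable that performs the POST, e.g. a thin wrapper around `requests.post`) is one way to harden the calls:

```python
import time


def post_with_retries(send, payload, max_attempts=3, base_delay=0.5):
    """Call `send(payload)` with exponential backoff between attempts.

    `send` is any callable that performs the HTTP request and raises
    on transient failure; the final attempt's exception propagates.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return send(payload)
        except Exception:
            if attempt == max_attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))
```

Keeping the transport callable injectable makes the wrapper easy to unit-test without network access and lets the same logic cover the crawl, scrape, and search endpoints alike.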