Reka Flash 3 - AI Language Models Tool
Overview
Reka Flash 3 is an open-source, 21-billion-parameter general-purpose reasoning model released by Reka AI as a research preview. Trained from scratch on a mixture of public and synthetic corpora, the model was instruction-tuned and then improved with a reinforcement-learning stage, RLOO (REINFORCE Leave-One-Out), that used both model-based and rule-based rewards. Reka positions Flash 3 as a compact, low-latency foundation for chat, coding, instruction following, and function calling; the model is distributed in a Llama-compatible format for easy integration into existing toolchains. (Sources: https://huggingface.co/RekaAI/reka-flash-3, https://reka.ai/news/introducing-reka-flash)

Reka Flash 3 also powers Reka's commercial products (for example, the Nexus platform) and is the basis for multimodal and agentic variants used internally (Reka Research, Reka Vision).

The model supports long contexts (32k tokens in the released research preview), an explicit "thinking" mechanism with reasoning tags that can be budget-forced (stopped early), and several quantized distributions for on-device or lower-cost inference. The weights are released under the Apache-2.0 license, and Reka publishes supporting tools such as a quantization library and GGUF/llama.cpp-compatible quants. (Sources: https://reka.ai/news/introducing-reka-flash, https://reka.ai/news/reka-quantization-technology)
Model Statistics
- Downloads: 127
- Likes: 387
- Parameters: 20.9B
- License: apache-2.0
Model Details
Architecture & size: Reka Flash 3 is a transformer-based causal LLM released as a 21B-parameter model (the Hugging Face model card lists 21B / ~20.9B depending on packaging). The model uses the cl100k_base tokenizer and adds no additional special tokens; prompts follow a simple chat format (human: ... <sep> assistant: ... <sep>), and generation should stop on the string "<sep>" or the special token <|endoftext|>. (Source: https://huggingface.co/RekaAI/reka-flash-3)

Training & alignment: The model was pretrained from scratch on diverse public and synthetic datasets, then instruction-finetuned on curated data and improved via RLOO (REINFORCE Leave-One-Out) using a mix of model-based and rule-based rewards. Reka emphasizes general improvements in reasoning and instruction following rather than heavy persona or alignment tuning for this release. (Source: https://reka.ai/news/introducing-reka-flash)

Context & runtime: The released research preview supports up to a 32k-token context length. Full-precision artifacts are distributed in BF16 / FP16, and Reka provides multiple quantized variants (including 4-bit and 3.5-bit GGUF formats) for low-memory or on-device inference. Reka reports an FP16 full-precision footprint of ~39 GB and that 4-bit quantization compresses the model to ~11 GB. (Sources: https://reka.ai/news/introducing-reka-flash, https://huggingface.co/RekaAI/reka-flash-3)

Reasoning & budget forcing: The model emits explicit reasoning tags (e.g., <reasoning> ... </reasoning>) while it thinks. Users can apply a budget-forcing mechanism to terminate the thinking trace early and force an answer; Reka reports that shorter budgets still yield reasonable answers on many reasoning benchmarks. (Source: https://huggingface.co/RekaAI/reka-flash-3)

Quantization & deployment: Reka released Reka Quant, a quantization library, along with pre-quantized GGUF artifacts (Q3_K_S / Q3 formats). The quantization approach uses calibrated error reduction and self-distillation to achieve near-lossless 3.5-bit quantization for Flash 3.1 variants, enabling llama.cpp and on-device workflows. (Source: https://reka.ai/news/reka-quantization-technology)

Limitations & language support: Flash 3 is primarily optimized for English. Reka notes it is not the top choice for knowledge-intensive tasks (Reka recommends pairing it with web search) and that the model has not undergone extensive persona alignment. (Source: https://huggingface.co/RekaAI/reka-flash-3)
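The raw chat format and the budget-forcing mechanism can be combined with ordinary Transformers generate calls: cap the number of "thinking" tokens and, if the reasoning tag was not closed within the budget, close it yourself and let the model answer. The following is a minimal sketch; the 512-token budget, greedy decoding, and the re-tokenization of the partial trace are illustrative choices, not values prescribed by the model card.
Example (python):
# Hedged sketch of budget forcing with plain Transformers generate calls.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("RekaAI/reka-flash-3")
model = AutoModelForCausalLM.from_pretrained(
    "RekaAI/reka-flash-3", torch_dtype="auto", device_map="auto"
)

# Raw chat format from the model card: human/assistant turns separated by <sep>.
prompt = "human: What is 17 * 23? <sep> assistant: "
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Phase 1: let the model "think" for at most THINK_BUDGET new tokens (assumed budget).
THINK_BUDGET = 512
thinking = model.generate(**inputs, max_new_tokens=THINK_BUDGET, do_sample=False)
text = tokenizer.decode(thinking[0], skip_special_tokens=True)

# Phase 2: if the reasoning trace was cut off, close the tag and force a final answer.
if "</reasoning>" not in text:
    text += " </reasoning>"
    forced = tokenizer(text, return_tensors="pt").to(model.device)
    answer = model.generate(**forced, max_new_tokens=256, do_sample=False)
    text = tokenizer.decode(answer[0], skip_special_tokens=True)

print(text)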
Key Features
- 21B-parameter reasoning model optimized for low-latency and on-device use.
- Supports long context windows (32,000 tokens) for long-form reasoning and documents.
- Budget-forcing: explicit reasoning tags you can truncate to control thinking time.
- Distributed in a Llama-compatible format (usable with Transformers and vLLM) plus GGUF quantizations for llama.cpp.
- Near-lossless low-bit quantization via Reka Quant (3.5-bit / 4-bit artifacts available).
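For the GGUF quantizations, a local llama.cpp-based deployment can be sketched with the llama-cpp-python bindings. The file name below is a placeholder for whichever quantized artifact (e.g., a Q3_K_S file) you downloaded; the context and GPU-offload settings are illustrative assumptions.
Example (python):
# Hedged sketch: running a GGUF quantization of Reka Flash 3 via llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="./reka-flash-3-q3_k_s.gguf",  # placeholder file name (assumption)
    n_ctx=32768,      # the research preview supports up to 32k tokens
    n_gpu_layers=-1,  # offload all layers to GPU when one is available
)

# Raw chat format from the model card; stop generation on "<sep>".
prompt = "human: Summarize why low-bit quantization matters for on-device use. <sep> assistant: "
out = llm(prompt, max_tokens=512, stop=["<sep>"])
print(out["choices"][0]["text"])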
Example Usage
Example (python):
from transformers import AutoTokenizer, AutoModelForCausalLM
# Quickstart (from the model card)
tokenizer = AutoTokenizer.from_pretrained("RekaAI/reka-flash-3")
model = AutoModelForCausalLM.from_pretrained("RekaAI/reka-flash-3", torch_dtype='auto', device_map='auto')
prompt = {"role": "human", "content": "Write a short poem about large language models."}
# use the chat template helper (model card shows apply_chat_template usage)
text = tokenizer.apply_chat_template([prompt], tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
# decode only the newly generated tokens (the prompt is echoed at the start of outputs[0])
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
# Notes: the model card documents using the 'human: ... <sep> assistant: ... <sep>' chat format
# and supports vLLM / llama.cpp / GGUF quantized distributions for low-memory deployment.
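Because the weights ship in a Llama-compatible layout, the same model can also be served with vLLM. The sketch below uses offline inference with the raw chat format; the sampling values are illustrative assumptions, not settings from the model card.
Example (python):
# Hedged sketch: offline inference with vLLM; sampling values are illustrative.
from vllm import LLM, SamplingParams

llm = LLM(model="RekaAI/reka-flash-3", dtype="auto")
params = SamplingParams(temperature=0.7, max_tokens=512, stop=["<sep>"])

# Raw chat format documented on the model card; generation stops on "<sep>".
prompt = "human: List three use cases for a compact reasoning model. <sep> assistant: "
outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)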
Benchmarks
- Model parameters: 21B (Source: https://huggingface.co/RekaAI/reka-flash-3)
- Full-precision footprint (FP16): ≈39 GB (Source: https://reka.ai/news/introducing-reka-flash)
- Quantized footprint (4-bit): ≈11 GB; see the back-of-the-envelope estimate below (Source: https://reka.ai/news/introducing-reka-flash)
- MMLU-Pro (reported): 65.0 (Source: https://reka.ai/news/introducing-reka-flash)
- WMT'23 (reported): 83.2 COMET (Source: https://reka.ai/news/introducing-reka-flash)
- Hugging Face popularity: 387 likes; 128 downloads last month (Source: https://huggingface.co/RekaAI/reka-flash-3)
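The reported footprints line up with a simple bytes-per-parameter estimate. The sketch below ignores tokenizer files, metadata, and any layers kept in higher precision, which is why it lands a few GB away from Reka's reported numbers.
Example (python):
# Back-of-the-envelope footprint estimate for a ~20.9B-parameter model.
PARAMS = 20.9e9

def footprint_gb(bits_per_param: float) -> float:
    """Size in decimal gigabytes if every parameter used bits_per_param bits."""
    return PARAMS * bits_per_param / 8 / 1e9

print(f"fp16   : ~{footprint_gb(16):.0f} GB")   # ~42 GB vs. reported ~39 GB
print(f"4-bit  : ~{footprint_gb(4):.0f} GB")    # ~10 GB vs. reported ~11 GB
print(f"3.5-bit: ~{footprint_gb(3.5):.0f} GB")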
Key Information
- Category: Language Models
- Type: AI Language Models Tool