Falcon 3 Family - AI Language Models Tool

Overview

Falcon 3 Family is a set of open-source, decoder-only large language models developed by the Technology Innovation Institute (TII), spanning 1B to 10B parameters and optimized for math, science, and code tasks. The release combines a single large pretraining run (reported at 14 trillion tokens on 1,024 H100 GPUs) with depth up-scaling and knowledge-distillation techniques to produce compact, high-performing variants (1B, 3B, Mamba-7B, 7B, 10B).

The models use transformer-based designs with Grouped Query Attention (GQA), SwiGLU activations, and wider head dimensions to improve throughput and reasoning; the Mamba 7B variant implements a state-space (SSM/SSLM) architecture for long-context efficiency. ([huggingface.co](https://huggingface.co/blog/falcon3))

Falcon 3 models support extended contexts (up to 32K tokens for most variants; 8K for the 1B model), are released in base and instruction-tuned forms, and ship in multiple inference/quantized formats (GGUF, GPTQ Int4/Int8, AWQ, and 1.58-bit variants) to lower deployment cost. The family is published under the TII Falcon-LLM License and distributed on Hugging Face with model cards and quantized community builds; commercial exploitation may be subject to the license terms and commercial-contact requirements in TII's licensing documentation. Early community feedback shows broad interest for research and on-premise use, with active discussions and quantized builds in the Hugging Face community. ([huggingface.co](https://huggingface.co/tiiuae/Falcon3-10B-Base))
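As a rough illustration of the GQA idea mentioned above, the sketch below shows how query heads map onto a smaller set of shared key/value heads; the head counts are toy numbers, not Falcon 3's published configuration.

```python
# Grouped Query Attention (GQA) shares each key/value head across a
# contiguous group of query heads, shrinking the KV cache relative to
# full multi-head attention. This sketch shows only the head mapping.
def gqa_group(query_head: int, n_q_heads: int, n_kv_heads: int) -> int:
    """Index of the KV head that serves a given query head."""
    group_size = n_q_heads // n_kv_heads
    return query_head // group_size

# With 12 query heads and 4 KV heads, query heads 0-2 share KV head 0,
# heads 3-5 share KV head 1, and so on.
print([gqa_group(h, 12, 4) for h in range(12)])
```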

Key Features

  • Model family from 1B to 10B parameters for scale-appropriate deployment.
  • Extended context support up to 32K tokens (1B variant supports 8K).
  • Improved math and coding: strong GSM8K and MBPP scores for the 10B model.
  • Mamba 7B SSM variant for long‑context efficiency and lower memory usage.
  • Multiple delivery formats: Base/Instruct, GGUF, GPTQ, AWQ and 1.58‑bit quantizations.
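To gauge which delivery format fits a deployment budget, a back-of-envelope weight-memory estimate helps. The sketch below covers weights only (no KV cache or activation overhead) and treats the parameter count as nominal; real quantized files also carry scales and metadata, so actual sizes differ somewhat.

```python
# Approximate weight storage for a model at different quantization levels.
def weight_memory_gib(params_billion: float, bits_per_weight: float) -> float:
    """Weights-only footprint in GiB for a given bits-per-weight."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 2**30

# Nominal 10B parameters at common precisions; 1.58-bit follows the
# ternary "BitNet-style" builds mentioned in the release.
for name, bits in [("bf16", 16), ("int8", 8), ("int4", 4), ("1.58-bit", 1.58)]:
    print(f"Falcon3-10B @ {name}: ~{weight_memory_gib(10, bits):.1f} GiB")
```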

Example Usage

Example (python):

import torch
from transformers import pipeline

# Example: load the 10B base model via Hugging Face Transformers.
# (Use the "tiiuae/Falcon3-10B-Instruct" checkpoint for chat-style prompts;
# ensure you have enough GPU memory, or use device_map="auto" / accelerate.)
pipe = pipeline(
    "text-generation",
    model="tiiuae/Falcon3-10B-Base",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

prompt = "Write a clear, step-by-step solution to: Evaluate the integral ∫_0^1 x^2 dx"
out = pipe(prompt, max_new_tokens=256, do_sample=False)
print(out[0]['generated_text'])

Benchmarks

  • GSM8K (Falcon3-10B-Base): 83.0 (Source: https://huggingface.co/blog/falcon3)
  • MATH Lvl5 (Falcon3-10B-Base): 22.9 (Source: https://huggingface.co/blog/falcon3)
  • MBPP, code (Falcon3-10B-Base): 73.8 (Source: https://huggingface.co/blog/falcon3)
  • MMLU, 5-shot (Falcon3-10B-Base): 73.1 (Source: https://huggingface.co/tiiuae/Falcon3-10B-Base)
  • Context length (most variants): 32K tokens (8K for the 1B model) (Source: https://huggingface.co/tiiuae/Falcon3-10B-Base)
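At full 32K context, the KV cache rather than the weights can dominate incremental memory use, which is where GQA's reduced KV-head count pays off. The sketch below computes KV-cache size for a generic GQA transformer; the layer, head, and dimension numbers are illustrative placeholders, not Falcon 3's published configuration (substitute real values from the model card).

```python
# KV-cache size for a GQA transformer: keys + values, per layer,
# per KV head, per position. Assumes bf16/fp16 cache (2 bytes/element).
def kv_cache_gib(layers: int, kv_heads: int, head_dim: int,
                 seq_len: int, bytes_per_elem: int = 2) -> float:
    """KV-cache footprint in GiB at a given sequence length."""
    total = 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem
    return total / 2**30

# PLACEHOLDER architecture numbers (40 layers, 4 KV heads, head_dim 256)
# at a 32,000-token context:
print(f"~{kv_cache_gib(40, 4, 256, 32_000):.2f} GiB")
```

Halving `kv_heads` halves the cache, which is the practical benefit of grouping query heads in GQA.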

Last Refreshed: 2026-03-03

Key Information

  • Category: Language Models
  • Type: AI Language Models Tool