Falcon 3 Family - AI Language Models Tool
Overview
Falcon 3 is a family of open-source, decoder-only LLMs from the Technology Innovation Institute (TII), available in 1B, 3B, 7B, and 10B parameter sizes. The release emphasizes improved math, science, reasoning, and code capabilities, achieved through a single large-scale pretraining run and a depth up-scaling strategy that produced the 10B variant; the collection includes both Base and instruction-tuned (Instruct) checkpoints as well as multiple quantized formats for lightweight deployment. ([huggingface.co](https://huggingface.co/blog/falcon3))
Falcon 3 was trained on a large corpus (TII reports ~14 trillion tokens) and ships with native long-context support: up to 32k tokens for most sizes, 8k for the 1B model. Preconverted and community-maintained quantized checkpoints (GGUF, GPTQ int4/int8, AWQ, and other very low-bit variants) make the models practical to run on single-GPU and even laptop setups. The family is released under TII's Falcon licensing framework and has been widely adopted and discussed across Hugging Face and community channels since its December 2024 launch. ([businesswire.com](https://www.businesswire.com/news/home/20241217932198/en/Falcon-3-UAEs-Technology-Innovation-Institute-Launches-Worlds-Most-Powerful-Small-AI-Models-That-Can-Also-be-Run-on-Light-Infrastructures-Including-Laptops))
Key Features
- Four sizes: 1B, 3B, 7B (including Mamba variant), and 10B parameter checkpoints.
- Native long‑context training up to 32k tokens for most models (1B supports 8k).
- Pretraining innovations: single large run plus depth up‑scaling to produce the 10B model.
- Instruct‑tuned variants optimized for conversational and instruction‑following tasks.
- Multiple quantized distributions available (GGUF, GPTQ int4/int8, AWQ, very low-bit formats); a loading sketch follows this list.
- Released under TII’s Falcon licensing framework for broad community use and research.
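As a quick illustration of the quantized distributions, the sketch below loads a GGUF build with llama-cpp-python. The repository and file names are assumptions for illustration; check the Hugging Face Hub for the exact quantized Falcon 3 artifacts published by TII or the community.
Example (python):
from llama_cpp import Llama

# Assumed repo/file names -- verify the actual GGUF uploads on the Hub before use
llm = Llama.from_pretrained(
    repo_id="tiiuae/Falcon3-7B-Instruct-GGUF",  # assumption: GGUF companion repo
    filename="*q4_k_m.gguf",                    # assumption: 4-bit K-quant file pattern
    n_ctx=8192,                                 # context window for this session
)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize what Falcon 3 is in two sentences."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])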
Example Usage
Example (python):
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

# Example: load an instruct Falcon 3 checkpoint from Hugging Face (requires sufficient GPU memory)
model_id = "tiiuae/Falcon3-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" lets Accelerate/Transformers place layers across available devices
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype=torch.float16)
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

prompt = "Explain Newton's second law in one concise sentence."
result = generator(prompt, max_new_tokens=120, do_sample=False)
print(result[0]["generated_text"])
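For multi-turn, chat-style prompting with the Instruct checkpoints, the tokenizer's chat template can be applied before generation. This is a minimal sketch that reuses the model and tokenizer loaded above and assumes the Instruct tokenizer ships a chat template, as the Falcon 3 Instruct model cards indicate.
Example (python):
# Reuses `model` and `tokenizer` from the previous example
messages = [
    {"role": "system", "content": "You are a concise physics tutor."},
    {"role": "user", "content": "State Newton's second law."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=120, do_sample=False)
# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))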
Benchmarks
- GSM8K (Falcon3-10B-Base): 83.0 (Source: https://huggingface.co/blog/falcon3)
- MATH-Lvl5 (Falcon3-10B-Base): 22.9 (Source: https://huggingface.co/blog/falcon3)
- MBPP (Falcon3-10B-Base): 73.8 (Source: https://huggingface.co/blog/falcon3)
- BBH (Big Bench Hard) (Falcon3-10B-Base): 59.7 (Source: https://huggingface.co/blog/falcon3)
- MMLU (Falcon3-10B-Base): 73.1 (Source: https://huggingface.co/blog/falcon3)
- Training corpus size (reported): ~14 trillion tokens (Source: https://falconllm.tii.ae/falcon3/index.html)
- Maximum context length: up to 32k tokens (1B: 8k); see the config check after this list (Source: https://huggingface.co/blog/falcon3)
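The advertised context window can be checked programmatically by reading a checkpoint's configuration. A minimal sketch, assuming the 7B Instruct repository name used above:
Example (python):
from transformers import AutoConfig

config = AutoConfig.from_pretrained("tiiuae/Falcon3-7B-Instruct")
# max_position_embeddings reflects the maximum context length the checkpoint supports
print(config.max_position_embeddings)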
Key Information
- Category: Language Models
- Type: AI Language Models Tool