OpenLLaMA - AI Language Models Tool
Overview
OpenLLaMA is an open-source reproduction of Meta AI's LLaMA family, developed by researchers at Berkeley AI Research and released through the openlm-research organization under an Apache-2.0 license. According to the project GitHub, OpenLLaMA provides pre-trained 3B, 7B, and 13B parameter models trained on publicly available corpora: the v1 models were trained on RedPajama, while the v2 models use a mixture that includes Falcon RefinedWeb and StarCoder data. The project publishes both PyTorch weights for Hugging Face Transformers and JAX weights for EasyLM, so the checkpoints can be used for research, fine-tuning, and deployment with common toolchains (GitHub: https://github.com/openlm-research/open_llama).
The repository includes lm-eval-harness evaluation results comparing OpenLLaMA to LLaMA and GPT-J, training details (most models were trained on roughly 1T tokens), and practical notes such as tokenizer behavior and recommended usage for code tasks; the v2 models were introduced in part to address tokenizer and code-generation issues. Because the weights are distributed under the permissive Apache-2.0 license, OpenLLaMA can serve as a drop-in replacement in many LLaMA-compatible codebases. Model cards on the Hugging Face Hub provide downloadable PyTorch/JAX weights along with community-contributed quantized builds and Spaces that showcase the models in production-like settings (Hugging Face: openlm-research/open_llama_7b_v2).
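Since the checkpoints are hosted on the Hugging Face Hub in both PyTorch and JAX/EasyLM formats, they can be fetched locally before loading them in either toolchain. The following is a minimal sketch, assuming the huggingface_hub client is installed and using the 7B v2 repository id mentioned above:
Example (python):
from huggingface_hub import snapshot_download
# Download all files of the OpenLLaMA 7B v2 checkpoint into the local Hugging Face
# cache and return the local directory path (repo id taken from the overview above).
local_dir = snapshot_download(repo_id="openlm-research/open_llama_7b_v2")
print(local_dir)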
GitHub Statistics
- Stars: 7,526
- Forks: 408
- Contributors: 2
- License: Apache-2.0
- Last Updated: 2023-07-16T08:35:31Z
Key Features
- Permissive Apache-2.0 license for weights and training code.
- Pretrained checkpoints in PyTorch (Transformers) and JAX (EasyLM) formats.
- Model sizes: 3B, 7B, and 13B parameter variants (v1 and v2 releases).
- v2 models trained on a mixture including Falcon refined-web and StarCoder data.
- Tokenizer and usage guidance (use the non-fast tokenizer; v2 fixes code tokenization issues); see the tokenizer sketch below.
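To illustrate the tokenizer guidance in the last bullet, the sketch below loads the slow (non-fast) tokenizer, as the project README recommends, and inspects how a small code snippet is tokenized. The code snippet and model id are illustrative choices, not taken from the upstream documentation:
Example (python):
from transformers import AutoTokenizer
# Load the slow (SentencePiece-based) tokenizer; the project advises against the
# auto-converted fast tokenizer because its tokenization can differ from the original.
tokenizer = AutoTokenizer.from_pretrained("openlm-research/open_llama_7b_v2", use_fast=False)
# Inspect how whitespace-heavy code is tokenized (illustrative snippet).
code = "def add(a, b):\n    return a + b"
print(tokenizer.tokenize(code))
print(tokenizer(code).input_ids)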
Example Usage
Example (python):
import torch
from transformers import LlamaTokenizer, LlamaForCausalLM
# Choose a model path from Hugging Face Hub (OpenLLaMA weights)
model_path = "openlm-research/open_llama_7b_v2"
# Use the non-fast tokenizer to avoid known tokenization mismatches
tokenizer = LlamaTokenizer.from_pretrained(model_path, use_fast=False)
# Load model with automatic device placement and float16 for memory savings
model = LlamaForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16, device_map="auto")
prompt = "Q: What is the largest animal?\nA:"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    out = model.generate(inputs.input_ids.to(model.device), max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
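For GPUs with limited memory, the same checkpoint can also be loaded with 8-bit quantization through Transformers' bitsandbytes integration. This is a minimal sketch, assuming the bitsandbytes package and a CUDA-capable GPU are available; community-published quantized builds on the Hugging Face Hub are a separate option:
Example (python):
from transformers import BitsAndBytesConfig, LlamaForCausalLM
model_path = "openlm-research/open_llama_7b_v2"
# Quantize linear layers to 8-bit at load time to reduce GPU memory usage
# (requires the bitsandbytes package; the tokenizer is loaded as shown above).
quant_config = BitsAndBytesConfig(load_in_8bit=True)
model = LlamaForCausalLM.from_pretrained(model_path, quantization_config=quant_config, device_map="auto")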
Benchmarks
Available parameter sizes: 3B, 7B, 13B parameters (v1 and v2 variants available). (Source: https://github.com/openlm-research/open_llama)
Training tokens: Many models trained to ~1 trillion tokens (1T). (Source: https://github.com/openlm-research/open_llama)
Training throughput (7B): Over 2,200 tokens/second per TPU‑v4 chip reported for the 7B model. (Source: https://github.com/openlm-research/open_llama)
Example evaluation (HellaSwag, OpenLLaMA 7B v2): accuracy ≈ 0.56, from the lm-eval-harness table in the repository; see the reproduction sketch below. (Source: https://github.com/openlm-research/open_llama)
License: Apache-2.0 (permissive open-source license). (Source: https://github.com/openlm-research/open_llama)
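The HellaSwag figure above comes from the repository's lm-eval-harness table. The sketch below shows one way to re-run that task with a recent release of lm-evaluation-harness; the Python API used here (lm_eval.simple_evaluate) may differ from the harness version the project originally used, so exact scores can vary slightly:
Example (python):
import lm_eval
# Evaluate OpenLLaMA 7B v2 on HellaSwag with the Hugging Face backend.
# A recent lm-evaluation-harness release is assumed; results may differ
# slightly from the table in the OpenLLaMA repository.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=openlm-research/open_llama_7b_v2,dtype=float16",
    tasks=["hellaswag"],
    batch_size=8,
)
print(results["results"]["hellaswag"])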
Key Information
- Category: Language Models
- Type: AI Language Models Tool