EleutherAI/gpt-neox-20b - AI Language Models Tool
Overview
GPT-NeoX-20B is an open-source, 20-billion-parameter autoregressive transformer released by EleutherAI for research and experimentation. It was trained on The Pile (a roughly 825 GiB English-language corpus) and is distributed under the Apache-2.0 license, making both weights and training code available for replication, fine-tuning, and downstream experiments. The released model and documentation emphasize research uses, few-shot evaluation, and transfer to specialized tasks rather than out-of-the-box deployment as a production chat system. ([huggingface.co](https://huggingface.co/EleutherAI/gpt-neox-20b))

Architecturally aligned with GPT-family designs, GPT-NeoX-20B was implemented in the GPT-NeoX training framework (built on Megatron-LM and DeepSpeed) and is available through Hugging Face Transformers for inference and experimentation. The project provides both slim inference checkpoints and full training checkpoints, extensive training configuration files, and community-supported quantized builds and inference recipes. These resources have driven adoption across community Spaces, leaderboards, and forks used for quantized local inference. ([github.com](https://github.com/EleutherAI/gpt-neox))
Model Statistics
- Downloads: 14,415
- Likes: 576
- Pipeline: text-generation
- Parameters: 20.7B
- License: apache-2.0
Model Details
Architecture and size: GPT-NeoX-20B is a dense, causal (autoregressive) transformer with approximately 20.55 billion parameters, 44 transformer layers, model dimension d_model=6144, 64 attention heads (head dimension 96), and a vocabulary of 50,257 tokens. The model uses rotary position embeddings (RoPE) and supports sequence lengths up to 2048 tokens. ([huggingface.co](https://huggingface.co/EleutherAI/gpt-neox-20b))

Training and engineering: The model was trained on The Pile in fp16 precision for roughly 150,000 optimization steps with an effective batch size of about 3.15M tokens (1,538 sequences of length 2,048). Training used tensor and pipeline parallelism across many GPUs, with crowdfunding and cloud support (e.g., CoreWeave) reported for the large-scale runs. Published checkpoints include slim inference weights (~39 GB) and, separately, full weights with optimizer states; the EleutherAI GPT-NeoX repository provides configs, training scripts, and deployment guidance. ([huggingface.co](https://huggingface.co/EleutherAI/gpt-neox-20b))

Tokenizer and runtime: GPT-NeoX-20B uses a tokenizer variant that allocates additional tokens to whitespace runs, which is useful for code and certain generation tasks. Hugging Face Transformers loads the model via GPTNeoXForCausalLM/AutoModelForCausalLM, and the docs recommend half-precision (torch.float16) together with attn_implementation optimizations (e.g., Flash Attention 2) for faster, lower-memory inference. Community quantized builds (4-bit, GGUF, 8-bit, etc.) are available for local, low-cost inference; a quantization sketch follows the Key Features list below. ([huggingface.co](https://huggingface.co/docs/transformers/en/model_doc/gpt_neox))
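The tokenizer and runtime notes above can be made concrete with a short loading sketch. This is an illustrative example, not the project's reference code: it assumes Transformers 4.36+, the flash-attn package, and an Ampere-or-newer NVIDIA GPU; without flash-attn, drop attn_implementation or use "sdpa".
Example (python):
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
# Inspect how an indented code snippet tokenizes; the GPT-NeoX tokenizer reserves tokens for whitespace runs.
print(tokenizer.tokenize("def f():\n    return 1"))

# Half-precision weights plus Flash Attention 2 kernels for lower memory use and faster decoding.
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-neox-20b",
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)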
Key Features
- 20+ billion parameters in a dense autoregressive transformer architecture.
- Trained on The Pile (≈825 GiB of English text) for broad, general-purpose language pretraining.
- Rotary position embeddings (RoPE) and 2048-token context window.
- Weights and training code released under Apache-2.0 for research and commercial use.
- Slim inference weights (~39 GB) and full training checkpoints (optimizer states) available.
- Compatible with Hugging Face Transformers; supports Flash Attention 2 for faster inference.
- Gains more from few-shot examples than similarly sized GPT-3 and FairSeq models in the published GPT-NeoX-20B evaluations.
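As noted in Model Details, quantization can cut the memory needed for local inference well below the ~39 GB half-precision footprint. A minimal sketch, assuming the bitsandbytes 4-bit integration in Transformers (on-the-fly quantization, distinct from the prebuilt community GGUF files):
Example (python):
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

# 4-bit NF4 quantization applied at load time; requires a CUDA GPU and the bitsandbytes package.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-neox-20b",
    quantization_config=bnb_config,
    device_map="auto",
)
Generation then works as in the Example Usage section below, with some quality loss possible relative to fp16.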
Example Usage
Example (python):
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
# Recommended: load in fp16 and let Transformers place layers on devices automatically
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-neox-20b",
    device_map="auto",
    torch_dtype=torch.float16,
)
prompt = "Translate the following English text to concise bullet points:\n\nGPT-NeoX-20B is"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
# Generate (tune generation params as needed)
outputs = model.generate(
    input_ids,
    max_new_tokens=120,
    do_sample=False,  # greedy decoding; set do_sample=True and add top_k/top_p to sample instead
    num_return_sequences=1,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Benchmarks
All figures below are taken from the Hugging Face model card; a reproduction sketch for the zero-shot scores follows the list.
- LAMBADA (zero-shot): 0.720 ± 0.006 (Source: https://huggingface.co/EleutherAI/gpt-neox-20b)
- SciQ (zero-shot): 0.928 ± 0.008 (Source: https://huggingface.co/EleutherAI/gpt-neox-20b)
- PIQA (zero-shot): 0.779 ± 0.010 (Source: https://huggingface.co/EleutherAI/gpt-neox-20b)
- TriviaQA (zero-shot): 0.259 ± 0.004 (Source: https://huggingface.co/EleutherAI/gpt-neox-20b)
- ARC (Challenge, zero-shot): 0.380 ± 0.014 (Source: https://huggingface.co/EleutherAI/gpt-neox-20b)
- Open LLM Leaderboard average: 36.02 (aggregated selection on the HF page) (Source: https://huggingface.co/EleutherAI/gpt-neox-20b)
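The zero-shot scores above are typically produced with EleutherAI's lm-evaluation-harness. A minimal reproduction sketch, assuming the harness's 0.4.x simple_evaluate API; task names such as lambada_openai and arc_challenge may differ between harness versions:
Example (python):
import lm_eval

# Run a subset of the zero-shot tasks listed above; expects a GPU with room for the fp16 weights.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=EleutherAI/gpt-neox-20b,dtype=float16",
    tasks=["lambada_openai", "sciq", "piqa", "arc_challenge"],
    num_fewshot=0,
    batch_size=8,
)
print(results["results"])  # per-task accuracy and stderr, comparable to the figures above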
Key Information
- Category: Language Models
- Type: AI Language Models Tool