EleutherAI/gpt-neox-20b - AI Language Models Tool
Overview
GPT-NeoX-20B is an open-source, 20-billion-parameter autoregressive transformer language model released by EleutherAI for research and experimentation. It was trained on The Pile (an 825 GiB English-language compilation) and designed to be architecturally similar to GPT-3 while leveraging the GPT-NeoX training library and modern improvements such as rotary positional embeddings. The model card and accompanying paper document the training procedure, architecture choices, and a range of zero- and few-shot evaluations that position GPT-NeoX-20B as a strong public baseline for research into large language models. ([huggingface.co](https://huggingface.co/EleutherAI/gpt-neox-20b))

GPT-NeoX-20B is distributed under the Apache-2.0 license and shipped in formats compatible with the Hugging Face Transformers ecosystem for inference and further fine-tuning. It is intended primarily as a research artifact rather than a production-ready chatbot, and the maintainers recommend a careful risk and bias assessment before deployment. The model is widely downloaded and forked, but running or fine-tuning it requires substantial compute; community threads and forks provide quantized and alternate packaging for smaller-footprint inference. ([huggingface.co](https://huggingface.co/EleutherAI/gpt-neox-20b))
Model Statistics
- Downloads: 198,448
- Likes: 579
- Pipeline: text-generation
- License: apache-2.0
Model Details
Architecture and size: GPT-NeoX-20B is an autoregressive transformer with approximately 20.55 billion parameters implemented in the GPT-NeoX training library. Its published hyperparameters include 44 layers, model dimension 6144, 64 attention heads (head dim 96), a vocabulary of 50,257 tokens, and a context window of 2048 tokens. Rotary Position Embeddings (RoPE) are used for positional information. ([huggingface.co](https://huggingface.co/EleutherAI/gpt-neox-20b))

Training and data: The model was trained on The Pile, an 825 GiB curated English dataset that mixes academic writing, web crawl material, books, dialogue, and code. Training used tensor and pipeline parallelism with a batch size of ~3.15M tokens (about 1538 sequences of length 2048) for 150,000 steps; full details are in the GPT-NeoX-20B paper. Users should note that The Pile contains unfiltered material (including offensive text) and that the dataset was not deduplicated before training. ([huggingface.co](https://huggingface.co/EleutherAI/gpt-neox-20b))

Implementation and tooling: GPT-NeoX-20B was produced with EleutherAI's GPT-NeoX codebase, which integrates Megatron-style model parallelism and DeepSpeed optimizations (ZeRO, 3D parallelism). The GitHub repository documents training, evaluation, generation, and conversion routes for exporting checkpoints into Transformers-compatible formats for inference and downstream fine-tuning. Community forks and quantized variants (including GGUF/quantized weights) are available to reduce inference cost. ([github.com](https://github.com/EleutherAI/gpt-neox))

Limitations and intended use: The model is expressly released for research. It is not RLHF-tuned for safe, human-facing chat and may produce factually incorrect or socially unacceptable outputs. EleutherAI recommends curating outputs and performing independent bias and risk assessments before any downstream application. ([huggingface.co](https://huggingface.co/EleutherAI/gpt-neox-20b))
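The published parameter count can be sanity-checked from the hyperparameters above. The sketch below is a back-of-the-envelope estimate, not the exact accounting: it assumes the standard ~12·layers·d_model² weight count for the transformer blocks plus untied input/output embedding matrices of vocab·d_model each.

```python
# Rough parameter estimate from the published GPT-NeoX-20B hyperparameters.
# Assumption: ~12 * layers * d_model**2 weights across the transformer blocks
# (attention + MLP), plus untied input/output embeddings (2 * vocab * d_model).
layers, d_model, vocab = 44, 6144, 50257

block_params = 12 * layers * d_model**2   # attention and feed-forward weights
embed_params = 2 * vocab * d_model        # untied embedding / unembedding
total = block_params + embed_params

print(f"{total / 1e9:.2f}B parameters")   # -> 20.55B parameters
```

The estimate lands on the 20.55B figure quoted on the model card, which is a useful cross-check that the listed hyperparameters are self-consistent.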
Key Features
- Approximately 20.55 billion parameters in an autoregressive transformer architecture.
- 44 transformer layers, d_model=6144, 64 attention heads, 2048 token context window.
- Trained on The Pile (825 GiB) for broad English-language pretraining.
- Uses Rotary Position Embeddings (RoPE) for better long‑context handling.
- Open-source Apache‑2.0 license; convertible to Hugging Face Transformers format.
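The rotary embeddings listed above can be illustrated in a few lines. This is a minimal sketch over a single vector (the `rope` helper here is hypothetical, not the batched implementation in the GPT-NeoX codebase): pairs of dimensions are rotated by a position-dependent angle, which makes attention scores between rotated queries and keys depend only on their relative offset.

```python
import math

def rope(x, pos, base=10000.0):
    """Apply rotary position embedding to one vector x at position `pos`.

    Each pair of dimensions (2i, 2i+1) is rotated by the angle
    pos * base**(-2i/d), so inner products between a rotated query and
    a rotated key depend only on the relative position difference.
    """
    d = len(x)
    out = [0.0] * d
    for i in range(d // 2):
        theta = pos * base ** (-2 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        x1, x2 = x[2 * i], x[2 * i + 1]
        out[2 * i] = x1 * c - x2 * s      # 2D rotation of the pair
        out[2 * i + 1] = x1 * s + x2 * c
    return out
```

Because each rotation is norm-preserving and rotations compose additively in the angle, the dot product of rope(q, m) and rope(k, n) equals that of q and rope(k, n - m), which is the relative-position property RoPE is used for.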
Example Usage
Example (python):
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
# Basic loading example. The full-precision checkpoint needs well over 40 GB
# of memory; half precision plus automatic device placement (device_map="auto"
# requires the accelerate package) makes inference practical on large GPUs.
model_id = "EleutherAI/gpt-neox-20b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # halves memory vs. float32
    device_map="auto",          # spread layers across available devices
)
prompt = "Write a concise summary of the causes of the French Revolution:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# Greedy decoding; set do_sample=True with temperature/top_p for varied output
outputs = model.generate(**inputs, max_new_tokens=120, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# Note: for smaller-footprint inference, consider DeepSpeed or the community
# quantized builds (e.g. GGUF) mentioned above.
Benchmarks
- Zero-shot LAMBADA (accuracy): 0.720 ± 0.006 (Source: https://huggingface.co/EleutherAI/gpt-neox-20b)
- Zero-shot PIQA (accuracy): 0.779 ± 0.010 (Source: https://huggingface.co/EleutherAI/gpt-neox-20b)
- Open LLM Leaderboard — HellaSwag (10-shot): 73.45 (per model card) (Source: https://huggingface.co/EleutherAI/gpt-neox-20b)
- Open LLM Leaderboard — MMLU (5-shot): 25.0 (per model card) (Source: https://huggingface.co/EleutherAI/gpt-neox-20b)
- Downloads (last month, Hugging Face page): 198,448 (Source: https://huggingface.co/EleutherAI/gpt-neox-20b)
Key Information
- Category: Language Models
- Type: AI Language Models Tool