EleutherAI/gpt-neox-20b - AI Language Models Tool

Overview

GPT-NeoX-20B is an open-source, 20-billion-parameter autoregressive transformer released by EleutherAI for research and experimentation. It was trained on The Pile (an ~825 GiB English-language corpus) and is distributed under the Apache-2.0 license, making both weights and training code available for replication, fine-tuning, and downstream experiments. The released model and documentation emphasize research use, few-shot evaluation, and transfer to specialized tasks rather than out-of-the-box deployment as a production chat system. ([huggingface.co](https://huggingface.co/EleutherAI/gpt-neox-20b))

Architecturally aligned with GPT-family designs, GPT-NeoX-20B was implemented in the GPT-NeoX training framework (built on Megatron and DeepSpeed) and is available through Hugging Face Transformers for inference and experimentation. The project provides slim inference checkpoints and full training checkpoints, extensive training/config files, and community-supported quantized builds and inference recipes. These resources have driven adoption across community Spaces, leaderboards, and forks used for quantized local inference. ([github.com](https://github.com/EleutherAI/gpt-neox))

Model Statistics

  • Downloads: 14,415
  • Likes: 576
  • Pipeline: text-generation
  • Parameters: 20.7B

License: apache-2.0

Model Details

Architecture and size: GPT-NeoX-20B is a dense causal (autoregressive) transformer with approximately 20.55 billion parameters, 44 transformer layers, model dimensionality d_model=6144, 64 attention heads (head dimension 96), and a vocabulary of 50,257 tokens. The model uses rotary position embeddings (RoPE) and supports sequence lengths up to 2048 tokens. ([huggingface.co](https://huggingface.co/EleutherAI/gpt-neox-20b))

Training and engineering: The model was trained on The Pile in fp16 precision for roughly 150,000 optimization steps with an effective batch size of ~3.15M tokens (1,538 sequences of length 2048). Training used tensor and pipeline parallelism across many GPUs; crowdfunding and cloud support (e.g., CoreWeave) were reported for the large-scale runs. Checkpoints are published (slim inference weights ~39 GB; full weights including optimizer states available separately), and the EleutherAI GPT-NeoX repository provides configs, training scripts, and deployment guidance. ([huggingface.co](https://huggingface.co/EleutherAI/gpt-neox-20b))

Tokenizer and runtime: GPT-NeoX-20B uses a tokenizer variant that allocates additional tokens to whitespace characters (useful for code and certain generation tasks). Hugging Face Transformers supports loading the model via GPTNeoXForCausalLM/AutoModelForCausalLM, and the docs recommend loading in half precision (torch.float16) and enabling attention optimizations (e.g., Flash Attention 2 via attn_implementation) for faster, lower-memory inference. Community quantized builds (4-bit, 8-bit, GGUF, etc.) are available for local, low-cost inference. ([huggingface.co](https://huggingface.co/docs/transformers/en/model_doc/gpt_neox))
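The architecture and tokenizer points above can be sanity-checked without downloading the ~39 GB weights, since only the config and tokenizer files are fetched. The snippet below is a minimal sketch for illustration; the example code string and the expected comparison are assumptions, and exact token counts depend on the tokenizer files currently published.

from transformers import AutoConfig, AutoTokenizer

# Loads only the small config/tokenizer files, not the model weights.
config = AutoConfig.from_pretrained("EleutherAI/gpt-neox-20b")
print(config.num_hidden_layers)        # 44 transformer layers
print(config.hidden_size)              # d_model = 6144
print(config.num_attention_heads)      # 64 heads (head dim 96)
print(config.max_position_embeddings)  # 2048-token context window

# The NeoX tokenizer dedicates tokens to runs of whitespace, which typically
# shortens tokenized code relative to the GPT-2 BPE tokenizer.
code_snippet = "def f(x):\n        return x + 1\n"
neox_tok = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
gpt2_tok = AutoTokenizer.from_pretrained("gpt2")
print(len(neox_tok(code_snippet)["input_ids"]))  # usually fewer tokens than...
print(len(gpt2_tok(code_snippet)["input_ids"]))  # ...the GPT-2 tokenization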

Key Features

  • 20+ billion parameters in a dense autoregressive transformer architecture.
  • Trained on The Pile (≈825 GiB of English text) for broad general-purpose pretraining.
  • Rotary position embeddings (RoPE) and 2048-token context window.
  • Weights and training code released under Apache-2.0 for research and commercial use.
  • Slim inference weights (~39 GB) and full training checkpoints (optimizer states) available.
  • Compatible with Hugging Face Transformers; supports Flash Attention 2 for faster inference (see the loading sketch after this list).
  • Demonstrates strong few-shot gains vs. similarly sized models in published evaluations.
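A minimal loading sketch for the Flash Attention 2 and quantization paths mentioned above, assuming a recent transformers release, a supported CUDA GPU, and the optional flash-attn and bitsandbytes packages (the two loads are alternatives; community GGUF builds use separate tooling):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Option 1: half precision with Flash Attention 2 (requires the flash-attn package).
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-neox-20b",
    torch_dtype=torch.float16,
    device_map="auto",
    attn_implementation="flash_attention_2",
)

# Option 2: 4-bit quantization via bitsandbytes, which brings the weights down to
# roughly 10-11 GB so the model can fit on a single consumer-class GPU.
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-neox-20b",
    device_map="auto",
    quantization_config=bnb_config,
)

Either variant can be dropped into the generation example in the next section.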

Example Usage

Example (python):

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Recommended: load in fp16 and let Transformers place layers on devices automatically
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-neox-20b",
    device_map="auto",
    torch_dtype=torch.float16,
)

prompt = "Translate the following English text to concise bullet points:\n\nGPT-NeoX-20B is"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)

# Generate (tune generation parameters as needed)
outputs = model.generate(
    input_ids,
    max_new_tokens=120,
    do_sample=False,  # greedy decoding; sampling knobs such as top_k only take effect with do_sample=True
    num_return_sequences=1,
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
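Greedy decoding (do_sample=False) is deterministic but can be repetitive on open-ended prompts. A sampled variant is sketched below; the parameter values are illustrative starting points, not settings taken from the model card.

# Sampled generation: more diverse continuations at the cost of determinism.
sampled = model.generate(
    input_ids,
    max_new_tokens=120,
    do_sample=True,
    temperature=0.8,
    top_k=50,
    top_p=0.95,
)
print(tokenizer.decode(sampled[0], skip_special_tokens=True))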

Benchmarks

LAMBADA (zero-shot): 0.720 ± 0.006 (Source: https://huggingface.co/EleutherAI/gpt-neox-20b)

SciQ (zero-shot): 0.928 ± 0.008 (Source: https://huggingface.co/EleutherAI/gpt-neox-20b)

PIQA (zero-shot): 0.779 ± 0.010 (Source: https://huggingface.co/EleutherAI/gpt-neox-20b)

TriviaQA (zero-shot): 0.259 ± 0.004 (Source: https://huggingface.co/EleutherAI/gpt-neox-20b)

ARC (Challenge, zero-shot): 0.380 ± 0.014 (Source: https://huggingface.co/EleutherAI/gpt-neox-20b)

Open LLM Leaderboard average: 36.02 (aggregated selection on HF page) (Source: https://huggingface.co/EleutherAI/gpt-neox-20b)
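The zero-shot figures above follow the model card, which reports results obtained with EleutherAI's LM Evaluation Harness. A rough reproduction sketch using the lm-eval Python API is shown below; task names and the simple_evaluate signature differ across lm-eval versions, so treat the details as assumptions and expect small deviations from the table.

import lm_eval  # pip install lm-eval

# Runs a subset of the zero-shot tasks against the fp16 checkpoint.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=EleutherAI/gpt-neox-20b,dtype=float16",
    tasks=["lambada_openai", "sciq", "piqa", "arc_challenge"],
    batch_size=8,
)
print(results["results"])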

Last Refreshed: 2026-01-16

Key Information

  • Category: Language Models
  • Type: AI Language Models Tool