BERT base uncased - AI Language Models Tool

Overview

BERT base uncased is the original 110M-parameter, encoder-only BERT-Base Transformer pretrained by Google on BookCorpus and English Wikipedia using masked language modeling (MLM) and next sentence prediction (NSP). It produces bidirectional contextual token and sentence representations that serve as a drop-in backbone for classification, token-level tagging (e.g. NER), and extractive question answering, and is distributed on the Hugging Face Hub under the Apache-2.0 license. ([huggingface.co](https://huggingface.co/google-bert/bert-base-uncased))

Because it is an encoder-only model, BERT base uncased is typically fine-tuned with one additional task-specific head for downstream tasks. It is available across the Transformers ecosystem (PyTorch, TensorFlow, JAX) and exportable to runtime formats such as ONNX and Core ML for production deployment. The Hugging Face model card documents example uses (fill-mask pipeline, embedding extraction) and shows extensive community adoption, with thousands of fine-tuned and adapter derivatives. ([huggingface.co](https://huggingface.co/google-bert/bert-base-uncased))
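Fine-tuning for classification amounts to placing a small task head on top of BERT's pooled representation. A minimal NumPy sketch of such a head (with a random vector standing in for the 768-d pooled output and a hypothetical 3-class task, since no model weights are loaded here) looks like:

```python
import numpy as np

HIDDEN = 768      # BERT-Base hidden size
NUM_LABELS = 3    # hypothetical 3-class task

rng = np.random.default_rng(0)
pooled = rng.normal(size=(1, HIDDEN))                   # stand-in for BERT's pooled [CLS] output
W = rng.normal(scale=0.02, size=(HIDDEN, NUM_LABELS))   # task head weights, learned during fine-tuning
b = np.zeros(NUM_LABELS)

logits = pooled @ W + b
probs = np.exp(logits - logits.max()) / np.exp(logits - logits.max()).sum()
print(probs.shape)  # (1, 3): one probability per label
```

During actual fine-tuning, both the head and the BERT encoder weights are updated end-to-end with a standard cross-entropy loss.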

Model Statistics

  • Downloads: 53,047,036
  • Likes: 2569
  • Pipeline: fill-mask
  • Parameters: 110.1M

License: apache-2.0

Model Details

Architecture: BERT-Base (uncased) uses a 12-layer Transformer encoder with hidden size 768, 12 self-attention heads, and a 3072-d feed-forward intermediate (≈110 million parameters). Input text is lowercased and tokenized with WordPiece using a ~30,000-token vocabulary (30,522 entries in the released uncased checkpoint), with a maximum sequence length of 512. The released checkpoints and config files are documented in the original Google repo. ([github.com](https://github.com/google-research/bert?utm_source=openai))

Pretraining: trained on BookCorpus and English Wikipedia with two self-supervised objectives, masked language modeling (masking 15% of input tokens) and next-sentence prediction, for ~1M steps on TPU pods (the training recipe is described in the paper and repository). This makes the model deeply bidirectional and effective across sentence-level, span-level, and token-level tasks. ([arxiv.org](https://arxiv.org/abs/1810.04805))

Capabilities & integrations: usable via Hugging Face Transformers pipelines (fill-mask, feature-extraction, question-answering, sequence-classification) in PyTorch/TF/JAX; exportable to ONNX using transformers.onnx and convertible to Core ML with coremltools workflows for mobile deployment. The model card also lists community-provided Core ML conversions and quantized/merged variants. ([huggingface.co](https://huggingface.co/google-bert/bert-base-uncased))
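WordPiece splits out-of-vocabulary words into subword units by greedy longest-match-first lookup, prefixing word-internal pieces with `##`. A toy reimplementation over a hypothetical miniature vocabulary (the real vocabulary has ~30K entries) illustrates the idea:

```python
# Toy WordPiece: greedy longest-match-first segmentation.
# VOCAB below is illustrative, not BERT's real ~30K-entry vocabulary.
VOCAB = {"play", "##ing", "##ed", "un", "##play", "##able", "the"}

def wordpiece(word, vocab=VOCAB, unk="[UNK]"):
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        cur = None
        while start < end:
            sub = word[start:end]
            if start > 0:
                sub = "##" + sub        # word-internal pieces carry the ## prefix
            if sub in vocab:
                cur = sub               # longest match found
                break
            end -= 1
        if cur is None:
            return [unk]                # no piece matched: whole word becomes [UNK]
        pieces.append(cur)
        start = end
    return pieces

print(wordpiece("playing"))     # ['play', '##ing']
print(wordpiece("unplayable"))  # ['un', '##play', '##able']
```

The real tokenizer additionally lowercases input (for the uncased model), handles punctuation splitting, and caps sequences at 512 tokens including the `[CLS]` and `[SEP]` specials.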

Key Features

  • 110M-parameter BERT-Base: 12 layers, 768 hidden, 12 attention heads.
  • Pretrained with MLM and NSP on BookCorpus and English Wikipedia.
  • Supports fill-mask, feature-extraction, QA, classification pipelines in Transformers.
  • Available in PyTorch, TensorFlow, and JAX; exportable to ONNX and Core ML.
  • Apache-2.0 license and large community adoption (thousands of fine-tunes and adapters).
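The ~110M parameter figure follows directly from the architecture numbers above. A quick back-of-the-envelope check in Python (using the released uncased checkpoint's vocabulary size of 30,522 and counting the embedding tables, encoder layers, and pooler) reproduces it:

```python
V, P, T = 30522, 512, 2      # vocab size, max positions, token types
H, L, F = 768, 12, 3072      # hidden size, layers, feed-forward size

embeddings = (V + P + T) * H + 2 * H   # word/position/type tables + embedding LayerNorm
attention  = 4 * (H * H + H)           # Q, K, V, and output projections (with biases)
ffn        = (H * F + F) + (F * H + H) # two feed-forward projections (with biases)
layernorms = 2 * 2 * H                 # two LayerNorms per layer (weight + bias each)
per_layer  = attention + ffn + layernorms
pooler     = H * H + H                 # dense layer over the [CLS] token

total = embeddings + L * per_layer + pooler
print(f"{total:,}")  # 109,482,240, i.e. ~110M
```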

Example Usage

Example (python):

from transformers import pipeline, BertTokenizer, BertModel

# 1) Simple fill-mask pipeline (quick demo)
unmasker = pipeline('fill-mask', model='bert-base-uncased')
print(unmasker("Hello, I'm a [MASK] model."))

# 2) Extract embeddings (PyTorch)
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model = AutoModel.from_pretrained('bert-base-uncased')
text = "Replace me by any text you'd like."
inputs = tokenizer(text, return_tensors='pt')
outputs = model(**inputs)
# last_hidden_state: (batch_size, seq_len, hidden_size)
embeddings = outputs.last_hidden_state
print(embeddings.shape)

# 3) Export to ONNX (command-line; requires transformers[onnx])
# python -m transformers.onnx --model=bert-base-uncased onnx/bert-base-uncased/
# (recent Transformers versions deprecate transformers.onnx in favor of Optimum:
#  optimum-cli export onnx --model bert-base-uncased onnx/bert-base-uncased/)

# Note: for Core ML conversion, export a TF or PyTorch checkpoint and use coremltools conversion guides.
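To turn the per-token embeddings from step 2 into a single sentence vector, a common recipe is attention-mask-weighted mean pooling. A NumPy sketch (with random arrays standing in for `last_hidden_state` and the attention mask, so it runs without loading the model) looks like:

```python
import numpy as np

rng = np.random.default_rng(0)
batch, seq_len, hidden = 2, 10, 768
last_hidden_state = rng.normal(size=(batch, seq_len, hidden))  # stand-in for model output
attention_mask = np.ones((batch, seq_len))
attention_mask[1, 6:] = 0               # second sequence is padding after 6 tokens

# Mean-pool only over real (unpadded) token positions.
mask = attention_mask[:, :, None]                 # (batch, seq_len, 1)
summed = (last_hidden_state * mask).sum(axis=1)   # (batch, hidden)
counts = mask.sum(axis=1)                         # (batch, 1): real tokens per sequence
sentence_embeddings = summed / counts
print(sentence_embeddings.shape)  # (2, 768)
```

Masked pooling matters because padded positions would otherwise drag the average toward the padding embeddings.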

Benchmarks

Parameters: 110M (Source: https://github.com/google-research/bert and https://huggingface.co/google-bert/bert-base-uncased)

GLUE average (fine-tuned): 79.6 (per-model card task table) (Source: https://huggingface.co/google-bert/bert-base-uncased)

SQuAD v1.1 Dev F1 (BERT-Base, single model, paper): 88.5 (the 93.2 Test F1 reported in the paper is for a BERT-Large ensemble) (Source: https://arxiv.org/abs/1810.04805)

SQuAD v2.0 Test F1 (BERT-Large, paper): 83.1 (the paper reports SQuAD v2.0 results for BERT-Large only, not BERT-Base) (Source: https://arxiv.org/abs/1810.04805)

Hugging Face downloads (last month): 53,047,036 (Source: https://huggingface.co/google-bert/bert-base-uncased)

Last Refreshed: 2026-02-24

Key Information

  • Category: Language Models
  • Type: AI Language Models Tool