BERT base uncased - AI Language Models Tool
Overview
BERT base uncased is the original 12-layer, 110M-parameter encoder-only Transformer from Google that popularized bidirectional contextual embeddings for NLP. It was pretrained on BookCorpus and English Wikipedia using masked language modeling (MLM) and next-sentence prediction (NSP), and is intended primarily as a feature extractor or a fine-tuning starting point for downstream tasks such as sequence classification, token classification, and question answering.

BERT base uncased is available on the Hugging Face Hub under an Apache-2.0 license and is distributed with ready-to-use weights and tokenizer files that work across the PyTorch, TensorFlow, and JAX ecosystems. (See the model card on Hugging Face and the original BERT paper for training objectives and evaluation highlights.)

Because it produces deep bidirectional representations, BERT base uncased is commonly fine-tuned into small-to-medium production models (search query understanding, intent classification, NER, and extractive QA). At roughly 110M parameters it is compact enough to be fine-tuned on a single GPU for many tasks, and the Transformers ecosystem provides utilities to export the checkpoint to optimized formats such as ONNX or Core ML for inference deployment.
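As a minimal sketch of the fine-tuning entry points named above, the snippet below loads the checkpoint behind the task-specific head classes Transformers provides for sequence classification, token classification, and extractive QA; the num_labels values are illustrative assumptions rather than anything specified by the model card.

from transformers import (
    AutoModelForSequenceClassification,
    AutoModelForTokenClassification,
    AutoModelForQuestionAnswering,
)

# Each call stacks a freshly initialized task head on the pretrained encoder;
# the head weights stay random (and Transformers warns about them) until fine-tuning.
classifier = AutoModelForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)
tagger = AutoModelForTokenClassification.from_pretrained('bert-base-uncased', num_labels=9)  # hypothetical NER tag set size
qa_model = AutoModelForQuestionAnswering.from_pretrained('bert-base-uncased')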
Model Statistics
- Downloads: 40,788,491
- Likes: 2534
- Pipeline: fill-mask
- Parameters: 110.1M
- License: apache-2.0
Model Details
Architecture and size: BERT base uncased is a 12-layer Transformer encoder with hidden size 768, 12 self-attention heads per layer, and roughly 110 million parameters. The model accepts input sequences up to 512 tokens, uses WordPiece tokenization on lowercased text, and (per the original release) uses a vocabulary on the order of 30k tokens. (The Hugging Face model card and Transformers docs summarize these configuration values.)

Pretraining objectives and data: the checkpoint was pretrained with two self-supervised objectives, masked language modeling (MLM) and next-sentence prediction (NSP), on BookCorpus and English Wikipedia. The original pretraining run used Cloud TPUs for roughly one million training steps with mixed sequence lengths (mostly 128, with some 512). The resulting representations are fully bidirectional and designed to be fine-tuned with a small additional head for downstream tasks. (See the Google BERT paper and the official repo for training hyperparameters and procedures.)

Frameworks and exports: the Hugging Face Transformers ecosystem provides PyTorch, TensorFlow, and JAX-compatible wrappers for the checkpoint and tokenizer; the model can be exported to inference-friendly formats (ONNX, TorchScript, Core ML) using the Transformers export tooling and runtimes for production deployment. The Hugging Face model card also provides ready-made pipeline examples (fill-mask, feature extraction, and task-specific heads).

Limitations: the model reflects biases present in the pretraining corpora and retains limitations of the MLM/NSP objectives (e.g., NSP was later removed in some successor recipes). BERT base uncased is encoder-only and not intended for free-form text generation.
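The tokenizer behavior described above (lowercasing, WordPiece subword splitting, the 512-token limit, and the roughly 30k vocabulary) can be inspected directly; the sample sentence below is an arbitrary illustration.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')

# Uncased: text is lowercased before WordPiece splitting, so casing is discarded
# and rarer words come back as '##'-prefixed subword pieces.
print(tokenizer.tokenize("Transformers HANDLE subwords gracefully"))

print(tokenizer.vocab_size)        # 30522-entry WordPiece vocabulary
print(tokenizer.model_max_length)  # 512-token maximum sequence length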
Key Features
- Bidirectional Transformer encoder producing contextualized token embeddings.
- Pretrained on BookCorpus and English Wikipedia using MLM + NSP objectives.
- 12 layers, 768 hidden size, 12 attention heads (~110M parameters).
- WordPiece uncased tokenizer (lowercased input, ~30k vocab tokens).
- Ready-to-use in Transformers for PyTorch, TensorFlow, and JAX.
- Exportable to ONNX, TorchScript, and Core ML for inference deployment (see the ONNX export sketch after this list).
- Commonly fine-tuned for classification, token classification, and QA tasks.
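As noted in the export bullet above, one common export path runs through the optional Hugging Face Optimum package (optimum.onnxruntime). The sketch below is a hedged illustration under that assumption: it converts the checkpoint to ONNX on the fly and runs it with ONNX Runtime; the output directory name is hypothetical.

from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForFeatureExtraction  # assumes optimum[onnxruntime] is installed

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')

# export=True converts the PyTorch weights to ONNX while loading.
ort_model = ORTModelForFeatureExtraction.from_pretrained('bert-base-uncased', export=True)
ort_model.save_pretrained('bert-base-uncased-onnx')  # hypothetical output directory

inputs = tokenizer("An example sentence for ONNX inference.", return_tensors='pt')
outputs = ort_model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch_size, seq_len, 768)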
Example Usage
Example (python):
from transformers import AutoTokenizer, AutoModel, pipeline
# Fill-mask pipeline example (masked token prediction)
unmasker = pipeline('fill-mask', model='bert-base-uncased')
print(unmasker("Hello, I'm a [MASK] model."))
# Feature extraction example (PyTorch); reuses AutoTokenizer and AutoModel imported above
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model = AutoModel.from_pretrained('bert-base-uncased')
text = "The quick brown fox jumps over the lazy dog"
inputs = tokenizer(text, return_tensors='pt')
outputs = model(**inputs)
# last_hidden_state: (batch_size, seq_len, hidden_size)
last_hidden_state = outputs.last_hidden_state
print('Last hidden state shape:', last_hidden_state.shape)
# Fine-tuning note: use AutoModelForSequenceClassification or other task-specific classes
# e.g. AutoModelForSequenceClassification.from_pretrained('bert-base-uncased')
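Building on the fine-tuning note above, the following is a minimal, hedged fine-tuning sketch using the Trainer API. It assumes the optional datasets package, uses the GLUE SST-2 split purely as an illustrative binary-classification dataset, and picks hyperparameters arbitrarily; adapt all of these to your own task.

from datasets import load_dataset  # assumes the optional `datasets` package is installed
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model = AutoModelForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)

# GLUE SST-2 serves only as an illustrative sentiment-classification dataset.
dataset = load_dataset('glue', 'sst2')

def tokenize(batch):
    return tokenizer(batch['sentence'], truncation=True, padding='max_length', max_length=128)

encoded = dataset.map(tokenize, batched=True)

# Illustrative hyperparameters; tune for your task and hardware.
args = TrainingArguments(
    output_dir='bert-sst2-finetuned',  # hypothetical output directory
    num_train_epochs=1,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=encoded['train'],
    eval_dataset=encoded['validation'],
)
trainer.train()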
Pricing
Free and open source: the weights and tokenizer are distributed under the Apache-2.0 license, with no model access fees.
Benchmarks
Model parameters: ~110 million (Source: https://huggingface.co/google-bert/bert-base-uncased)
GLUE test average (as reported on the model card): 79.6 (Source: https://huggingface.co/google-bert/bert-base-uncased)
SQuAD v1.1 dev (BERT-Base, single system, per the official repo): ≈88.4 F1 (Source: https://github.com/google-research/bert)
Monthly downloads (Hugging Face, last month): 40,788,491 (Source: https://huggingface.co/google-bert/bert-base-uncased)
Original paper headline results: GLUE score pushed to 80.5 and SQuAD v1.1 Test F1 to 93.2, as reported in the paper abstract; these headline figures come from larger BERT configurations rather than this base checkpoint. (Source: https://arxiv.org/abs/1810.04805)
Key Information
- Category: Language Models
- Type: AI Language Models Tool