vLLM

A high-throughput, memory-efficient inference and serving engine for large language models, with support for tensor and pipeline parallelism.
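As a sketch of how the tensor and pipeline parallelism mentioned above are typically enabled, vLLM's OpenAI-compatible server can be launched with per-dimension degree flags. The model name below is a placeholder assumption; the actual model and degrees depend on your hardware.

```shell
# Launch vLLM's OpenAI-compatible API server (requires GPUs).
# Splits each layer across 4 GPUs (tensor parallelism) and the
# layer stack across 2 stages (pipeline parallelism): 8 GPUs total.
vllm serve <model-name> \
  --tensor-parallel-size 4 \
  --pipeline-parallel-size 2
```

Tensor parallelism shards individual weight matrices across devices within a node, while pipeline parallelism assigns contiguous blocks of layers to different devices or nodes; the two compose multiplicatively.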

Key Information

  • Category: Developer Tools
  • Source: GitHub
  • Last updated: January 09, 2026

Structured Metrics

No structured metrics captured yet.

Links

Canonical source: https://github.com/vllm-project/vllm