Qwen - AI Model Hubs Tool
Overview
Qwen is the family of large language and multimodal models developed by Alibaba Cloud's Qwen team and published across Hugging Face and Alibaba Cloud Model Studio. The Qwen2.5 lineup spans many sizes (0.5B–72B parameters) and includes specialized variants: Qwen2.5-Coder for code, Qwen2.5-Math for advanced mathematics, and the multimodal Qwen2.5-VL / Qwen-Omni, enabling text generation, image–text reasoning, audio/video understanding, and long-form generation. ([qwen2.org](https://qwen2.org/qwen2-5/?utm_source=openai))
The project emphasizes large pretraining corpora, long-context capability, and permissive open-weight releases for many sizes. Several Qwen2.5 models natively support very large context windows (mid/large variants up to 128K tokens) and generation lengths up to ~8K tokens, making them suitable for long documents, codebases, and multi-file agents. Alibaba Cloud also offers hosted commercial endpoints (the Qwen-Max/Qwen-Plus family) via Model Studio with token-based billing if you prefer managed inference over self-hosting. ([huggingface.co](https://huggingface.co/Qwen/Qwen-72B?utm_source=openai))
Community feedback since the Qwen2.5 rollout has been broadly positive on benchmarks and multimodal features, though users report practical issues with tokenizer and prompt templates and with quantization/hosting tooling when integrating some versions locally. Experimental GGUF/quantized builds and Hugging Face transformers integration enable both local and cloud deployments. ([alibabacloud.com](https://www.alibabacloud.com/blog/601782?utm_source=openai))
Key Features
- Open-weight family spanning 0.5B to 72B parameters for different latency/accuracy trade-offs.
- Long-context support: mid/large variants can handle up to ~128K tokens in practice.
- Multimodal VL and Omni models for image–text, audio, and video understanding and generation.
- Specialized Coder and Math variants trained on large code/math corpora for higher accuracy.
- Quantized GGUF builds and transformer integrations enable local, low-memory deployments.
- Hosted commercial endpoints (Qwen-Max/Qwen-Plus) on Alibaba Cloud Model Studio with token billing.
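The ~128K-token context windows noted above still require planning for very large inputs. A minimal sketch of chunking a long document to fit a context budget follows; it assumes a rough ~4-characters-per-token heuristic purely for illustration (in real use, count tokens with the model's own tokenizer):

```python
# Sketch: split a long document into chunks that fit a context budget.
# ASSUMPTION: ~4 characters per token is a crude heuristic, not Qwen's
# actual tokenization; use the model's tokenizer for accurate counts.
CHARS_PER_TOKEN = 4

def chunk_document(text: str, max_context_tokens: int = 128_000,
                   reserved_tokens: int = 8_000) -> list[str]:
    """Split text so each chunk leaves room for the prompt and response."""
    budget_chars = (max_context_tokens - reserved_tokens) * CHARS_PER_TOKEN
    return [text[i:i + budget_chars] for i in range(0, len(text), budget_chars)]

doc = "x" * 1_000_000  # roughly 250K estimated tokens: too big for one request
chunks = chunk_document(doc)
print(len(chunks), [len(c) for c in chunks])  # → 3 [480000, 480000, 40000]
```

Reserving ~8K tokens mirrors the generation-length limit mentioned in the overview, so each chunk plus its response stays within the window.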
Example Usage
Example (python):

    from transformers import pipeline

    # Example: load an instruction-tuned Qwen2.5 model from Hugging Face.
    # Adjust the model name to the exact repo you want, and ensure you
    # have the required transformers version.
    generator = pipeline(
        "text-generation",
        model="Qwen/Qwen2.5-14B-Instruct",
        device=0,               # set to -1 for CPU
        trust_remote_code=True  # required for some Qwen repos
    )

    prompt = (
        "Summarize the following product brief in two sentences:\n\n"
        "Product: Qwen2.5 — long-context model optimized for code and math."
    )
    output = generator(
        [{"role": "user", "content": prompt}],
        max_new_tokens=150,
        return_full_text=False,
    )
    print(output[0]["generated_text"])

Pricing
Alibaba Cloud Model Studio provides hosted commercial Qwen endpoints (Qwen-Max, Qwen-Plus, and Qwen2.5 variants) billed per token. As an example of published rates, the Model Studio docs list Qwen2.5-14B-Instruct at roughly $0.144 per million input tokens and $0.431 per million output tokens for certain instruct-tier offerings; Qwen-Max and larger tiers carry higher per-million-token rates, with tiered pricing for large-context requests. Pricing, free quotas, and regional deployment modes vary by model and date, so confirm current, region-specific rates and discounts for your account in the Alibaba Cloud Model Studio pricing documentation.
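Given the per-token billing model, a small helper can estimate request cost from token counts. The rates below are the illustrative Qwen2.5-14B-Instruct sample figures quoted above, not live pricing:

```python
# Sketch: estimate hosted-endpoint cost from token counts.
# ASSUMPTION: rates are the sample Qwen2.5-14B-Instruct figures quoted in
# this section (USD per million tokens); check Model Studio for live,
# region-specific pricing.
INPUT_RATE_PER_M = 0.144
OUTPUT_RATE_PER_M = 0.431

def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request at the sample rates."""
    return (input_tokens * INPUT_RATE_PER_M
            + output_tokens * OUTPUT_RATE_PER_M) / 1_000_000

# e.g. a 100K-token document summarized into a 1K-token answer
print(round(estimate_cost_usd(100_000, 1_000), 4))  # → 0.0148
```

Because output tokens cost roughly 3x input tokens at these sample rates, capping `max_new_tokens` is the simpler lever for controlling spend on summarization-style workloads.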
Benchmarks
MMLU (Qwen2.5-72B, instruction-tuned): 86.1% (Source: https://qwen2.org/qwen2-5/)
GSM8K (Qwen2.5-72B): 91.5% (Source: https://qwen2.org/qwen2-5/)
HumanEval (Qwen2.5-72B): 59.1 (pass@1-style score as shown in the source table) (Source: https://qwen2.org/qwen2-5/)
Context window (mid/large Qwen2.5 variants): Up to 128K tokens (support for very-long context / YaRN extrapolation noted) (Source: https://qwen2.org/qwen2-5/)
Pretraining corpus scale (reported): trillions of tokens; model cards cite roughly 2.4T–3T+ tokens across releases, and Qwen2.5 reports a very large corpus (Source: https://huggingface.co/Qwen/Qwen-72B)
Key Information
- Category: Model Hubs
- Type: AI Model Hubs Tool