Qwen3 - AI Language Models Tool

Overview

Qwen3 is the third-generation open-weight large language model family from the Qwen team at Alibaba Cloud. The series spans dense and Mixture-of-Experts (MoE) variants across a wide size range (0.6B up to a 235B MoE flagship) and ships in two operational flavors: non-thinking/instruct models that return fast, direct responses, and thinking models that emit step-by-step reasoning for complex problems. Checkpoints, model cards, and technical reports are published openly on the QwenLM GitHub repositories and associated documentation. ([github.com](https://github.com/QwenLM/Qwen2/?utm_source=openai))

Qwen3 was designed for stronger reasoning, agent/tool usage, multilingual instruction following, and long-context understanding. The MoE variants activate only a small subset of parameters per forward pass (e.g., Qwen3-235B-A22B reports ~235B total parameters with ~22B activated during inference), and the "2507" updates extend long-context capabilities (256K tokens native for some releases, with later optional support for scaling up to 1,000,000 tokens). Qwen3 also provides model cards, deployment examples (Transformers, vLLM, SGLang, TensorRT-LLM), and technical reports for hardware and throughput planning. ([qwen-3.com](https://qwen-3.com/en/?utm_source=openai))
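
The deployment stacks listed above all consume the same Hub checkpoints. As a minimal sketch of the vLLM path, assuming the dense Qwen/Qwen3-8B checkpoint name and default engine settings (check the model card for memory requirements and recommended sampling parameters):

Example (python):

from vllm import LLM, SamplingParams

# Assumed checkpoint for illustration; substitute the variant you actually deploy.
model_name = "Qwen/Qwen3-8B"

# vLLM builds the inference engine once and batches requests internally.
llm = LLM(model=model_name)
params = SamplingParams(temperature=0.7, top_p=0.8, max_tokens=256)

# Raw-prompt generation; production chat traffic would normally go through the chat template.
outputs = llm.generate(["Explain Mixture-of-Experts routing in two sentences."], params)
print(outputs[0].outputs[0].text)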

GitHub Statistics

  • Stars: 26,172
  • Forks: 1,844
  • Contributors: 46
  • Primary Language: Python
  • Last Updated: 2026-01-09T03:05:46Z

Key Features

  • Two modes: 'Thinking' (step-by-step chains) and 'Instruct' (direct responses) for workload-adaptive outputs; see the mode-toggle sketch after this list.
  • Mixture-of-Experts (MoE) variants: 235B total with ~22B activated, 30B with ~3B activated.
  • Ultra-long context handling — 2507 updates add 256K native context and optional 1M-token support.
  • Open-weight releases (Apache-2.0) across many sizes, with integration guides for HF Transformers, vLLM, SGLang, and Ollama.
  • Agent/tool integration: designed for function-calling, tool use, and retrieval/agent workflows with reasoning parsers.
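
For the hybrid checkpoints that support both modes, the Qwen3 model cards expose an enable_thinking switch on the chat template; the non-thinking 'Instruct-2507' releases do not take this flag. A minimal sketch, assuming a hybrid checkpoint such as Qwen/Qwen3-8B:

Example (python):

from transformers import AutoTokenizer

# Hybrid (thinking-capable) checkpoint assumed for illustration.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")
messages = [{"role": "user", "content": "Is 9.11 larger than 9.9?"}]

# enable_thinking=True leaves room for a <think>...</think> reasoning block before the answer;
# enable_thinking=False requests a direct response. Check the model card for the default.
text_thinking = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)
text_direct = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
)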

Example Usage

Example (python):

from transformers import AutoModelForCausalLM, AutoTokenizer

# Example: load a Qwen3-30B-A3B-Instruct-2507 checkpoint from the Hub
model_name = "Qwen/Qwen3-30B-A3B-Instruct-2507"

# load the tokenizer and model (Transformers + HF checkpoint)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)

# Prepare chat-style messages and apply the Qwen chat template
prompt = "Summarize the lifecycle of a Monarch butterfly in three steps."
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate a response; tune max_new_tokens and sampling settings for your use case
generated_ids = model.generate(**inputs, max_new_tokens=512)
output_ids = generated_ids[0][len(inputs.input_ids[0]):].tolist()
content = tokenizer.decode(output_ids, skip_special_tokens=True)
print(content)

# Note: Qwen3 provides additional helpers for thinking-mode parsing and very long-context settings.
# See the Qwen3 repo and model cards (QwenLM/Qwen3 on GitHub) for variant-specific guidance and required framework flags.
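
For thinking-mode variants, the generated ids contain a reasoning block closed by a </think> token that can be split off before showing the final answer. Continuing from the variables above, a sketch that assumes the </think> token id 151668 listed in the Qwen3 model cards (verify against your tokenizer):

# Thinking-mode checkpoints only; 151668 is assumed to be the </think> token id.
THINK_END_ID = 151668
try:
    # Index just past the last </think> token in the generated ids.
    split = len(output_ids) - output_ids[::-1].index(THINK_END_ID)
except ValueError:
    split = 0  # no reasoning block was emitted

thinking_content = tokenizer.decode(output_ids[:split], skip_special_tokens=True)
final_content = tokenizer.decode(output_ids[split:], skip_special_tokens=True)
print("reasoning:", thinking_content)
print("answer:", final_content)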

Benchmarks

Model sizes (published): Dense models 0.6B–32B; MoE variants: 30B (3B active) and 235B (22B active). (Source: https://github.com/QwenLM/Qwen3)

Ultra-long context (Qwen3-2507 update): native 256K-token context, with an announced optional extension up to 1,000,000 tokens; see the configuration sketch at the end of this section. (Source: https://github.com/QwenLM/Qwen3)

235B-A22B architecture (selected specs): ~234B non-embedding parameters, 94 layers, 128 experts with 8 activated per token; see the model card for recommended native context settings. (Source: https://docs.api.nvidia.com/nim/reference/qwen-qwen3-235b-a22b)

AIME’25 (mathematical reasoning): Qwen3-235B-A22B reports a score of ~81.5% on AIME’25, per Qwen3 evaluation summaries. (Source: https://newsletter.towardsai.net/p/150-qwen3-impresses-as-a-robust-open)

Multilingual / training scale: pretraining is described as using ~36 trillion tokens across 100+ languages, per Qwen3 documentation and summaries. (Source: https://qwen-3.com/en/)
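
Context-extension sketch: reaching the upper end of the context range usually requires an explicit RoPE-scaling override rather than default settings. The sketch below assumes a YaRN-style rope_scaling override passed through a Transformers config; the factor and original_max_position_embeddings values are placeholders, and the exact numbers (plus any extra flags for the 1M-token mode) should come from the model card of the variant you run.

Example (python):

from transformers import AutoConfig, AutoModelForCausalLM

model_name = "Qwen/Qwen3-30B-A3B-Instruct-2507"  # same checkpoint as the usage example above

# Placeholder YaRN settings for illustration; the model card lists the supported values.
config = AutoConfig.from_pretrained(model_name)
config.rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 262144,
}

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    config=config,
    torch_dtype="auto",
    device_map="auto",
)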

Last Refreshed: 2026-01-16

Key Information

  • Category: Language Models
  • Type: AI Language Models Tool