WizardLM - AI Language Models Tool

Overview

WizardLM is an open-source family of instruction-tuned large language models and training methods developed by the WizardLM team (affiliated with Microsoft AI). WizardLM-2 (released April 15, 2024) expands the line with three main variants, WizardLM-2 7B, 70B, and an MoE-style 8x22B, and targets complex chat, multilingual understanding, reasoning, coding, and agent-style tasks.

The project pairs model releases with a research contribution called Auto Evol-Instruct, a fully AI-driven pipeline that automatically designs and optimizes instruction-evolution methods so instruction datasets become progressively more diverse and more challenging. According to the Auto Evol-Instruct paper and the project release notes, this pipeline replaces manually designed evolution heuristics with an optimizer model that iteratively refines the evolving method, allowing Evol-Instruct to scale across many domains without human tuning. A companion effort, Arena Learning, builds harder training data through simulated battles in which multiple models answer the same prompts and an AI judge picks out the cases where the target model underperforms.

WizardLM's engineering approach combines staged training (progressive learning), co-teaching and self-teaching among multiple models (which the team calls "AI Align AI"), and reinforcement-style preference tuning (Stage-DPO and RLEIF). The team reports strong internal results on MT-Bench, AlpacaEval, GSM8K, and code-generation benchmarks, and publishes model weights and code on Hugging Face and GitHub. Core strengths cited by the authors and community include automated instruction evolution, strong multi-domain instruction following, and specialized variants (e.g., WizardCoder) tuned for code generation. For details, see the Auto Evol-Instruct paper (arXiv) and the WizardLM-2 release blog.
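The core loop of Auto Evol-Instruct can be pictured as an optimizer model repeatedly rewriting the instruction-evolution prompt based on flaws it observes when an "evol" model applies that prompt to a batch of seed instructions. The sketch below is illustrative only, assuming a generic llm() chat helper; it is not the WizardLM team's released implementation.

Sketch (python):

def llm(system, user):
    # Placeholder for a call to any chat model (API client, local pipeline, etc.).
    raise NotImplementedError

def auto_evol_instruct(seed_instructions, evolving_prompt, rounds=3, batch_size=8):
    dataset = list(seed_instructions)
    for _ in range(rounds):
        batch = dataset[-batch_size:]
        # 1. The evol model applies the current evolving method to each instruction.
        evolved = [llm(evolving_prompt, inst) for inst in batch]
        # 2. An analyzer model inspects the before/after pairs and lists failure modes
        #    (instructions that became trivial, unanswerable, or drifted off-topic).
        trajectories = "\n\n".join(f"BEFORE: {b}\nAFTER: {a}" for b, a in zip(batch, evolved))
        feedback = llm("You analyze instruction-evolution trajectories and list their flaws.",
                       trajectories)
        # 3. An optimizer model rewrites the evolving method to address those flaws.
        evolving_prompt = llm("You improve instruction-evolution prompts.",
                              f"Current method:\n{evolving_prompt}\n\nObserved issues:\n{feedback}")
        dataset.extend(evolved)
    return dataset, evolving_prompt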

Key Features

  • Auto Evol-Instruct: fully AI-driven pipeline to design and optimize instruction evolution methods.
  • Arena Learning: builds harder training data through simulated battles among models, with an AI judge selecting the prompts where the target model underperforms (see the sketch after this list).
  • Model family: WizardLM-2 variants (7B, 70B, MoE 8x22B) targeting speed, reasoning, or top-tier capability.
  • Code-specialist variants (WizardCoder) fine-tuned with evolved code instructions for strong code generation.
  • Open-source releases: weights and code published on Hugging Face and GitHub (Apache 2.0 / community licenses).
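As a rough illustration of the Arena Learning idea referenced above, the following sketch simulates battles between the model being trained and a pool of competitor models, keeping the prompts where the target model loses as preference data for the next training round. The generate() and judge() helpers are hypothetical placeholders, not part of any released WizardLM pipeline.

Sketch (python):

def generate(model_name, prompt):
    # Placeholder: produce a response from the named model.
    raise NotImplementedError

def judge(prompt, answer_a, answer_b):
    # Placeholder: an LLM judge returning 'a', 'b', or 'tie'.
    raise NotImplementedError

def arena_round(target_model, competitor_models, prompts):
    hard_examples = []
    for prompt in prompts:
        target_answer = generate(target_model, prompt)
        for competitor in competitor_models:
            competitor_answer = generate(competitor, prompt)
            # Keep battles the target model loses: these prompts are the most
            # informative for the next SFT/DPO round.
            if judge(prompt, target_answer, competitor_answer) == 'b':
                hard_examples.append({'prompt': prompt,
                                      'chosen': competitor_answer,
                                      'rejected': target_answer})
    return hard_examples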

Example Usage

Example (python):

from transformers import AutoTokenizer, AutoModelForCausalLM

def generate_prompt_response(model_id, prompt, max_new_tokens=256):
    # Many WizardLM models release code/weights on Hugging Face; large/MoE models
    # may require specialized runtimes (vLLM, text-generation-inference) or
    # trust_remote_code=True for repo-specific model code.
    tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        device_map='auto',
        torch_dtype='auto',
        trust_remote_code=True  # required for some community repos
    )

    # Move inputs to the model's device (needed when device_map='auto' places weights on GPU).
    inputs = tokenizer(prompt, return_tensors='pt').to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

if __name__ == '__main__':
    # This community mirror hosts the large WizardLM-2 8x22B MoE model; swap in a
    # smaller WizardLM variant (e.g. a 7B) for local, single-GPU runs.
    model_id = 'KnutJaegersberg/WizardLM-2-8x22B'
    prompt = (
        'A chat between a curious user and an artificial intelligence assistant. '
        'The assistant gives helpful, detailed, and polite answers.\n'
        'USER: Explain Auto Evol-Instruct in plain language.\nASSISTANT:'
    )
    print(generate_prompt_response(model_id, prompt))
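WizardLM-2 models expect a Vicuna-style chat prompt like the one hard-coded above. When a repository ships a chat template, the standard transformers apply_chat_template call builds that prompt automatically; whether a given community mirror includes a template is not guaranteed, so this is an optional alternative rather than a requirement.

Sketch (python):

# Assumes `tokenizer` was loaded as in generate_prompt_response above.
messages = [
    {'role': 'user', 'content': 'Explain Auto Evol-Instruct in plain language.'}
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)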

Benchmarks

MT-Bench (reported): 8.09 (Mixtral-8x7B fine-tuned on 10K evolved ShareGPT) (Source: https://arxiv.org/pdf/2406.00770)

AlpacaEval (reported): 91.4 (Mixtral-8x7B fine-tuned on 10K evolved ShareGPT) (Source: https://arxiv.org/pdf/2406.00770)

GSM8K (reported): 82.49 (Mixtral-8x7B fine-tuned on 7K evolved GSM8K) (Source: https://arxiv.org/pdf/2406.00770)

HumanEval (WizardCoder): 57.3 pass@1 (WizardCoder-15B-v1.0, reported) (Source: https://huggingface.co/WizardLMTeam/WizardCoder-15B-V1.0)

Model size (WizardLM-2 8x22B / MoE): ~141B total parameters (Mixture-of-Experts; roughly 39B active per token) (Source: https://huggingface.co/KnutJaegersberg/WizardLM-2-8x22B)
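The HumanEval figure above is a pass@1 score. For reference, the unbiased pass@k estimator introduced with HumanEval (and commonly used for such reports) is shown below; this is the standard formula, not anything specific to WizardCoder's evaluation harness.

Sketch (python):

import numpy as np

def pass_at_k(n, c, k):
    # n = samples generated per problem, c = samples that pass the unit tests.
    if n - c < k:
        return 1.0
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))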

Last Refreshed: 2026-01-16

Key Information

  • Category: Language Models
  • Type: AI Language Models Tool