FuseChat-7B-VaRM - AI Language Models Tool

Overview

FuseChat-7B-VaRM is a memory-efficient fused chat LLM from FuseAI that combines knowledge from three source chat models (NH2-Mixtral-8x7B, NH2-Solar-10.7B, and OpenChat-3.5-7B) using a two-stage "fuse-then-merge" pipeline. The released 7B-class model targets instruction-following and multi-domain chat, and reports strong results on MT-Bench (average 8.22), placing it ahead of many open and proprietary models in its size class while approaching much larger models on several benchmarks. ([huggingface.co](https://huggingface.co/FuseAI/FuseChat-7B-VaRM))

Technically, FuseChat uses lightweight continual fine-tuning (pairwise knowledge fusion) to transfer knowledge from diverse-source models into target models that share a common architecture, then merges those targets in parameter space using VaRM, a weighting scheme based on the variation ratio of parameter updates. This design aims to capture complementary strengths of different model families without keeping multiple experts at inference time, so runtime memory stays close to that of a single 7B model. Implementation code, training data, and model weights are publicly linked by the authors. ([arxiv.org](https://arxiv.org/abs/2408.07990))
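
To make the merging idea concrete, here is a minimal, hypothetical sketch of a VaRM-style merge: each target model's parameter update from the shared base is scored by its mean squared change, and those scores are normalized into merging coefficients. The function name, scoring formula, and per-matrix granularity are illustrative assumptions, not the authors' exact implementation.

Example (python):

import torch

def varm_merge(base_sd, target_sds, eps=1e-12):
    """Hypothetical sketch of variation-ratio merging: weight each target
    model's parameter update by its relative change magnitude, then add
    the weighted updates back onto the shared base weights."""
    merged = {}
    for name, base_w in base_sd.items():
        # Updates accumulated by each fine-tuned target model.
        updates = [sd[name] - base_w for sd in target_sds]
        # Per-matrix "variation" score: mean squared parameter change.
        scores = torch.stack([u.pow(2).mean() for u in updates])
        coeffs = scores / (scores.sum() + eps)  # normalize into merge weights
        merged[name] = base_w + sum(c * u for c, u in zip(coeffs, updates))
    return merged

# Usage sketch: merge two fine-tuned targets derived from the same base.
# merged_sd = varm_merge(base.state_dict(), [t1.state_dict(), t2.state_dict()])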

Model Statistics

  • Downloads: 322
  • Likes: 88
  • Pipeline: text-generation
  • Parameters: 7.2B

License: apache-2.0

Model Details

Architecture and size: FuseChat-7B-VaRM is a ~7B-parameter chat model (released as a 7B-class checkpoint) intended for text-generation/chat pipelines; the released artifacts indicate BF16-compatible weights and a text-generation pipeline tag. ([huggingface.co](https://huggingface.co/FuseAI/FuseChat-7B-VaRM))

Core technique: FuseChat applies a two-stage workflow. First, pairwise knowledge fusion uses lightweight fine-tuning and statistics-based token alignment to transfer knowledge from source LLMs of different architectures into target LLMs with identical structure. Second, the model merging stage blends those target LLMs in parameter space; VaRM (Variation Ratio Merging) computes merging coefficients from the magnitude of parameter changes before and after fine-tuning, prioritizing weights that received larger targeted updates. The authors contrast this with MoE-style approaches: FuseChat merges into a single model, so inference memory use stays close to that of one model rather than the sum of experts. ([arxiv.org](https://arxiv.org/abs/2408.07990))

Provenance and assets: The FuseChat release on Hugging Face documents the fused sources and provides the related target checkpoints (e.g., OpenChat-3.5-7B-Solar / -Mixtral) plus alternative merge-method variants (SLERP, TA). The project repository includes training/config scripts and evaluation artifacts. ([huggingface.co](https://huggingface.co/FuseAI/FuseChat-7B-VaRM))
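
For the pairwise fusion stage described above, the training objective can be sketched as a blend of supervised fine-tuning and distillation toward a source model's distribution. The snippet below is a simplified illustration under the assumption that source logits have already been mapped onto the target vocabulary by token alignment; the weighting `lam` and the function name are illustrative, not the paper's exact formulation.

Example (python):

import torch.nn.functional as F

def pairwise_fusion_loss(target_logits, aligned_source_logits, labels, lam=0.9):
    """Illustrative fusion objective: standard next-token cross-entropy plus
    a distillation term pulling the target toward one source model's
    (token-aligned) output distribution."""
    # Supervised fine-tuning term on the ground-truth labels.
    sft = F.cross_entropy(
        target_logits.view(-1, target_logits.size(-1)),
        labels.view(-1),
        ignore_index=-100,
    )
    # Distillation term: KL divergence between source and target distributions.
    fuse = F.kl_div(
        F.log_softmax(target_logits, dim=-1),
        F.softmax(aligned_source_logits, dim=-1),
        reduction="batchmean",
    )
    return lam * sft + (1.0 - lam) * fuse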

Key Features

  • Fuse-then-merge workflow: pairwise fusion then parameter-space merging (VaRM).
  • VaRM merging: weights determined by variation ratio of parameter updates.
  • Integrates diverse model families (Mixtral, Solar, OpenChat) into one checkpoint.
  • Memory-efficient: single-model inference footprint versus MoE ensembles.
  • Released artifacts: target checkpoints, training scripts, evaluation assets, and alternative SLERP/TA merge variants (see the SLERP sketch after this list).
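
For the SLERP merge variant mentioned above, here is a minimal per-tensor sketch. Real merge toolkits typically interpolate layer by layer with per-layer interpolation factors; the function below is a simplified illustration, and its name and fallback threshold are assumptions.

Example (python):

import torch

def slerp(w_a, w_b, t=0.5, eps=1e-8):
    """Spherical linear interpolation between two weight tensors of the
    same shape; falls back to linear interpolation when nearly parallel."""
    a, b = w_a.flatten().float(), w_b.flatten().float()
    cos_theta = torch.clamp(
        (a / (a.norm() + eps)).dot(b / (b.norm() + eps)), -1.0, 1.0
    )
    theta = torch.acos(cos_theta)
    if theta.abs() < 1e-4:
        # Nearly parallel directions: plain lerp is numerically stable.
        merged = (1 - t) * a + t * b
    else:
        sin_theta = torch.sin(theta)
        merged = (torch.sin((1 - t) * theta) / sin_theta) * a \
               + (torch.sin(t * theta) / sin_theta) * b
    return merged.view_as(w_a)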

Example Usage

Example (python):

from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

# Load the published FuseChat-7B-VaRM checkpoint from Hugging Face
model_id = "FuseAI/FuseChat-7B-VaRM"

# NOTE: the checkpoint follows a standard causal-LM architecture, so
# trust_remote_code should not normally be required; pass it only if loading fails.
# Adjust device_map / torch_dtype to match your hardware (cuda, mps, or cpu).

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",       # requires the accelerate package for automatic placement
    torch_dtype="auto",
)

chat = pipeline("text-generation", model=model, tokenizer=tokenizer)

# Build the prompt with the tokenizer's chat template so formatting
# matches what the model saw during fine-tuning.
messages = [
    {"role": "user",
     "content": "Explain quantum entanglement in simple terms, with a short analogy."}
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

output = chat(prompt, max_new_tokens=256, do_sample=False)
print(output[0]["generated_text"])
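
The model card documents an OpenChat-style conversation format ("GPT4 Correct User: ...<|end_of_turn|>GPT4 Correct Assistant:"); using tokenizer.apply_chat_template, as above, keeps prompts consistent with it. If the tokenizer in your transformers version does not ship a chat template, format prompts manually in that style.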

Benchmarks

  • MT-Bench (average): 8.22
  • MMLU (5-shot, normalized accuracy): 63.71
  • AI2 Reasoning Challenge (25-shot, normalized accuracy): 62.88
  • HellaSwag (10-shot, normalized accuracy): 84.25
  • GSM8k (5-shot accuracy): 63.46

Source: https://huggingface.co/FuseAI/FuseChat-7B-VaRM

Last Refreshed: 2026-01-16

Key Information

  • Category: Language Models
  • Type: AI Language Models Tool