FuseChat-7B-VaRM - AI Language Models Tool
Overview
FuseChat-7B-VaRM is a memory-efficient 7B-class chat model from FuseAI that fuses knowledge from three structurally diverse chat LLMs (NH2-Mixtral-8x7B, NH2-Solar-10.7B, OpenChat-3.5-7B) using a two-stage fuse-then-merge pipeline. The model was released on Feb 26, 2024 as part of the FuseChat research project; the authors report that FuseChat-7B-VaRM achieves an average MT-Bench score of 8.22, placing it among the top open-source 7B chat models. ([huggingface.co](https://huggingface.co/FuseAI/FuseChat-7B-VaRM)) ([arxiv.org](https://arxiv.org/abs/2402.16107?utm_source=openai))

Instead of ensembling multiple experts at inference time (which increases memory use), FuseChat performs lightweight pairwise knowledge fusion to convert the source LLMs into target models with a shared architecture, then merges those targets in parameter space using VaRM, a method that derives merging weights from the variation ratio of parameter matrices before and after fine-tuning. This produces a single model that aims to integrate complementary strengths (e.g., instruction following, conversational style, and factual recall) without increasing the runtime memory footprint. The model is released under the Apache-2.0 license, and the project provides code, tokenization templates, and the FuseChat-Mixture training corpus for reproducibility. ([huggingface.co](https://huggingface.co/FuseAI/FuseChat-7B-VaRM))
Model Statistics
- Downloads: 134
- Likes: 88
- Pipeline: text-generation
- Parameters: 7.2B
- License: apache-2.0
Model Details
Architecture and approach: FuseChat-7B-VaRM is implemented as a causal chat LLM (text-generation pipeline) of roughly 7B parameters (metadata/third-party references list ~7.24B as a more precise count). The core technical pipeline has two stages: (1) pairwise knowledge fusion — lightweight continual fine-tuning of each source LLM onto a common target architecture and token/template conventions; (2) parameter-space merging — combining target model weights using VaRM, a statistics-driven merging coefficient computed from the variation ratio of parameter matrices before vs. after fine-tuning. The procedure includes tokenization/chat-template alignment so that conversation-role markers and multi-turn contexts are consistent across sources. ([arxiv.org](https://arxiv.org/abs/2402.16107?utm_source=openai))

Capabilities and runtime: The fused model is intended for multi-turn chat, instruction following, and conversational generation with improved aggregated knowledge over its sources, while keeping inference memory roughly that of a single 7B model. The Hugging Face model card and project repo provide ready-to-run examples (transformers tokenizer + chat-template helpers) and scripts for reproducing the pairwise fusion and merging workflow. Implementation artifacts, the FuseChat-Mixture dataset, and training/merge scripts are available in the project repository. ([huggingface.co](https://huggingface.co/FuseAI/FuseChat-7B-VaRM))
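The VaRM merging step can be sketched in a few lines. The sketch below is an illustrative reconstruction, not the project's actual implementation: the exact variation-ratio statistic and its normalization follow the paper and repository, while here we assume a simple squared-change ratio per parameter matrix, normalized across targets to produce merging coefficients.

```python
import numpy as np

def variation_ratio(w_base, w_target):
    """Illustrative variation ratio: relative squared change of a parameter
    matrix after fine-tuning (an assumption; see the FuseChat paper for the
    exact statistic)."""
    return np.sum((w_target - w_base) ** 2) / np.sum(w_base ** 2)

def varm_merge(w_base, w_targets):
    """Merge target weight matrices into one, weighting each target by its
    normalized variation ratio."""
    ratios = np.array([variation_ratio(w_base, w) for w in w_targets])
    coeffs = ratios / ratios.sum()          # normalize into merging coefficients
    merged = sum(c * w for c, w in zip(coeffs, w_targets))
    return merged, coeffs

# Toy example: one base matrix and two fine-tuned targets
rng = np.random.default_rng(0)
w_base = rng.normal(size=(4, 4))
w_a = w_base + 0.1 * rng.normal(size=(4, 4))   # small fine-tuning update
w_b = w_base + 0.5 * rng.normal(size=(4, 4))   # larger fine-tuning update
merged, coeffs = varm_merge(w_base, [w_a, w_b])
print(coeffs)  # the target with the larger update receives the larger weight
```

In a real merge this would run per parameter matrix across the whole state dict; the toy example only shows how the coefficients are derived from update statistics rather than being hand-tuned.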
Key Features
- Fuse-then-merge pipeline: pairwise knowledge fusion followed by parameter-space merging.
- VaRM merging: weight coefficients based on variation ratios of parameter updates.
- Memory-efficient inference: single 7B model, no runtime expert ensemble required.
- Plug-and-play targets: released target models (OpenChat-3.5-7B-Solar/Mixtral) for integration.
- Reproducible training scripts and FuseChat-Mixture dataset included in repo.
- Competitive MT-Bench performance among open-source 7B chat models (reported 8.22).
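The pairwise knowledge fusion stage listed above trains the target model to imitate each source model's token-level output distributions. A minimal sketch of such a distillation-style objective, under the simplifying assumption of a plain per-token KL divergence (the paper's actual objective and cross-tokenizer distribution alignment are more involved):

```python
import numpy as np

def softmax(logits, axis=-1):
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def fusion_loss(target_logits, source_logits, tau=1.0):
    """Illustrative pairwise-fusion objective: mean KL(source || target)
    between per-token next-token distributions (a simplification of the
    FuseChat training objective)."""
    p_src = softmax(source_logits / tau)   # teacher (source LLM) distribution
    p_tgt = softmax(target_logits / tau)   # student (target model) distribution
    kl = np.sum(p_src * (np.log(p_src + 1e-12) - np.log(p_tgt + 1e-12)), axis=-1)
    return kl.mean()

# Toy check: the loss vanishes when distributions match, and is positive otherwise
logits = np.array([[2.0, 0.5, -1.0]])
other = np.array([[0.0, 1.0, 0.0]])
print(fusion_loss(logits, logits))  # ~0.0
print(fusion_loss(other, logits))   # > 0
```

During fusion, minimizing this loss over the FuseChat-Mixture corpus pulls the shared-architecture target toward each source model in turn, which is what makes the later parameter-space merge meaningful.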
Example Usage
Example (python):
from transformers import AutoTokenizer, AutoModelForCausalLM
# Load tokenizer and model from Hugging Face
model_id = "FuseAI/FuseChat-7B-VaRM"
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
# Example multi-turn chat using the provided chat template helper (as in the model card)
messages = [
    {"role": "user", "content": "Hello, who won the 2024 Super Bowl?"}
]
# Some FuseChat tokenizers expose `apply_chat_template`; if not available, build the prompt manually
if hasattr(tokenizer, "apply_chat_template"):
    # With tokenize=True (the default), apply_chat_template returns token ids;
    # request a PyTorch tensor and move it to the model's device
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
else:
    # Fallback: simple single-turn prompt
    prompt = "User: Hello, who won the 2024 Super Bowl?\nAssistant:"
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
# Generate (adjust generation params as needed)
outputs = model.generate(input_ids, max_new_tokens=200, do_sample=False)
reply = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(reply)
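Because FuseChat's target architecture is OpenChat-3.5-7B, the expected chat format is OpenChat's "GPT4 Correct" template. If `apply_chat_template` is unavailable, a small helper can render multi-turn messages in that format; the template string below follows the OpenChat convention and should be verified against the model card and tokenizer config:

```python
def build_openchat_prompt(messages):
    """Render chat messages in the OpenChat-3.5 'GPT4 Correct' format,
    which FuseChat-7B-VaRM inherits from its target model (verify the
    exact template against the model card)."""
    role_names = {"user": "GPT4 Correct User", "assistant": "GPT4 Correct Assistant"}
    parts = [f"{role_names[m['role']]}: {m['content']}<|end_of_turn|>" for m in messages]
    parts.append("GPT4 Correct Assistant:")   # generation prompt for the next reply
    return "".join(parts)

prompt = build_openchat_prompt([{"role": "user", "content": "Hello"}])
print(prompt)
# GPT4 Correct User: Hello<|end_of_turn|>GPT4 Correct Assistant:
```

The resulting string can be tokenized with `tokenizer(prompt, return_tensors="pt")` and passed to `model.generate` exactly as in the fallback branch above.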
# Note: see the FuseChat model card for specialized tokenizer/chat-template usage and examples: https://huggingface.co/FuseAI/FuseChat-7B-VaRM
Benchmarks
MT-Bench (average): 8.22 (Source: https://huggingface.co/FuseAI/FuseChat-7B-VaRM)
Reported parameter count: Approximately 7B (metadata/third-party listings ~7.24B) (Source: https://promptlayer.com/models/fusechat-7b-varm and https://huggingface.co/FuseAI/FuseChat-7B-VaRM)
License: Apache-2.0 (Source: https://huggingface.co/FuseAI/FuseChat-7B-VaRM)
Key Information
- Category: Language Models
- Type: AI Language Models Tool