watt-tool-70B - AI Language Models Tool
Overview
watt-tool-70B is a 70–71B-parameter instruction-following LLM fine-tuned from Meta's LLaMa-3 family (listed as meta-llama/Llama-3.3-70B-Instruct) and optimized specifically for tool usage, function calling, and multi-turn agent workflows. The model card and README emphasize supervised fine-tuning plus "Direct Multi-Turn Preference Optimization" (DMPO) and chain-of-thought-style data to improve multi-turn tool orchestration and function-selection decisions, making the model suitable for workflow builders and agent platforms such as Lupan and Coze. ([huggingface.co](https://huggingface.co/watt-ai/watt-tool-70B))

Practically, watt-tool-70B is intended to produce structured function calls (tool selection plus arguments), to withhold calls when none are appropriate, and to maintain conversational context across multiple turns.

The project is published under an Apache-2.0 license and distributed as BF16 weights on Hugging Face, with community-maintained quantized variants (GGUF/GPTQ) available for local/edge use. The authors and community posts also note that watt-tool-70B at times achieved top performance on the Berkeley Function-Calling Leaderboard (BFCL), though leaderboard rankings have continued to evolve. ([huggingface.co](https://huggingface.co/watt-ai/watt-tool-70B))
Model Statistics
- Downloads: 95,770
- Likes: 118
- Parameters: 70.6B
- License: apache-2.0
Model Details
Architecture & lineage: fine-tuned from meta-llama/Llama-3.3-70B-Instruct (LLaMa-3 family) into a 70B-class causal language model specialized for function calling and tool use. The Hugging Face model card lists the model as ~71B parameters with BF16 tensor weights. ([huggingface.co](https://huggingface.co/watt-ai/watt-tool-70B))

Training and optimization: the maintainers report Supervised Fine-Tuning (SFT) on curated multi-turn tool-use data, Chain-of-Thought (CoT) style data synthesis, and Direct Multi-Turn Preference Optimization (DMPO) to improve multi-step agent decisions. The model card links to an associated paper describing the DMPO approach (arXiv:2406.14868). ([huggingface.co](https://huggingface.co/watt-ai/watt-tool-70B))

Capabilities and intended use: optimized for accurate function selection, multi-step tool orchestration, and retaining context across dialog turns. Example target applications include automated workflow builders, orchestration layers that must call multiple APIs, multi-turn data-extraction pipelines, and research into agentic LLM behavior. The model outputs structured function-call text when provided with function definitions in the prompt and is designed to avoid spurious calls when no tool is appropriate. ([huggingface.co](https://huggingface.co/watt-ai/watt-tool-70B))

Formats & deployment: official weights are distributed via Hugging Face (safetensors, BF16). Community-maintained quantized builds (GGUF, GPTQ) exist for local inference (see community repos for Q3/Q4/GGUF variants). The model card notes the model is not currently deployed by an inference provider, so hosted API pricing is provider-dependent. ([huggingface.co](https://huggingface.co/watt-ai/watt-tool-70B))
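As a concrete illustration of "providing function definitions in the prompt," here is a minimal, self-contained sketch of assembling such a prompt. The exact layout (field labels, system wording) is an assumption for illustration, not the official chat template from the model card:

```python
import json

def build_tool_prompt(system, user, functions):
    """Assemble a plain-text prompt embedding JSON tool definitions.

    `functions` is a list of dicts describing each callable tool;
    this layout is illustrative, not the model's official template.
    """
    return (
        f"System: {system}\n"
        f"Functions: {json.dumps(functions)}\n"
        f"User: {user}\n"
        "Assistant:"
    )

tools = [{
    "name": "weather_forecast",
    "description": "Get a short-range forecast for a location.",
    "parameters": {"location": "string", "days": "integer"},
}]
prompt = build_tool_prompt(
    "Return a function call only when a tool is appropriate.",
    "What's the weather in San Francisco tomorrow?",
    tools,
)
print(prompt.splitlines()[0])
# → System: Return a function call only when a tool is appropriate.
```

In production you would render messages through the tokenizer's chat template rather than hand-formatting strings; the point here is only that tool definitions travel inside the prompt text.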
Key Features
- Fine-tuned for accurate function/tool selection in multi-step workflows.
- Optimized multi-turn dialogue: retains context across chained tool calls.
- Structured function-call outputs compatible with JSON-like tool definitions.
- Trained with SFT and Direct Multi-Turn Preference Optimization (DMPO).
- Distributed in BF16 safetensors with community GGUF/GPTQ quantizations.
- Apache-2.0 licensed for permissive reuse and commercial integration.
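Because the model emits structured function-call text, downstream code typically parses and validates it before executing anything. A minimal sketch, assuming the calls arrive as a JSON array of `{"name": ..., "args": {...}}` objects (the exact output format depends on your prompt template):

```python
import json

def parse_tool_calls(raw, known_tools):
    """Parse a JSON array of {"name": ..., "args": {...}} objects,
    dropping calls that name an unknown tool or lack an args mapping."""
    try:
        calls = json.loads(raw)
    except json.JSONDecodeError:
        return []
    if not isinstance(calls, list):
        return []
    valid = []
    for call in calls:
        if (isinstance(call, dict)
                and call.get("name") in known_tools
                and isinstance(call.get("args"), dict)):
            valid.append(call)
    return valid

raw = '[{"name": "weather_forecast", "args": {"location": "San Francisco, CA"}}]'
print(parse_tool_calls(raw, {"weather_forecast"}))
```

Filtering against a whitelist of known tools is what operationalizes the "avoid spurious calls" property: anything the model emits outside the declared tool set is silently dropped rather than executed.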
Example Usage
Example (python):

```python
import json
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "watt-ai/watt-tool-70B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Provide system + user messages and a JSON list of function *definitions*
# (name, description, parameters); the model decides which, if any, to call
# and with what arguments.
system = "You are an assistant that returns function calls when appropriate. Only emit valid calls when necessary."
user = "Get me the latest weather for San Francisco and schedule a meeting tomorrow at 10am if a storm is expected."
functions = json.dumps([
    {"name": "weather_forecast", "description": "Get a weather forecast.",
     "parameters": {"location": "string", "days": "integer"}},
    {"name": "calendar.create_event", "description": "Create a calendar event.",
     "parameters": {"title": "string", "time": "ISO-8601 datetime"}},
])

prompt = f"System: {system}\nUser: {user}\nFunctions: {functions}\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
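Once the model's output has been parsed into call objects, a common pattern is a dispatch table mapping tool names to local handlers. This sketch is illustrative and independent of the model card; the `weather_forecast` handler is a hypothetical stub, not a real API client:

```python
def weather_forecast(location, days=1):
    # Hypothetical stub standing in for a real weather API client.
    return {"location": location, "days": days, "summary": "clear"}

# Dispatch table: tool name -> local handler function.
DISPATCH = {"weather_forecast": weather_forecast}

def execute_call(call):
    """Look up the handler for a parsed call and invoke it with its args."""
    handler = DISPATCH.get(call["name"])
    if handler is None:
        raise KeyError(f"unknown tool: {call['name']}")
    return handler(**call["args"])

result = execute_call({"name": "weather_forecast",
                       "args": {"location": "San Francisco, CA"}})
print(result["summary"])  # → clear
```

Raising on unknown names (rather than ignoring them) surfaces prompt-template mismatches early, which matters in multi-turn workflows where a dropped call silently breaks the chain.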
Note: the Hugging Face model card includes usage examples and a chat-template example; adapt tokenization/generation settings for your runtime. ([huggingface.co](https://huggingface.co/watt-ai/watt-tool-70B))
Benchmarks
- Berkeley Function-Calling Leaderboard (BFCL): the model card reports state-of-the-art/top performance at release, though rankings have since evolved (Source: https://huggingface.co/watt-ai/watt-tool-70B)
- Hugging Face downloads (last month): 95,770, as listed on the model page (Source: https://huggingface.co/watt-ai/watt-tool-70B)
- Hugging Face likes: 118 (Source: https://huggingface.co/watt-ai/watt-tool-70B)
- Model size: ≈71B parameters per the model card (Source: https://huggingface.co/watt-ai/watt-tool-70B)
- Quantized community builds: multiple GGUF/GPTQ quantizations (Q3/Q4 variants) available for local inference (Source: https://huggingface.co/bartowski/watt-ai_watt-tool-70B-GGUF)
Key Information
- Category: Language Models
- Type: AI Language Models Tool