Divergence signatures across model families — same prompt, one-word perturbation

x = generated token index · y = normalized token edit distance, averaged over prompt pairs. All models answer directly, with no reasoning trace (the Qwen 4B and 9B runs use their non-thinking "thinkoff" variants).
Qwen ladder (0.8B, 2B, 4B-thinkoff, 9B-thinkoff)
Non-reasoning instruct (Mistral 7B, OLMo-3 7B, Granite 3.3 8B, Gemma 4 E4B it, Falcon 3 10B)
Legacy (GPT-2 XL, GPT-J 6B, OPT 6.7B, Pythia 6.9B, LLaMA-1 7B)
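The y-axis metric above can be sketched as follows. This is a minimal illustration, not the exact analysis code: it assumes the curve at token index t compares the two generations' t-token prefixes and normalizes the Levenshtein distance by the prefix length t; the function names (`edit_distance`, `divergence_curve`) and that normalization choice are assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance between two token sequences (one-row DP)."""
    m, n = len(a), len(b)
    dp = list(range(n + 1))  # dp[j] = distance(a[:i], b[:j]) for current i
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i  # prev holds the diagonal dp[i-1][j-1]
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,            # deletion
                        dp[j - 1] + 1,        # insertion
                        prev + (a[i - 1] != b[j - 1]))  # match/substitution
            prev = cur
    return dp[n]

def divergence_curve(pairs, max_len):
    """Mean normalized prefix edit distance at each token index.

    pairs: list of (tokens_original, tokens_perturbed) generations.
    Returns a list of length max_len; entry t-1 is the mean over pairs of
    edit_distance(a[:t], b[:t]) / t  (normalization by t is an assumption).
    """
    curve = []
    for t in range(1, max_len + 1):
        vals = [edit_distance(a[:t], b[:t]) / t for a, b in pairs]
        curve.append(sum(vals) / len(vals))
    return curve

# Identical generations stay at 0; a substitution at index 3 shows up there.
print(divergence_curve([([1, 2, 3], [1, 2, 4])], 3))  # → [0.0, 0.0, 0.333...]
```

Plotting this curve per model family, with the same prompt pairs, gives the divergence signatures described above.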