DeepCoder-14B-Preview - AI Language Models Tool

Overview

DeepCoder-14B-Preview is a 14B-parameter code-reasoning LLM fine-tuned from deepseek-ai/DeepSeek-R1-Distill-Qwen-14B to improve long-context code generation and program reasoning. The preview release was trained with a distributed reinforcement learning recipe (GRPO+) plus iterative context lengthening and is published under the MIT license on Hugging Face. According to the model card, DeepCoder achieves 60.6% Pass@1 on a recent LiveCodeBench v5 split and is described as reaching parity with larger proprietary models on several coding benchmarks while remaining fully open-source. ([huggingface.co](https://huggingface.co/agentica-org/DeepCoder-14B-Preview?utm_source=openai))

The authors curated ~24K unique problem–test pairs (TACO-Verified, PrimeIntellect SYNTHETIC-1, LiveCodeBench subsets) and applied training optimizations (offline difficulty filtering, overlong-loss masking, and removal of the KL and entropy losses) to stabilize RL fine-tuning and scale long-context generalization. The model generalizes from its 32K training context to practical inference at 64K context lengths, making it well suited to multi-file projects, long notebooks, and complex algorithmic reasoning tasks.

Usage recommendations on the model page include temperature=0.6, top_p=0.95, and a large max_tokens budget for best results. ([huggingface.co](https://huggingface.co/agentica-org/DeepCoder-14B-Preview?utm_source=openai))

Model Statistics

  • Downloads: 356
  • Likes: 680
  • Pipeline: text-generation

License: mit

Model Details

Base and size: DeepCoder-14B-Preview is built on DeepSeek-R1-Distill-Qwen-14B (a distilled R1/Qwen foundation); the Hugging Face model page lists model storage as ~15B parameters. ([huggingface.co](https://huggingface.co/agentica-org/DeepCoder-14B-Preview?utm_source=openai))

Training recipe: The model was fine-tuned using an enhanced Group Relative Policy Optimization (GRPO+) variant informed by DAPO ideas. Key changes include offline difficulty filtering of training examples; removal of the KL and entropy losses to reduce training instability and compute; a "clip high" adjustment to the surrogate loss to encourage exploration; and overlong filtering (masking the loss for truncated outputs) to preserve long-context reasoning. Iterative context lengthening (16K→32K during training) enabled the model to generalize to 64K-context inference. ([huggingface.co](https://huggingface.co/agentica-org/DeepCoder-14B-Preview?utm_source=openai))

Data and evaluation: The curated training corpus comprises ~24K verified coding problems drawn from TACO-Verified, PrimeIntellect SYNTHETIC-1, and LiveCodeBench subsets. Evaluations reported on the model card include LiveCodeBench v5, a Codeforces-style rating and percentile, and HumanEval+ scores. Serving compatibility noted by the authors includes vLLM, Hugging Face Text Generation Inference (TGI), SGLang, and TensorRT-LLM (OpenAI Chat Completions API format supported). ([huggingface.co](https://huggingface.co/agentica-org/DeepCoder-14B-Preview?utm_source=openai))
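
The "clip high" change can be pictured as an asymmetric PPO-style clipped surrogate in which the upper clip bound is raised, letting positively-advantaged tokens push the policy ratio further before clipping. A minimal sketch in NumPy; the epsilon values and function name are illustrative assumptions, not taken from the training code:

```python
import numpy as np

def clipped_surrogate(ratio, advantage, eps_low=0.2, eps_high=0.28):
    """PPO-style clipped objective with a raised upper bound ("clip high").

    ratio:     pi_new / pi_old per token
    advantage: group-relative advantage estimate per token
    """
    clipped = np.clip(ratio, 1.0 - eps_low, 1.0 + eps_high)
    # Elementwise minimum, as in PPO. Because eps_high > eps_low, tokens
    # with positive advantage are allowed larger updates before clipping,
    # which encourages exploration.
    return np.minimum(ratio * advantage, clipped * advantage)
```

With a positive advantage, a ratio of 1.5 is clipped at 1.0 + eps_high = 1.28; with a negative advantage, the minimum keeps the pessimistic (clipped) value, as in standard PPO.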

Key Features

  • Fine-tuned for coding and program reasoning tasks with reinforcement learning (GRPO+).
  • Iterative context lengthening enables strong generalization to 64K-context inference.
  • Curated training set of ~24K verified problem–test pairs (TACO, SYNTHETIC-1, LCB subsets).
  • Competitive LiveCodeBench performance (60.6% Pass@1) while remaining open-source (MIT).
  • Compatible with vLLM, Hugging Face TGI, SGLang, and TensorRT-LLM for high-performance serving.

Example Usage

Example (python):

from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig
import torch

model_id = "agentica-org/DeepCoder-14B-Preview"

# Load tokenizer and model (adjust device_map / dtype for your hardware)
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype=torch.float16)

prompt = """
# Problem: Write a Python function `is_prime(n)` that returns True for prime n, False otherwise.
# Provide unit tests after the function.
"""

gen_cfg = GenerationConfig(
    do_sample=True,     # sampling must be enabled for temperature/top_p to take effect
    temperature=0.6,    # recommended by model authors
    top_p=0.95,         # recommended by model authors
    max_new_tokens=512, # adapt for longer outputs; model page recommends a large max_tokens budget
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, generation_config=gen_cfg)
# Decode only the newly generated tokens (skip the echoed prompt).
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))

# Notes:
# - The model card recommends avoiding a separate system prompt and placing all instructions inside the user prompt.
# - For very long-project generation, authors recommend running inference with a very large token budget (model card cites generalization to 64K context when supported by the runtime). See the model page for serving guidance. ([huggingface.co](https://huggingface.co/agentica-org/DeepCoder-14B-Preview?utm_source=openai))
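
For serving, the model card lists vLLM among the supported backends with an OpenAI Chat Completions-compatible API. A hedged example launch command (flag names follow current vLLM CLI conventions and may differ across versions; the 64K context length assumes your hardware can accommodate it):

```shell
# Serve an OpenAI-compatible endpoint with an extended context window.
vllm serve agentica-org/DeepCoder-14B-Preview \
    --max-model-len 65536 \
    --dtype bfloat16
```

Clients can then send requests to the standard /v1/chat/completions route with the recommended temperature=0.6 and top_p=0.95.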

Benchmarks

LiveCodeBench v5 Pass@1 (8/1/24–2/1/25): 60.6% (Source: https://huggingface.co/agentica-org/DeepCoder-14B-Preview (model card))

DeepSeek-R1-Distill-Qwen-14B Pass@1 (comparison): 53.0% (Source: https://huggingface.co/agentica-org/DeepCoder-14B-Preview (model card))

Codeforces-style rating (reported): 1936 (95.3 percentile) (Source: https://huggingface.co/agentica-org/DeepCoder-14B-Preview (model card))

HumanEval+ (reported): 92.6% (Source: https://huggingface.co/agentica-org/DeepCoder-14B-Preview (model card))

Relative parity vs proprietary models (o3-mini low / o1 low): Comparable (DeepCoder 60.6% vs o3-mini 60.9% on LCBv5 split) (Source: DeepLearning.AI coverage summarizing model card claims)
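
Pass@1 figures like those above are commonly computed with the standard unbiased pass@k estimator; the sketch below is that general formula, not the model card's specific evaluation harness:

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k: probability that at least one of k samples,
    drawn without replacement from n generations (c of them correct),
    passes the tests.

    For k=1 this reduces to the simple fraction c/n.
    """
    if n - c < k:
        return 1.0  # too few failures to fill a k-sample draw
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, with n=4 generations of which c=2 pass, pass@1 is 2/4 = 0.5.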

Last Refreshed: 2026-02-24

Key Information

  • Category: Language Models
  • Type: AI Language Models Tool