DeepCONF Custom Generation Strategy

This repository implements the DeepCONF (Deep Confidence-based Early Stopping) generation strategy for Hugging Face Transformers models, following the approach described in the paper Deep Think with Confidence.

Overview

DeepCONF monitors the confidence of generated tokens over a sliding window and stops generation early once the windowed confidence falls below a threshold, saving compute on low-confidence traces.

Parameters

  • enable_conf (bool): Whether to enable the DeepCONF strategy. Defaults to False.
  • window_size (int): Size of the sliding window for confidence calculation. Defaults to 2048.
  • threshold (float): Confidence threshold for early stopping. Defaults to 17.0.
  • output_confidences (bool): If True and return_dict_in_generate=True, returns a per-step confidence tensor alongside generated sequences for debugging/visualization.

Usage

To use this custom generation strategy, point generate at the Hub repository via the custom_generate argument; DeepCONF's parameters are passed through as keyword arguments:

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("your-model")
tokenizer = AutoTokenizer.from_pretrained("your-model")

inputs = tokenizer("Hello, world!", return_tensors="pt")

# Generate with DeepCONF (Hub repo)
outputs = model.generate(
    **inputs,
    enable_conf=True,
    window_size=2048,
    threshold=17.0,
    output_confidences=True,           # request confidences
    return_dict_in_generate=True,      # required to get tensors
    max_new_tokens=100,
    custom_generate="kashif/DeepConf",  # Hugging Face Hub repo
    trust_remote_code=True
)

Calibration (DeepConf-low/high)

DeepConf’s online stopping threshold is derived from a short warmup phase. You collect warmup trace confidences, then pass them into the generator to auto-derive the threshold for either DeepConf-low (aggressive) or DeepConf-high (permissive).

  1. Warmup: generate Ninit traces with num_return_sequences and record each trace's confidence C_t = min(step_confidences)
from transformers import GenerationConfig

prompt = "Explain artificial intelligence."
Ninit = 8  # number of warmup traces

warm_cfg = GenerationConfig.from_model_config(model.config)
warm_cfg.do_sample = True
warm_cfg.temperature = 0.7
warm_cfg.top_p = 0.95
warm_cfg.max_new_tokens = 64
warm_cfg.enable_conf = True
warm_cfg.return_dict_in_generate = True
warm_cfg.output_confidences = True
warm_cfg.num_return_sequences = Ninit
# IMPORTANT: Do not set `warm_cfg.threshold` here. Warmup should not apply online early stopping.

out = model.generate(
    **tokenizer(prompt, return_tensors="pt"),
    generation_config=warm_cfg,
    custom_generate="kashif/DeepConf",
    trust_remote_code=True,
)
# Per-trace C_t = min over steps (confidences shape: [Ninit, num_steps])
warmup_C = out.confidences.min(dim=1).values.tolist()
  2. Online: pass warmup confidences to auto-derive the threshold
gen_cfg = GenerationConfig.from_model_config(model.config)
gen_cfg.enable_conf = True
gen_cfg.return_dict_in_generate = True
gen_cfg.output_confidences = True

# Choose a variant:
# - DeepConf-low (aggressive): eta=0.1 → 90th percentile threshold
# - DeepConf-high (permissive): eta=0.9 → 10th percentile threshold
gen_cfg.deepconf_variant = "low"  # or "high"
# Optional: override eta explicitly
# gen_cfg.deepconf_eta = 0.1  # defaults: 0.1 for low, 0.9 for high

# Provide warmup confidences; the threshold will be derived internally
gen_cfg.deepconf_warmup_confidences = warmup_C

out = model.generate(
    **tokenizer(prompt, return_tensors="pt"),
    custom_generate="kashif/DeepConf",
    trust_remote_code=True,
    generation_config=gen_cfg,
    max_new_tokens=128,
)

Requirements

  • PyTorch >= 1.13.0
  • Transformers >= 4.35.0