# DeepCONF Custom Generation Strategy
This repository implements the DeepCONF (Deep Confidence-based Early Stopping) generation strategy for Hugging Face Transformers models, following the approach from the paper *Deep Think with Confidence*.
## Overview
DeepCONF monitors the confidence of generated tokens over a sliding window and stops generation early when the windowed confidence falls below a threshold.
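For intuition, here is a minimal sketch of the signal being monitored, assuming the paper's definition of token confidence (the negative mean log-probability of the top-k candidate tokens) averaged over a sliding window; the repo's exact computation may differ:

```python
import torch
import torch.nn.functional as F

def token_confidence(logits: torch.Tensor, k: int = 20) -> torch.Tensor:
    """Confidence of one decoding step from its logits (vocab-sized tensor)."""
    logprobs = F.log_softmax(logits, dim=-1)
    topk = logprobs.topk(k, dim=-1).values
    return -topk.mean(dim=-1)  # higher = more confident

def window_confidence(step_confs: list[float], window_size: int = 2048) -> float:
    """Sliding-window average used for the early-stopping check."""
    window = step_confs[-window_size:]
    return sum(window) / len(window)

# Generation stops early once window_confidence(...) drops below `threshold`.
```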
## Parameters

- `enable_conf` (bool): Whether to enable the DeepCONF strategy. Defaults to `False`.
- `window_size` (int): Size of the sliding window for confidence calculation. Defaults to `2048`.
- `threshold` (float): Confidence threshold for early stopping. Defaults to `17.0`.
- `output_confidences` (bool): If `True` and `return_dict_in_generate=True`, returns a per-step confidence tensor alongside the generated sequences for debugging/visualization.
## Usage

To use this custom generation strategy, you can pass it directly to the `generate` method:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("your-model")
tokenizer = AutoTokenizer.from_pretrained("your-model")

inputs = tokenizer("Hello, world!", return_tensors="pt")

# Generate with DeepCONF (Hub repo)
outputs = model.generate(
    **inputs,
    enable_conf=True,
    window_size=2048,
    threshold=17.0,
    output_confidences=True,       # request confidences
    return_dict_in_generate=True,  # required to get tensors
    max_new_tokens=100,
    custom_generate="kashif/DeepConf",  # Hugging Face Hub repo
    trust_remote_code=True,
)
```
## Calibration (DeepConf-low/high)
DeepConf’s online stopping threshold is derived from a short warmup phase. You collect warmup trace confidences, then pass them into the generator to auto-derive the threshold for either DeepConf-low (aggressive) or DeepConf-high (permissive).
- Warmup (`num_return_sequences`): collect per-trace confidences (`C_t = min(step_confidences)`)
```python
from transformers import GenerationConfig

prompt = "Explain artificial intelligence."
Ninit = 8  # number of warmup traces

warm_cfg = GenerationConfig.from_model_config(model.config)
warm_cfg.do_sample = True
warm_cfg.temperature = 0.7
warm_cfg.top_p = 0.95
warm_cfg.max_new_tokens = 64
warm_cfg.enable_conf = True
warm_cfg.return_dict_in_generate = True
warm_cfg.output_confidences = True
warm_cfg.num_return_sequences = Ninit
# IMPORTANT: Do not set `warm_cfg.threshold` here. Warmup should not apply online early stopping.

out = model.generate(
    **tokenizer(prompt, return_tensors="pt"),
    generation_config=warm_cfg,
    custom_generate="kashif/DeepConf",
    trust_remote_code=True,
)

# Per-trace C_t = min over steps
warmup_C = out.confidences.min(dim=1).values.tolist()
```
- Online: pass warmup confidences to auto-derive threshold
```python
gen_cfg = GenerationConfig.from_model_config(model.config)
gen_cfg.enable_conf = True
gen_cfg.return_dict_in_generate = True
gen_cfg.output_confidences = True

# Choose a variant:
# - DeepConf-low (aggressive): eta=0.1 → 90th percentile threshold
# - DeepConf-high (permissive): eta=0.9 → 10th percentile threshold
gen_cfg.deepconf_variant = "low"  # or "high"
# Optional: override eta explicitly
# gen_cfg.deepconf_eta = 0.1  # defaults: 0.1 for low, 0.9 for high

# Provide warmup confidences; the threshold will be derived internally
gen_cfg.deepconf_warmup_confidences = warmup_C

out = model.generate(
    **tokenizer(prompt, return_tensors="pt"),
    custom_generate="kashif/DeepConf",
    trust_remote_code=True,
    generation_config=gen_cfg,
    max_new_tokens=128,
)
```
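For intuition, the internally derived threshold is a percentile cut over the warmup confidences: with `eta=0.1` (DeepConf-low) the 90th percentile, with `eta=0.9` (DeepConf-high) the 10th. An equivalent manual computation, purely illustrative since the generator derives the threshold for you from `deepconf_warmup_confidences`:

```python
import numpy as np

eta = 0.1  # DeepConf-low; use eta = 0.9 for DeepConf-high
# Threshold = (100 * (1 - eta))-th percentile of the warmup trace confidences
threshold = float(np.percentile(warmup_C, 100 * (1 - eta)))
print(f"derived stopping threshold: {threshold:.2f}")
```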
## Requirements
- PyTorch >= 1.13.0
- Transformers >= 4.35.0