# HDLM-Epsilon: Hybrid Diffusion Language Model
This model card is for the hdlm-base model with ε = 0.01.
## Model Description
HDLM-Epsilon is a hybrid diffusion language model that unifies autoregressive and diffusion-based sequence generation through epsilon-hybrid noising. This model interpolates evolution operators between absorbing and uniform processes, making it conceptually closer to MDLM (Sahoo et al. 2024) while maintaining the benefits of both paradigms.
The epsilon parameter (ε) controls the blend between absorbing and uniform processes during training, where smaller values emphasize the absorbing process and larger values incorporate more uniform noise.
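The exact hybrid evolution operator is defined in the paper. As a rough intuition only, the sketch below assumes that a corrupted position becomes the absorbing token with probability 1 − ε and a uniformly random token with probability ε; the function name, signature, and the corruption rule itself are illustrative assumptions, not the repository's implementation.

```python
import torch

def hybrid_corrupt(tokens, corrupt_prob, eps=0.01, vocab_size=50257, absorb_id=50257):
    """Illustrative sketch only (not the paper's exact operator): corrupt each
    position with probability corrupt_prob; a corrupted position becomes the
    absorbing token with probability 1 - eps, or a uniform random token with
    probability eps."""
    corrupt = torch.rand(tokens.shape) < corrupt_prob            # which positions get noised
    use_uniform = torch.rand(tokens.shape) < eps                 # uniform vs. absorbing branch
    random_tokens = torch.randint(0, vocab_size, tokens.shape)   # uniform replacements
    absorbed = torch.full_like(tokens, absorb_id)                # absorbing-token replacements
    noised = torch.where(use_uniform, random_tokens, absorbed)
    return torch.where(corrupt, noised, tokens)
```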
## Model Architecture
- Base Model: Transformer architecture with custom conditioning layers
- Vocabulary Size: 50,258 tokens (the 50,257-token GPT-2 vocabulary plus one absorbing token; see the tokenizer sketch after this list)
- Context Length: 1024 tokens
- Training: Hybrid loss combining token masking with random token corruption
- Inference: Supports multiple sampling algorithms including ACS (Adaptive Correction Sampler)
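The absorbing token lies outside the standard GPT-2 vocabulary (ids 0–50256), which is what brings the count to 50,258. If you want the tokenizer to recognize it, for example when inspecting partially denoised samples, one option is to register it as an extra special token; the token string below is an assumption, not necessarily what the HDLM codebase uses.

```python
from transformers import GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained('gpt2')
# Hypothetical token string; HDLM's actual absorbing token may be named differently.
tokenizer.add_special_tokens({'additional_special_tokens': ['[MASK]']})
print(len(tokenizer))                              # 50258
print(tokenizer.convert_tokens_to_ids('[MASK]'))   # 50257, the absorbing-token id
```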
## Usage
### Quick Start
```python
from hdlm.hf_utils import smart_model_loader
from hdlm.epsilon_hybrid.sample import full_diff
from transformers import GPT2TokenizerFast
import torch

# Load model using smart loader (automatically detects model type)
model, cfg, device, accelerator, metaschedule = smart_model_loader(
    model_path="hdlm-group/hdlm-base-epsilon-0.01",
    model_type="auto",  # automatically detects epsilon_hybrid
    device="cuda"
)

# Load tokenizer
tokenizer = GPT2TokenizerFast.from_pretrained('gpt2')

# Generate text
prompt = "The future of artificial intelligence"
prompt_ids = tokenizer.encode(prompt, return_tensors='pt').to(device)

# Full diffusion sampling
generated = full_diff(
    model=model,
    prompt=prompt_ids,
    batch_size=1,
    alg='acs',  # or 'original', 'remask', 'remdm'
    steps=512,
    temperature=1.0,
    context_length=1024,
    device=device
)

# Decode generated text
generated_text = tokenizer.decode(generated[0], skip_special_tokens=True)
print(generated_text)
```
## Evaluation
```bash
# Text generation evaluation
python hdlm/eval_generation.py \
    --checkpoint_path hdlm-group/hdlm-base-epsilon-0.01 \
    --sampling_method full_diff \
    --algorithm acs \
    --save_samples

# Perplexity evaluation
python hdlm/eval_modeling.py \
    --checkpoint_path hdlm-group/hdlm-base-epsilon-0.01 \
    --work_dir "./logs/eval_modeling_epsilon" \
    --dataset ptb
```
## Training Details
- Dataset: OpenWebText
- Batch Size: 512
- Learning Rate: 3e-4 with cosine scheduling
- Epsilon (ε): 0.01 (controls hybrid noising blend)
- Lambda (λ): 1.0 (weighting factor for unmasked tokens)
- Loss Type: Hybrid loss combining masking and random token corruption (a rough sketch follows this list)
- Training Steps: 1M iterations
- Warmup: 50K steps
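The λ term above suggests a cross-entropy split into two weighted parts: one over masked (absorbed) positions and one over positions hit by random-token corruption while staying unmasked. The sketch below follows that reading; the tensor names, masks, and reduction are assumptions rather than the repository's actual loss.

```python
import torch
import torch.nn.functional as F

def hybrid_loss(logits, targets, masked, corrupted_unmasked, lam=1.0):
    """Rough sketch: CE over absorbed positions plus a lambda-weighted CE over
    randomly corrupted, unmasked positions. `masked` and `corrupted_unmasked`
    are boolean masks with the same shape as `targets`."""
    per_token = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)), targets.reshape(-1), reduction='none'
    ).reshape(targets.shape)
    masked_term = per_token[masked].mean() if masked.any() else per_token.new_zeros(())
    corrupted_term = per_token[corrupted_unmasked].mean() if corrupted_unmasked.any() else per_token.new_zeros(())
    return masked_term + lam * corrupted_term
```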
## Sampling Algorithms
The model supports several sampling algorithms:
- `original`: Standard diffusion sampling
- `acs`: Adaptive Correction Sampler with error correction
- `remask`: Remasking strategy for improved quality
- `remdm`: ReMDM-style sampling with probability mixing
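All four names map directly to the `alg` argument of `full_diff` shown in the Quick Start, so comparing them on the same prompt is a small loop:

```python
# Reuses model, tokenizer, prompt_ids, and device from the Quick Start snippet.
for alg in ['original', 'acs', 'remask', 'remdm']:
    sample = full_diff(
        model=model, prompt=prompt_ids, batch_size=1, alg=alg,
        steps=512, temperature=1.0, context_length=1024, device=device,
    )
    print(f"[{alg}] {tokenizer.decode(sample[0], skip_special_tokens=True)[:120]}")
```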
## Model Variants
Available epsilon values and their characteristics:
- ε = 0.01: Minimal uniform noise, closest to pure absorbing process
- ε = 0.1: Moderate hybrid behavior
- ε = 0.5: Balanced absorbing-uniform blend
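Assuming the other variants follow this card's repository naming pattern (an assumption; check the hdlm-group organization for the exact ids), switching variants only changes `model_path`:

```python
# Repository ids below are hypothetical, extrapolated from this card's checkpoint name.
from hdlm.hf_utils import smart_model_loader

for repo in ["hdlm-group/hdlm-base-epsilon-0.01",
             "hdlm-group/hdlm-base-epsilon-0.1",
             "hdlm-group/hdlm-base-epsilon-0.5"]:
    model, cfg, device, accelerator, metaschedule = smart_model_loader(
        model_path=repo, model_type="auto", device="cuda"
    )
```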
## Citation
```bibtex
@article{fathi2025unifying,
  title   = {Unifying autoregressive and diffusion-based sequence generation},
  author  = {Fathi, Nima and Scholak, Torsten and No{\"e}l, Pierre-Andr{\'e}},
  journal = {arXiv preprint arXiv:2504.06416},
  year    = {2025}
}
```
## License
This model is released under the same license as the original HDLM codebase. Please refer to the GitHub repository for license details.