# HDLM-Epsilon: Hybrid Diffusion Language Model
This model card is for the hdlm-base model with ε = 0.01.
## Model Description
HDLM-Epsilon is a hybrid diffusion language model that unifies autoregressive and diffusion-based sequence generation through epsilon-hybrid noising. This model interpolates evolution operators between absorbing and uniform processes, making it conceptually closer to MDLM (Sahoo et al. 2024) while maintaining the benefits of both paradigms.
The epsilon parameter (ε) controls the blend between absorbing and uniform processes during training, where smaller values emphasize the absorbing process and larger values incorporate more uniform noise.
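The exact hybrid evolution operator is defined in the paper. As a rough intuition only, the sketch below assumes that a corrupted position becomes the absorbing token with probability 1 − ε and a uniformly random token with probability ε; the function name, signature, and the corruption rule itself are illustrative assumptions, not the repository's implementation.

```python
import torch

def hybrid_corrupt(tokens, corrupt_prob, eps=0.01, vocab_size=50257, absorb_id=50257):
    """Illustrative sketch only (not the paper's exact operator): corrupt each
    position with probability corrupt_prob; a corrupted position becomes the
    absorbing token with probability 1 - eps, or a uniform random token with
    probability eps."""
    corrupt = torch.rand(tokens.shape) < corrupt_prob            # which positions get noised
    use_uniform = torch.rand(tokens.shape) < eps                 # uniform vs. absorbing branch
    random_tokens = torch.randint(0, vocab_size, tokens.shape)   # uniform replacements
    absorbed = torch.full_like(tokens, absorb_id)                # absorbing-token replacements
    noised = torch.where(use_uniform, random_tokens, absorbed)
    return torch.where(corrupt, noised, tokens)
```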
## Model Architecture
- Base Model: Transformer architecture with custom conditioning layers
- Vocabulary Size: 50,258 tokens (the 50,257-token GPT-2 vocabulary plus one absorbing token; see the tokenizer sketch after this list)
- Context Length: 1024 tokens
- Training: Hybrid loss combining token masking with random token corruption
- Inference: Supports multiple sampling algorithms including ACS (Adaptive Correction Sampler)
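The absorbing token lies outside the standard GPT-2 vocabulary (ids 0–50256), which is what brings the count to 50,258. If you want the tokenizer to recognize it, for example when inspecting partially denoised samples, one option is to register it as an extra special token; the token string below is an assumption, not necessarily what the HDLM codebase uses.

```python
from transformers import GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained('gpt2')
# Hypothetical token string; HDLM's actual absorbing token may be named differently.
tokenizer.add_special_tokens({'additional_special_tokens': ['[MASK]']})
print(len(tokenizer))                              # 50258
print(tokenizer.convert_tokens_to_ids('[MASK]'))   # 50257, the absorbing-token id
```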
## Usage
### Quick Start
```python
from hdlm.hf_utils import smart_model_loader
from hdlm.epsilon_hybrid.sample import full_diff
from transformers import GPT2TokenizerFast
import torch

# Load model using smart loader (automatically detects model type)
model, cfg, device, accelerator, metaschedule = smart_model_loader(
    model_path="hdlm-group/hdlm-base-epsilon-0.01",
    model_type="auto",  # automatically detects epsilon_hybrid
    device="cuda"
)

# Load tokenizer
tokenizer = GPT2TokenizerFast.from_pretrained('gpt2')

# Generate text
prompt = "The future of artificial intelligence"
prompt_ids = tokenizer.encode(prompt, return_tensors='pt').to(device)

# Full diffusion sampling
generated = full_diff(
    model=model,
    prompt=prompt_ids,
    batch_size=1,
    alg='acs',  # or 'original', 'remask', 'remdm'
    steps=512,
    temperature=1.0,
    context_length=1024,
    device=device
)

# Decode generated text
generated_text = tokenizer.decode(generated[0], skip_special_tokens=True)
print(generated_text)
```
## Evaluation
```bash
# Text generation evaluation
python hdlm/eval_generation.py \
    --checkpoint_path hdlm-group/hdlm-base-epsilon-0.01 \
    --sampling_method full_diff \
    --algorithm acs \
    --save_samples

# Perplexity evaluation
python hdlm/eval_modeling.py \
    --checkpoint_path hdlm-group/hdlm-base-epsilon-0.01 \
    --work_dir "./logs/eval_modeling_epsilon" \
    --dataset ptb
```
## Training Details
- Dataset: OpenWebText
- Batch Size: 512
- Learning Rate: 3e-4 with cosine scheduling
- Epsilon (ε): 0.01 (controls hybrid noising blend)
- Lambda (λ): 1.0 (weighting factor for unmasked tokens)
- Loss Type: Hybrid loss combining masking and random token corruption (a rough sketch follows this list)
- Training Steps: 1M iterations
- Warmup: 50K steps
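The λ term above suggests a cross-entropy split into two weighted parts: one over masked (absorbed) positions and one over positions hit by random-token corruption while staying unmasked. The sketch below follows that reading; the tensor names, masks, and reduction are assumptions rather than the repository's actual loss.

```python
import torch
import torch.nn.functional as F

def hybrid_loss(logits, targets, masked, corrupted_unmasked, lam=1.0):
    """Rough sketch: CE over absorbed positions plus a lambda-weighted CE over
    randomly corrupted, unmasked positions. `masked` and `corrupted_unmasked`
    are boolean masks with the same shape as `targets`."""
    per_token = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)), targets.reshape(-1), reduction='none'
    ).reshape(targets.shape)
    masked_term = per_token[masked].mean() if masked.any() else per_token.new_zeros(())
    corrupted_term = per_token[corrupted_unmasked].mean() if corrupted_unmasked.any() else per_token.new_zeros(())
    return masked_term + lam * corrupted_term
```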
## Sampling Algorithms
The model supports several sampling algorithms:
- `original`: Standard diffusion sampling
- `acs`: Adaptive Correction Sampler with error correction
- `remask`: Remasking strategy for improved quality
- `remdm`: ReMDM-style sampling with probability mixing
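All four names map directly to the `alg` argument of `full_diff` shown in the Quick Start, so comparing them on the same prompt is a small loop:

```python
# Reuses model, tokenizer, prompt_ids, and device from the Quick Start snippet.
for alg in ['original', 'acs', 'remask', 'remdm']:
    sample = full_diff(
        model=model, prompt=prompt_ids, batch_size=1, alg=alg,
        steps=512, temperature=1.0, context_length=1024, device=device,
    )
    print(f"[{alg}] {tokenizer.decode(sample[0], skip_special_tokens=True)[:120]}")
```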
## Model Variants
Available epsilon values and their characteristics:
- ε = 0.01: Minimal uniform noise, closest to pure absorbing process
- ε = 0.1: Moderate hybrid behavior
- ε = 0.5: Balanced absorbing-uniform blend
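Assuming the other variants follow this card's repository naming pattern (an assumption; check the hdlm-group organization for the exact ids), switching variants only changes `model_path`:

```python
# Repository ids below are hypothetical, extrapolated from this card's checkpoint name.
from hdlm.hf_utils import smart_model_loader

for repo in ["hdlm-group/hdlm-base-epsilon-0.01",
             "hdlm-group/hdlm-base-epsilon-0.1",
             "hdlm-group/hdlm-base-epsilon-0.5"]:
    model, cfg, device, accelerator, metaschedule = smart_model_loader(
        model_path=repo, model_type="auto", device="cuda"
    )
```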
## Citation
```bibtex
@article{fathi2025unifying,
  title   = {Unifying autoregressive and diffusion-based sequence generation},
  author  = {Fathi, Nima and Scholak, Torsten and No{\"e}l, Pierre-Andr{\'e}},
  journal = {arXiv preprint arXiv:2504.06416},
  year    = {2025}
}
```
## License
This model is released under the same license as the original HDLM codebase. Please refer to the GitHub repository for license details.