HDLM-Epsilon: Hybrid Diffusion Language Model


This model card is for the hdlm-base model trained with ε = 0.01.

Model Description

HDLM-Epsilon is a hybrid diffusion language model that unifies autoregressive and diffusion-based sequence generation through epsilon-hybrid noising. This model interpolates evolution operators between absorbing and uniform processes, making it conceptually closer to MDLM (Sahoo et al. 2024) while maintaining the benefits of both paradigms.

The epsilon parameter (ε) controls the blend between absorbing and uniform processes during training, where smaller values emphasize the absorbing process and larger values incorporate more uniform noise.
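
As a rough illustration (not the actual training code), one forward corruption step under ε-hybrid noising can be sketched as follows. Here corrupt_prob stands in for the noise schedule, mask_id for the absorbing token, and all names are illustrative assumptions.

import torch

def epsilon_hybrid_corrupt(tokens, corrupt_prob, epsilon, mask_id, vocab_size):
    # Illustrative sketch: each token is corrupted with probability corrupt_prob.
    # A corrupted token becomes the absorbing token with probability (1 - epsilon),
    # or a uniformly random token with probability epsilon.
    corrupted = tokens.clone()
    corrupt = torch.rand_like(tokens, dtype=torch.float) < corrupt_prob
    uniform = torch.rand_like(tokens, dtype=torch.float) < epsilon
    random_ids = torch.randint_like(tokens, vocab_size)
    corrupted[corrupt & ~uniform] = mask_id
    corrupted[corrupt & uniform] = random_ids[corrupt & uniform]
    return corrupted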

Model Architecture

  • Base Model: Transformer architecture with custom conditioning layers
  • Vocabulary Size: 50,258 tokens (GPT-2 vocabulary + absorbing token; see the sketch after this list)
  • Context Length: 1024 tokens
  • Training: Hybrid loss combining token masking with random token corruption
  • Inference: Supports multiple sampling algorithms including ACS (Adaptive Correction Sampler)
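
As a point of reference for the vocabulary entry above (the exact token-id handling lives in the HDLM codebase), the vocabulary can be thought of as the 50,257 GPT-2 tokens plus one absorbing token; treating it as the last id is an assumption, not a verified constant.

from transformers import GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained('gpt2')
gpt2_vocab_size = len(tokenizer)         # 50257 for GPT-2
mask_id = gpt2_vocab_size                # assumed: absorbing token appended as the last id
model_vocab_size = gpt2_vocab_size + 1   # 50258, as listed above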

Usage

Quick Start

from hdlm.hf_utils import smart_model_loader
from hdlm.epsilon_hybrid.sample import full_diff
from transformers import GPT2TokenizerFast
import torch

# Load model using smart loader (automatically detects model type)
model, cfg, device, accelerator, metaschedule = smart_model_loader(
    model_path="hdlm-group/hdlm-base-epsilon-0.01",
    model_type="auto",  # automatically detects epsilon_hybrid
    device="cuda"
)

# Load tokenizer
tokenizer = GPT2TokenizerFast.from_pretrained('gpt2')

# Generate text
prompt = "The future of artificial intelligence"
prompt_ids = tokenizer.encode(prompt, return_tensors='pt').to(device)

# Full diffusion sampling
generated = full_diff(
    model=model,
    prompt=prompt_ids,
    batch_size=1,
    alg='acs',  # or 'original', 'remask', 'remdm'
    steps=512,
    temperature=1.0,
    context_length=1024,
    device=device
)

# Decode generated text
generated_text = tokenizer.decode(generated[0], skip_special_tokens=True)
print(generated_text)
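
To draw several samples in one call, batch_size can be increased. The loop below assumes full_diff broadcasts the prompt across the batch and returns one sequence per batch element, as the indexing above suggests.

generated = full_diff(
    model=model,
    prompt=prompt_ids,
    batch_size=4,
    alg='acs',
    steps=512,
    temperature=1.0,
    context_length=1024,
    device=device
)
for sample in generated:
    print(tokenizer.decode(sample, skip_special_tokens=True))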

Evaluation

# Text generation evaluation
python hdlm/eval_generation.py \
    --checkpoint_path hdlm-group/hdlm-base-epsilon-0.01 \
    --sampling_method full_diff \
    --algorithm acs \
    --save_samples

# Perplexity evaluation
python hdlm/eval_modeling.py \
    --checkpoint_path hdlm-group/hdlm-base-epsilon-0.01 \
    --work_dir "./logs/eval_modeling_epsilon" \
    --dataset ptb
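
For reference, the perplexity reported by such an evaluation is the exponential of the average per-token negative log-likelihood (for diffusion language models this is typically an upper bound derived from the ELBO rather than an exact likelihood):

import math

def perplexity(total_nll_nats, total_tokens):
    # Perplexity = exp(average per-token negative log-likelihood)
    return math.exp(total_nll_nats / total_tokens)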

Training Details

  • Dataset: OpenWebText
  • Batch Size: 512
  • Learning Rate: 3e-4 with cosine scheduling
  • Epsilon (ε): 0.01 (controls hybrid noising blend)
  • Lambda (λ): 1.0 (weighting factor for unmasked tokens)
  • Loss Type: Hybrid loss combining masking and random token corruption
  • Training Steps: 1M iterations
  • Warmup: 50K steps
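
The same values, collected in one place as a minimal sketch; the field names are illustrative and do not reflect the repository's actual config schema.

# Illustrative summary of the hyperparameters above; key names are assumptions,
# not the actual HDLM config keys.
training_config = {
    "dataset": "openwebtext",
    "batch_size": 512,
    "learning_rate": 3e-4,
    "lr_schedule": "cosine",
    "warmup_steps": 50_000,
    "training_steps": 1_000_000,
    "epsilon": 0.01,   # hybrid noising blend
    "lambda": 1.0,     # weight on unmasked tokens in the hybrid loss
}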

Sampling Algorithms

The model supports several sampling algorithms; a usage sketch follows the list:

  • original: Standard diffusion sampling
  • acs: Adaptive Correction Sampler with error correction
  • remask: Remasking strategy for improved quality
  • remdm: ReMDM-style sampling with probability mixing
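
Continuing the Quick Start example, the algorithm is selected via the alg argument of full_diff; the sketch below reuses the model, prompt_ids, and tokenizer loaded there.

# Compare samplers by swapping the alg argument (names as listed above)
for alg in ['original', 'acs', 'remask', 'remdm']:
    sample = full_diff(
        model=model,
        prompt=prompt_ids,
        batch_size=1,
        alg=alg,
        steps=512,
        temperature=1.0,
        context_length=1024,
        device=device
    )
    print(alg, tokenizer.decode(sample[0], skip_special_tokens=True))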

Model Variants

Available epsilon values and their characteristics (a loading sketch follows the list):

  • ε = 0.01: Minimal uniform noise, closest to pure absorbing process
  • ε = 0.1: Moderate hybrid behavior
  • ε = 0.5: Balanced absorbing-uniform blend
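
Assuming the other variants are published under the same naming pattern as this checkpoint (unverified; only the ε = 0.01 repository id appears in this card), they can be loaded the same way:

# Hypothetical repo ids following the pattern of this checkpoint; only
# hdlm-group/hdlm-base-epsilon-0.01 is confirmed by this card.
for eps in ["0.01", "0.1", "0.5"]:
    model, cfg, device, accelerator, metaschedule = smart_model_loader(
        model_path=f"hdlm-group/hdlm-base-epsilon-{eps}",
        model_type="auto",
        device="cuda"
    )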

Citation

@article{fathi2025unifying,
  title={Unifying autoregressive and diffusion-based sequence generation},
  author={Fathi, Nima and Scholak, Torsten and No{\"e}l, Pierre-Andr{\'e}},
  journal={arXiv preprint arXiv:2504.06416},
  year={2025}
}

License

This model is released under the same license as the original HDLM codebase. Please refer to the GitHub repository for license details.
