metadata
license: mit
pipeline_tag: text-classification
library_name: transformers
base_model: answerdotai/ModernBERT-large
tags:
- math
- reasoning
- verification
- weaver
- cross-encoder
language:
- en
Weaver Distilled for MATH500
This model is a cross-encoder built on ModernBERT-large and distilled from Weaver. It predicts whether a candidate answer to a MATH500 problem is correct. It was trained on Weaver scores aggregated over 35 verifiers (LM judges and reward models).
Model Details
- Base Model: answerdotai/ModernBERT-large (395M parameters)
- Architecture: Cross-encoder with MLP head (1024 → 512 → 256 → 1)
- Max Sequence Length: 4096 tokens
- Training Data: MATH500 problems with Weaver scores from 35 LM judges and reward models
- Task: Binary classification for answer correctness prediction
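As a rough sanity check on the head sizes listed above, the sketch below counts the parameters of a 1024 → 512 → 256 → 1 MLP. It assumes plain fully connected layers with bias terms and no normalization layers. The card does not state these details, so treat the figure as an estimate. `count_mlp_params` is a hypothetical helper introduced here, not part of the Weaver codebase.

```python
def count_mlp_params(layer_sizes):
    """Count weights and biases in a stack of dense layers.

    Assumption: every layer is a plain linear map with a bias vector.
    """
    total = 0
    for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]):
        total += fan_in * fan_out + fan_out  # weight matrix + bias vector
    return total


# Head dimensions taken from the model card: 1024 -> 512 -> 256 -> 1
head_params = count_mlp_params([1024, 512, 256, 1])
print(f"Estimated head parameters: {head_params:,}")
```

Under these assumptions the head adds well under one million parameters, a small fraction of the 395M-parameter encoder.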
Quick Start
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
# Load model and tokenizer
model_name = "hazyresearch/Weaver_Distilled_for_MATH500"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
# Example usage
instruction = "Solve: What is the derivative of x^2 + 3x + 2?"
response = "The derivative is 2x + 3. Using the power rule..."
# Tokenize input pair
inputs = tokenizer(
    instruction,
    response,
    truncation=True,
    max_length=4096,
    padding=True,
    return_tensors="pt"
)
# Get correctness score
with torch.no_grad():
    outputs = model(**inputs)
    score = torch.sigmoid(outputs.logits).item()
print(f"Correctness score: {score:.3f}")
print(f"Prediction: {'Correct' if score > 0.5 else 'Incorrect'}")
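A verifier score like the one above is often used to choose among several candidate responses to the same problem. The following is a minimal sketch of that selection step. `select_best_response` and `score_fn` are hypothetical names introduced here; `score_fn` stands in for a function that wraps the tokenizer and model calls from the Quick Start, and `dummy_score` is a placeholder used only so the sketch runs without loading the model.

```python
from typing import Callable, List, Tuple


def select_best_response(
    instruction: str,
    responses: List[str],
    score_fn: Callable[[str, str], float],
) -> Tuple[str, float]:
    """Return the response with the highest verifier score, plus that score."""
    if not responses:
        raise ValueError("responses must be non-empty")
    scored = [(score_fn(instruction, r), r) for r in responses]
    best_score, best_response = max(scored, key=lambda pair: pair[0])
    return best_response, best_score


# Placeholder scorer for illustration only; a real scorer would run the model.
def dummy_score(instruction: str, response: str) -> float:
    return 0.9 if "2x + 3" in response else 0.1


best, best_score = select_best_response(
    "What is the derivative of x^2 + 3x + 2?",
    ["The derivative is x + 3.", "The derivative is 2x + 3."],
    dummy_score,
)
print(f"Selected: {best} (score {best_score:.3f})")
```

Keeping the scorer as a callable means the same selection logic works whether scores come from this model, a batch pipeline, or cached values.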
Training Details
This model was trained using the Weaver distillation pipeline. For training your own distilled models, see the distillation README.
Evaluation
Evaluate this model using:
python evaluate_crossencoder.py \
    --model_name "answerdotai/ModernBERT-large" \
    --checkpoint_path "hazyresearch/Weaver_Distilled_for_MATH500" \
    --dataset_path "hazyresearch/MATH500_with_Llama_3.1_70B_Instruct_v1" \
    --dataset_split "data" \
    --max_length 4096 \
    --batch_size 64
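This card does not describe which metrics the evaluation script reports. As an illustration only, the sketch below computes one metric that matches the 0.5 cutoff used in the Quick Start: the fraction of responses whose thresholded score agrees with a ground-truth correctness label. `accuracy_at_threshold` is a hypothetical helper, and the scores and labels are made-up toy values, not results from this model.

```python
def accuracy_at_threshold(scores, labels, threshold=0.5):
    """Fraction of items where (score > threshold) matches the 0/1 label."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not scores:
        raise ValueError("scores must be non-empty")
    correct = sum((s > threshold) == bool(y) for s, y in zip(scores, labels))
    return correct / len(scores)


# Toy values for illustration only (not real model outputs).
toy_scores = [0.92, 0.15, 0.61, 0.40]
toy_labels = [1, 0, 0, 1]
acc = accuracy_at_threshold(toy_scores, toy_labels)
print(f"Accuracy at 0.5: {acc:.2f}")
```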
Citation
@article{weaver2025,
  title={Weaver: Shrinking the Generation-Verification Gap with Weak Verifiers},
  author={},
  journal={arXiv preprint},
  year={2025}
}