---
license: mit
pipeline_tag: text-classification
library_name: transformers
base_model: answerdotai/ModernBERT-large
tags:
- math
- reasoning
- verification
- weaver
- cross-encoder
language:
- en
---
# Weaver Distilled for MATH500
This is a distilled cross-encoder model based on ModernBERT-large, trained to predict the correctness of answers on MATH500. This specialized verifier was trained on Weaver scores aggregated over 35 different verifiers and reward models.
## Model Details
- **Base Model**: [answerdotai/ModernBERT-large](https://huggingface.co/answerdotai/ModernBERT-large) (395M parameters)
- **Architecture**: Cross-encoder with MLP head (1024 → 512 → 256 → 1)
- **Max Sequence Length**: 4096 tokens
- **Training Data**: MATH500 problems with Weaver scores from 35 LM judges and reward models
- **Task**: Binary classification for answer correctness prediction
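To give a sense of how small the MLP head is next to the 395M-parameter base model, the arithmetic below counts the head's weights. This is a rough sketch that assumes each arrow in `1024 → 512 → 256 → 1` is a plain fully connected layer with a bias term; the model card does not spell out biases, activations, or dropout, and `linear_params` and `head_params` are hypothetical helper names used only for illustration.

```python
# Layer widths as listed in the model card: 1024 -> 512 -> 256 -> 1.
HEAD_WIDTHS = [1024, 512, 256, 1]


def linear_params(n_in, n_out, bias=True):
    """Parameters of one fully connected layer (assumed layer type)."""
    return n_in * n_out + (n_out if bias else 0)


def head_params(widths, bias=True):
    """Sum parameters over consecutive (input, output) width pairs."""
    return sum(linear_params(a, b, bias) for a, b in zip(widths, widths[1:]))


total = head_params(HEAD_WIDTHS)
print(f"Head parameters (with biases): {total:,}")
```

Under these assumptions the head adds roughly 0.66M parameters, so nearly all of the model's capacity sits in the ModernBERT-large encoder.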
## Quick Start
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "hazyresearch/Weaver_Distilled_for_MATH500"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Example usage
instruction = "Solve: What is the derivative of x^2 + 3x + 2?"
response = "The derivative is 2x + 3. Using the power rule..."

# Tokenize the (instruction, response) pair as a single cross-encoder input
inputs = tokenizer(
    instruction,
    response,
    truncation=True,
    max_length=4096,
    padding=True,
    return_tensors="pt"
)

# Get correctness score: sigmoid maps the single output logit to [0, 1]
with torch.no_grad():
    outputs = model(**inputs)
    score = torch.sigmoid(outputs.logits).item()

print(f"Correctness score: {score:.3f}")
print(f"Prediction: {'Correct' if score > 0.5 else 'Incorrect'}")
```
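The last step of the snippet above turns a raw logit into a score and a label. A minimal, model-free sketch of that mapping follows, assuming the model emits a single logit per input pair. `sigmoid` and `logit_to_prediction` are hypothetical helpers, not part of the released code, and the 0.5 threshold simply mirrors the Quick Start example.

```python
import math
from typing import Tuple


def sigmoid(x: float) -> float:
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def logit_to_prediction(logit: float, threshold: float = 0.5) -> Tuple[float, str]:
    """Map a logit to (score, label); the strict '>' matches the Quick Start."""
    score = sigmoid(logit)
    label = "Correct" if score > threshold else "Incorrect"
    return score, label


for logit in (-2.0, 0.0, 2.0):
    score, label = logit_to_prediction(logit)
    print(f"logit={logit:+.1f} -> score={score:.3f} ({label})")
```

Because the comparison is strict, a score of exactly 0.5 is labeled "Incorrect". Raising `threshold` trades recall for precision when only high-confidence answers should be accepted.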
## Training Details
This model was trained using the [Weaver distillation pipeline](https://github.com/ScalingIntelligence/scaling-verification/tree/main/distillation). For training your own distilled models, see the [distillation README](https://github.com/ScalingIntelligence/scaling-verification/blob/main/distillation/README.md).
## Evaluation
Evaluate this model using:
```bash
python evaluate_crossencoder.py \
    --model_name "answerdotai/ModernBERT-large" \
    --checkpoint_path "hazyresearch/Weaver_Distilled_for_MATH500" \
    --dataset_path "hazyresearch/MATH500_with_Llama_3.1_70B_Instruct_v1" \
    --dataset_split "data" \
    --max_length 4096 \
    --batch_size 64
```
## Citation
```bibtex
@article{weaver2025,
title={Weaver: Shrinking the Generation-Verification Gap with Weak Verifiers},
author={},
journal={arXiv preprint},
year={2025}
}
```