hazyresearch/Weaver_Distilled_ModernBERT_Large_for_MATH500

	---
	license: mit
	pipeline_tag: text-classification
	library_name: transformers
	base_model: answerdotai/ModernBERT-large
	tags:
	- math
	- reasoning
	- verification
	- weaver
	- cross-encoder
	language:
	- en
	---

	# Weaver Distilled for MATH500

	This is a cross-encoder verifier distilled from Weaver into ModernBERT-large. It predicts whether a candidate answer to a MATH500 problem is correct, and it was trained on Weaver scores aggregated over 35 verifiers (LM judges and reward models).

	## Model Details

	- Base Model: [answerdotai/ModernBERT-large](https://huggingface.co/answerdotai/ModernBERT-large) (395M parameters)
	- Architecture: Cross-encoder with MLP head (1024 → 512 → 256 → 1)
	- Max Sequence Length: 4096 tokens
	- Training Data: MATH500 problems with Weaver scores from 35 LM judges and reward models
	- Task: Binary classification for answer correctness prediction
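To make the head dimensions above concrete, here is a minimal sketch that counts the parameters of a fully connected 1024 → 512 → 256 → 1 stack. It assumes plain linear layers with bias terms, which the card does not confirm. Activations, dropout, and normalization are not specified here and are left out. The names `HEAD_DIMS`, `linear_params`, and `head_param_count` are illustrative only.

```python
# Layer widths taken from the model card: 1024 -> 512 -> 256 -> 1.
HEAD_DIMS = [1024, 512, 256, 1]


def linear_params(d_in: int, d_out: int) -> int:
    """Weights plus biases of one fully connected layer (bias is an assumption)."""
    return d_in * d_out + d_out


def head_param_count(dims) -> int:
    """Sum the parameters of consecutive linear layers dims[i] -> dims[i + 1]."""
    return sum(linear_params(a, b) for a, b in zip(dims, dims[1:]))


head_params = head_param_count(HEAD_DIMS)
print(head_params)  # 656385
```

Under these assumptions the head adds about 0.66M parameters, a small fraction of the 395M-parameter base model.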

	## Quick Start

	```python
	from transformers import AutoTokenizer, AutoModelForSequenceClassification
	import torch

	# Load model and tokenizer
	model_name = "hazyresearch/Weaver_Distilled_ModernBERT_Large_for_MATH500"
	tokenizer = AutoTokenizer.from_pretrained(model_name)
	model = AutoModelForSequenceClassification.from_pretrained(model_name)

	# Example usage
	instruction = "Solve: What is the derivative of x^2 + 3x + 2?"
	response = "The derivative is 2x + 3. Using the power rule..."

	# Tokenize input pair
	inputs = tokenizer(
	    instruction,
	    response,
	    truncation=True,
	    max_length=4096,
	    padding=True,
	    return_tensors="pt",
	)

	# Get correctness score
	with torch.no_grad():
	    outputs = model(**inputs)
	    score = torch.sigmoid(outputs.logits).item()

	print(f"Correctness score: {score:.3f}")
	print(f"Prediction: {'Correct' if score > 0.5 else 'Incorrect'}")
	```
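When several candidate responses exist for the same problem, the single-logit score from the Quick Start can also be used to rank them. The sketch below shows only the post-processing step (sigmoid, sort, threshold) in plain Python. The logits are made-up placeholder values, not real model outputs, and `rank_responses` is a hypothetical helper rather than part of the Weaver codebase.

```python
import math


def sigmoid(logit: float) -> float:
    """Numerically stable logistic function for a single logit."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)
    return e / (1.0 + e)


def rank_responses(responses, logits, threshold=0.5):
    """Return (response, score, predicted_correct) tuples, highest score first."""
    scored = [(r, sigmoid(z)) for r, z in zip(responses, logits)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(r, s, s > threshold) for r, s in scored]


# Placeholder candidates and logits, for illustration only.
candidates = ["2x + 3", "x^2 + 3", "2x"]
example_logits = [2.0, -1.0, 0.0]

ranked = rank_responses(candidates, example_logits)
best_response = ranked[0][0]

for response, score, is_correct in ranked:
    print(f"{score:.3f}  {'Correct' if is_correct else 'Incorrect'}  {response}")
```

The 0.5 threshold mirrors the Quick Start. A different cutoff may suit applications that weigh false positives and false negatives differently.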

	## Training Details

	This model was trained using the [Weaver distillation pipeline](https://github.com/ScalingIntelligence/scaling-verification/tree/main/distillation). For training your own distilled models, see the [distillation README](https://github.com/ScalingIntelligence/scaling-verification/blob/main/distillation/README.md).

	## Evaluation

	Evaluate this model using:

	```bash
	python evaluate_crossencoder.py \
	    --model_name "answerdotai/ModernBERT-large" \
	    --checkpoint_path "hazyresearch/Weaver_Distilled_ModernBERT_Large_for_MATH500" \
	    --dataset_path "hazyresearch/MATH500_with_Llama_3.1_70B_Instruct_v1" \
	    --dataset_split "data" \
	    --max_length 4096 \
	    --batch_size 64
	```
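As a rough illustration of what evaluating a correctness verifier involves, the sketch below computes accuracy at a fixed threshold from predicted scores and ground-truth labels. This is an assumption about a typical metric, not a description of what `evaluate_crossencoder.py` actually reports. The scores and labels are toy values, and `accuracy_at_threshold` is a hypothetical helper.

```python
def accuracy_at_threshold(scores, labels, threshold=0.5):
    """Fraction of examples where (score > threshold) matches the 0/1 label."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not scores:
        raise ValueError("need at least one example")
    predictions = [s > threshold for s in scores]
    correct = sum(p == bool(y) for p, y in zip(predictions, labels))
    return correct / len(scores)


# Toy values for illustration only: predicted correctness scores and true labels.
toy_scores = [0.91, 0.12, 0.67, 0.40]
toy_labels = [1, 0, 0, 1]

toy_accuracy = accuracy_at_threshold(toy_scores, toy_labels)
print(f"Accuracy: {toy_accuracy:.2f}")  # Accuracy: 0.50
```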

	## Citation

	```bibtex
	@article{weaver2025,
	  title={Weaver: Shrinking the Generation-Verification Gap with Weak Verifiers},
	  author={},
	  journal={arXiv preprint},
	  year={2025}
	}
	```