---
license: mit
pipeline_tag: text-classification
library_name: transformers
base_model: answerdotai/ModernBERT-large
tags:
- math
- reasoning
- verification
- weaver
- cross-encoder
language:
- en
---

# Weaver Distilled for MATH500

This is a distilled cross-encoder model based on ModernBERT-large, trained to predict the correctness of answers on MATH500. This specialized verifier was trained on Weaver scores aggregated over 35 different verifiers and reward models.

## Model Details

- **Base Model**: [answerdotai/ModernBERT-large](https://huggingface.co/answerdotai/ModernBERT-large) (395M parameters)
- **Architecture**: Cross-encoder with MLP head (1024 → 512 → 256 → 1)
- **Max Sequence Length**: 4096 tokens
- **Training Data**: MATH500 problems with Weaver scores from 35 LM judges and reward models
- **Task**: Binary classification for answer correctness prediction

## Quick Start

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "hazyresearch/Weaver_Distilled_for_MATH500"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Example usage
instruction = "Solve: What is the derivative of x^2 + 3x + 2?"
response = "The derivative is 2x + 3. Using the power rule..."

# Tokenize input pair
inputs = tokenizer(
    instruction,
    response,
    truncation=True,
    max_length=4096,
    padding=True,
    return_tensors="pt"
)

# Get correctness score
with torch.no_grad():
    outputs = model(**inputs)
    score = torch.sigmoid(outputs.logits).item()

print(f"Correctness score: {score:.3f}")
print(f"Prediction: {'Correct' if score > 0.5 else 'Incorrect'}")
```

## Training Details

This model was trained using the [Weaver distillation pipeline](https://github.com/ScalingIntelligence/scaling-verification/tree/main/distillation).
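
For orientation, the MLP head listed under Model Details (1024 → 512 → 256 → 1) could look roughly like the sketch below. Only the layer widths come from this card. The class name `MLPHead`, the ReLU activations, the dropout rate, and the use of a single pooled encoder vector as input are assumptions made for illustration, so the head in the distillation pipeline may differ.

```python
import torch
import torch.nn as nn


class MLPHead(nn.Module):
    """Hypothetical scoring head with widths 1024 -> 512 -> 256 -> 1."""

    def __init__(self, hidden_size: int = 1024, dropout: float = 0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hidden_size, 512),
            nn.ReLU(),            # activation choice is an assumption
            nn.Dropout(dropout),  # dropout rate is an assumption
            nn.Linear(512, 256),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(256, 1),    # one logit per (instruction, response) pair
        )

    def forward(self, pooled: torch.Tensor) -> torch.Tensor:
        # pooled: (batch, hidden_size) -> logits: (batch, 1)
        return self.net(pooled)


head = MLPHead().eval()
pooled = torch.randn(2, 1024)  # random stand-in for pooled encoder outputs
with torch.no_grad():
    logits = head(pooled)

# Sigmoid maps each logit to a correctness score in [0, 1]
scores = torch.sigmoid(logits).squeeze(-1)
print(tuple(logits.shape), tuple(scores.shape))
```

A single output logit passed through a sigmoid matches the scoring step in Quick Start, where `torch.sigmoid(outputs.logits)` yields one correctness score per input pair.
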
For training your own distilled models, see the [distillation README](https://github.com/ScalingIntelligence/scaling-verification/blob/main/distillation/README.md).

## Evaluation

Evaluate this model using:

```bash
python evaluate_crossencoder.py \
    --model_name "answerdotai/ModernBERT-large" \
    --checkpoint_path "hazyresearch/Weaver_Distilled_for_MATH500" \
    --dataset_path "hazyresearch/MATH500_with_Llama_3.1_70B_Instruct_v1" \
    --dataset_split "data" \
    --max_length 4096 \
    --batch_size 64
```

## Citation

```bibtex
@article{weaver2025,
  title={Weaver: Shrinking the Generation-Verification Gap with Weak Verifiers},
  author={},
  journal={arXiv preprint},
  year={2025}
}
```