Update README.md
README.md CHANGED

    @@ -15,7 +15,7 @@ language:
     
     # Weaver Distilled for MATH500
     
    -
    +This is a distilled cross-encoder model based on ModernBERT-large, trained to predict the correctness of answers on MATH500. This specialized verifier was trained on Weaver scores aggregated over 35 different verifiers and reward models.
     
     ## Model Details
     
    @@ -25,15 +25,6 @@ A distilled cross-encoder model that captures 98.7% of Weaver's accuracy while r
     - **Training Data**: MATH500 problems with Weaver scores from 35 LM judges and reward models
     - **Task**: Binary classification for answer correctness prediction
     
    -## Performance
    -
    -On MATH500 with Llama 3.1 70B generations:
    -- **Weaver (Full)**: 93.4% accuracy, high compute cost
    -- **Weaver (Distilled)**: 92.2% accuracy, 99.97% compute reduction
    -- **Majority Voting**: 83.0% accuracy
    -
    -TODO: replace these with the actual numbers
    -
     ## Quick Start
     
     ```python
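The Performance section removed in this commit compared the distilled verifier against majority voting over several sampled generations per problem. The sketch below contrasts those two selection rules in plain Python. It assumes each candidate's final answer has already been extracted and given a correctness score by the verifier; `Candidate`, `majority_vote`, `verifier_select`, and all scores are hypothetical illustrations, not this model's API or its real outputs.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Candidate:
    """One sampled generation for a single problem.

    `answer` is the extracted final answer; `score` stands in for the
    verifier's predicted probability that the answer is correct
    (hypothetical values, not real model output).
    """
    answer: str
    score: float


def majority_vote(candidates: list[Candidate]) -> str:
    """Return the most frequent final answer.

    Counter.most_common orders equal counts by first occurrence,
    so ties go to the answer seen first.
    """
    counts = Counter(c.answer for c in candidates)
    return counts.most_common(1)[0][0]


def verifier_select(candidates: list[Candidate]) -> str:
    """Return the answer of the single highest-scoring candidate."""
    return max(candidates, key=lambda c: c.score).answer


# Hypothetical samples: "12" is the most common answer, but the
# verifier scores the two "15" samples much higher.
samples = [
    Candidate("12", 0.21),
    Candidate("12", 0.18),
    Candidate("12", 0.25),
    Candidate("15", 0.94),
    Candidate("15", 0.88),
]

print(majority_vote(samples))    # 12
print(verifier_select(samples))  # 15
```

Here majority voting returns "12" because it appears three times, while verifier selection returns "15" because one of its samples carries the highest score.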
