Reasoning Rating Model

This repository contains the model described in the paper Meta-rater: A Multi-dimensional Data Selection Method for Pre-training Language Models.

Code: https://github.com/opendatalab/Meta-rater

Model Description

This model is a fine-tuned version of ModernBERT-base designed to evaluate the Reasoning dimension of text quality on a continuous 0-5 scale. Reasoning measures the complexity of logical thinking and analysis required of readers, focusing on multi-step reasoning, argument structure, and analytical depth.

Model Details

  • Base Model: ModernBERT-base
  • Parameters: 149M
  • Context Window: 4,096 tokens
  • Task: Text quality rating (regression)
  • Score Range: 0-5 (continuous)
  • Performance: 89.59% F1 score, 96.32% accuracy

Rating Scale

The model uses an additive 5-point rating system (a sketch for mapping its continuous output onto these levels follows the list):

  • 0: No reasoning content
  • 1: Preliminary reasoning elements with single causal relationships or simple logical judgments, lacking depth
  • 2: Basic reasoning with some logical relationships requiring moderate thought, simple argumentative structures
  • 3: Good reasoning complexity with multiple steps requiring complex thought, interrelated arguments with some depth
  • 4: High reasoning complexity with multi-layered logic and in-depth analysis, comprehensive evaluation required
  • 5: Exceptional reasoning complexity demanding deep analysis and innovative thinking, multi-dimensional and interdisciplinary
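
Because the model's head is a regression over a continuous 0-5 range, downstream code often needs to snap a prediction back onto one of the discrete levels above. A minimal sketch (the helper name and the condensed level labels are illustrative, not part of the model):

# Map a continuous 0-5 score to the nearest discrete rating level.
LEVELS = {
    0: "no reasoning content",
    1: "preliminary reasoning elements",
    2: "basic reasoning",
    3: "good reasoning complexity",
    4: "high reasoning complexity",
    5: "exceptional reasoning complexity",
}

def nearest_level(score: float) -> tuple[int, str]:
    level = min(5, max(0, round(score)))
    return level, LEVELS[level]

print(nearest_level(3.42))  # (3, 'good reasoning complexity')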

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the model and tokenizer
model_name = "opendatalab/meta-rater-reasoning-rating"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()  # inference mode: disables dropout

# Example text
text = "By analyzing the correlation between economic indicators and social outcomes, we can identify causal mechanisms that explain policy effectiveness..."

# Tokenize and predict
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=4096)
with torch.no_grad():
    outputs = model(**inputs)
    # Regression head: the single logit is the continuous 0-5 score
    score = outputs.logits.squeeze().item()

print(f"Reasoning Score: {score:.2f}")
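
For scoring many documents, batching the forward pass is substantially faster than looping one text at a time. A sketch of a batched scorer reusing the tokenizer and model loaded above (the batch size and padding settings are choices, not requirements of the model):

def score_batch(texts, batch_size=16):
    """Return one continuous 0-5 reasoning score per input text."""
    scores = []
    for i in range(0, len(texts), batch_size):
        batch = tokenizer(
            texts[i:i + batch_size],
            return_tensors="pt",
            truncation=True,
            max_length=4096,
            padding=True,
        )
        with torch.no_grad():
            logits = model(**batch).logits  # shape: (batch, 1)
        scores.extend(logits.squeeze(-1).tolist())
    return scores

print(score_batch(["Water is wet.", text]))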

Training Details

  • Training Data: 747,422 training examples from the SlimPajama dataset
  • Annotation Model: Llama-3.3-70B-Instruct
  • Training Epochs: 10
  • Data Split: 8:1:1 (train:dev:test), yielding 93,428 test examples

Applications

This model is particularly valuable for:

  • Educational content curation for reasoning skill development
  • Research paper evaluation and academic content assessment
  • Data selection for training reasoning-capable language models (see the sketch after this list)
  • Curriculum design for critical thinking courses
  • Content filtering for analytical and argumentative writing
  • Quality assessment of logical and analytical texts
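
For the data-selection use case, the simplest recipe is to score every candidate document and keep those above a cutoff (or the top-scoring fraction). A minimal sketch building on score_batch from the Usage section; the corpus loader is a hypothetical placeholder, the 3.0 threshold is illustrative, and the Meta-rater method itself combines multiple quality dimensions rather than thresholding a single rater:

# Filter a candidate pre-training corpus by predicted reasoning score.
corpus = load_candidate_documents()  # hypothetical loader returning a list of strings
scores = score_batch(corpus)
selected = [doc for doc, s in zip(corpus, scores) if s >= 3.0]
print(f"kept {len(selected)}/{len(corpus)} documents")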

What the Model Evaluates

The model assesses various aspects of reasoning complexity:

  • Logical structure and argument coherence
  • Multi-step reasoning processes
  • Causal relationship identification and analysis
  • Evidence integration and synthesis
  • Problem-solving approaches and methodologies
  • Critical analysis depth and sophistication

Key Reasoning Indicators

  • Simple reasoning: Single cause-effect relationships, basic problem-solution pairs
  • Moderate reasoning: Multiple interrelated factors, comparative analysis
  • Complex reasoning: Multi-layered arguments, comprehensive evaluations
  • Advanced reasoning: Interdisciplinary integration, innovative thinking, mathematical models

What the Model Does NOT Consider

  • The specific language the text is written in
  • The length of the text
  • Usage of placeholders for data privacy or safety
  • Writing style, grammar, or formatting quality

Limitations

  • Designed primarily for English text
  • May not capture domain-specific reasoning patterns equally well across all fields
  • Performance may vary for highly specialized mathematical or technical reasoning
  • Should be combined with other quality metrics for comprehensive assessment (a sketch follows this list)
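
On the last point, a sketch of one way to combine this rater with raters for other quality dimensions via a weighted average. The companion repository name and the equal default weights are assumptions for illustration; the Meta-rater paper learns weights over quality scores rather than fixing them by hand:

# Hypothetical set of dimension raters; check the opendatalab hub page for actual names.
DIMENSIONS = {
    "reasoning": "opendatalab/meta-rater-reasoning-rating",
    "professionalism": "opendatalab/meta-rater-professionalism-rating",  # assumed name
}

def combined_score(text, weights=None):
    """Weighted average of per-dimension quality scores (equal weights by default)."""
    weights = weights or {name: 1.0 for name in DIMENSIONS}
    total = norm = 0.0
    for name, repo in DIMENSIONS.items():
        tok = AutoTokenizer.from_pretrained(repo)
        mdl = AutoModelForSequenceClassification.from_pretrained(repo)
        inputs = tok(text, return_tensors="pt", truncation=True, max_length=4096)
        with torch.no_grad():
            s = mdl(**inputs).logits.squeeze().item()
        total += weights[name] * s
        norm += weights[name]
    return total / norm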

Citation

If you use this model in your research, please cite:

@article{zhuang2025meta,
  title={Meta-rater: A Multi-dimensional Data Selection Method for Pre-training Language Models},
  author={Zhuang, Xinlin and Peng, Jiahui and Ma, Ren and Wang, Yinfan and Bai, Tianyi and Wei, Xingjian and Qiu, Jiantao and Zhang, Chi and Qian, Ying and He, Conghui},
  journal={arXiv preprint arXiv:2504.14194},
  year={2025}
}

License

This model is released under the same license as the base ModernBERT model.

Contact

For questions or issues, please contact the authors or open an issue in the repository.
