Hybrid Readability Assessment Model
A hybrid machine learning model that estimates the readability grade level of a text. It combines Ridge regression and Random Forest, using each in the grade range where it is more accurate.
Model Description
This hybrid model uses a two-stage prediction approach:
- Primary Decision Maker: Ridge regression (alpha=10.0) makes the initial grade prediction
- Refinement: If Ridge predicts grade ≤ 5, Random Forest provides the final prediction
- High Grades: If Ridge predicts grade > 5, the Ridge prediction is used directly
This approach leverages the strengths of both models:
- Ridge regression: Better for higher grade levels and provides stable linear predictions
- Random Forest: More accurate for lower grade levels with complex feature interactions
Model Performance
- Test MAE: 0.513 (in grade levels)
- Test R²: 0.775
- Training Samples: 2,500
- Feature Count: 16
- Created: 2025-07-26T23:16:02.628443
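For readers who want to see how the two reported metrics are defined, here is a minimal sketch in plain NumPy. The function names `mean_absolute_error` and `r_squared` are illustrative and are not taken from the training script. The evaluation code that produced the numbers above is not shown in this card.

```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    """Average absolute gap between true and predicted grades."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all true values are identical")
    return float(1.0 - ss_res / ss_tot)
```

Read this way, an MAE of about 0.5 means predictions are off by roughly half a grade level on average.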
Model Size
- File Size: 6.0 MB
Features
The model uses 16 features including:
- Traditional Readability Metrics: Flesch-Kincaid, Coleman-Liau, ARI, SMOG, Gunning Fog, Dale-Chall
- Age of Acquisition (AoA) Features: Mean, median, percentiles, difficult word ratios
- Source Indicators: Dataset source information
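To give a sense of what one of the traditional metrics looks like, here is a minimal sketch of the standard Flesch-Kincaid grade formula. The syllable counter and tokenization are rough heuristics chosen for illustration. The helper names are hypothetical, and the model's actual feature extraction code is not shown in this card and may differ.

```python
import re

def flesch_kincaid_grade(n_words, n_sentences, n_syllables):
    """Flesch-Kincaid grade level computed from raw counts."""
    if n_words == 0 or n_sentences == 0:
        raise ValueError("need at least one word and one sentence")
    return 0.39 * (n_words / n_sentences) + 11.8 * (n_syllables / n_words) - 15.59

def count_syllables(word):
    """Rough heuristic (an assumption): one syllable per run of vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fk_grade_from_text(text):
    """Naive tokenization: split sentences on . ! ? and words on letters."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return flesch_kincaid_grade(len(words), len(sentences), syllables)
```

Very simple text can score below zero with this formula, which is expected behavior rather than a bug.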
Usage
import joblib
from huggingface_hub import hf_hub_download
# Download the model
model_path = hf_hub_download(
    repo_id="yimingwang123/hybrid-grade-assessment-model",
    filename="hybrid_readability_model.pkl",
)
# Load the model
model_data = joblib.load(model_path)
# Extract components
ridge_model = model_data['ridge_model']
rf_model = model_data['rf_model']
scaler = model_data['scaler']
feature_columns = model_data['feature_columns']
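# The file contains the components only. To get a final grade, combine the
# ridge_model and rf_model predictions with the rule in the Hybrid Logic section below.
# See the training script for the full implementation.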
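One way the loaded components might be wired together is sketched below. It rests on several assumptions that this card does not confirm: that the input is a 2-D array whose columns follow the order of `feature_columns`, that the same `scaler` output is fed to both models, and that the threshold is the 5.0 used in the Hybrid Logic section. The function name `predict_grades` is hypothetical. Check the training script before relying on it.

```python
import numpy as np

def predict_grades(model_data, X, threshold=5.0):
    """Hybrid prediction sketch for rows of precomputed features.

    X: array-like of shape (n_samples, n_features), columns ordered as
       model_data['feature_columns'] (assumption).
    """
    X = np.asarray(X, dtype=float)
    n_expected = len(model_data["feature_columns"])
    if X.ndim != 2 or X.shape[1] != n_expected:
        raise ValueError(f"expected a 2-D array with {n_expected} columns")

    # Assumption: both models were trained on the scaled features.
    X_scaled = model_data["scaler"].transform(X)
    ridge_pred = np.asarray(model_data["ridge_model"].predict(X_scaled))
    rf_pred = np.asarray(model_data["rf_model"].predict(X_scaled))

    # Ridge decides the branch: Random Forest for low grades, Ridge otherwise.
    return np.where(ridge_pred <= threshold, rf_pred, ridge_pred)
```

With the objects loaded above, a call would look like `predict_grades(model_data, feature_matrix)`, where `feature_matrix` holds the 16 features for each text.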
Training Data
The model was trained on a combination of:
- WeeBit Corpus: Web-based texts with human-annotated grade levels
- CLEAR Corpus: Simplified texts for language learners
Hybrid Logic
def predict_hybrid(ridge_pred, rf_pred):
    if ridge_pred <= 5.0:
        return rf_pred  # Use Random Forest for lower grades
    else:
        return ridge_pred  # Use Ridge for higher grades
Citation
If you use this model in your research, please cite:
@misc{hybrid-readability-model,
title={Hybrid Readability Assessment Model},
author={Grade-Aware LLM Project},
year={2025},
url={https://huggingface.co/yimingwang123/hybrid-grade-assessment-model}
}
License
This model is released under the MIT License.
Contact
For questions about this model, please open an issue in the repository.