ModernBERT-Base-llm-router

This model is a fine-tuned version of the answerdotai/ModernBERT-base model, trained on the DevQuasar/llm_router_dataset-synth dataset.

The fine-tuned model achieves the following results on the test set:

  • Loss: 0.0555
  • F1: 0.9933
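The card does not say which F1 variant is reported (binary F1 on the large_llm class, macro, or weighted). As a reminder of what the binary variant measures, here is a minimal sketch that computes it from gold and predicted labels, treating 1 (large_llm) as the positive class. The function name and the toy labels are illustrative only, not taken from the test set.

```python
def binary_f1(y_true: list[int], y_pred: list[int], positive: int = 1) -> float:
    """Binary F1 = 2 * precision * recall / (precision + recall) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy labels (made up): 2 true positives, 1 false positive, 1 false negative.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
print(round(binary_f1(y_true, y_pred), 4))  # 0.6667
```

Macro F1 would instead average this score computed once with positive=1 and once with positive=0.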

This model was trained on an RTX 4090.

Model description

See the original answerdotai/ModernBERT-base model card for additional information. This model is intended to classify queries for LLM routing: advanced or complicated queries are labeled 1 (large_llm) and simpler queries are labeled 0 (small_llm).
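A minimal routing sketch, assuming the classification head emits two logits ordered [small_llm, large_llm] to match the label ids above. The softmax and threshold logic is plain Python so it can be checked without downloading the checkpoint; route_from_logits, the 0.5 threshold, and the tier strings are illustrative assumptions, not part of the model's config.

```python
import math

# Label ids as described in the model card: 0 -> small_llm, 1 -> large_llm.
ID2TIER = {0: "small_llm", 1: "large_llm"}

def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route_from_logits(logits: list[float], threshold: float = 0.5) -> str:
    """Pick a tier for one query from its two class logits.

    Hypothetical policy: route to the large LLM when
    P(large_llm) >= threshold, otherwise to the small LLM.
    """
    p_large = softmax(logits)[1]
    return ID2TIER[1] if p_large >= threshold else ID2TIER[0]

print(route_from_logits([2.3, -1.1]))  # small_llm
print(route_from_logits([-0.4, 1.7]))  # large_llm
```

To get real logits, one would load the checkpoint with AutoTokenizer and AutoModelForSequenceClassification.from_pretrained("anthonyivn/ModernBERT-Base-llm-router") from a transformers release that supports ModernBERT, then pass outputs.logits[0].tolist() to route_from_logits. Raising the threshold sends fewer queries to the large model.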

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 32
  • eval_batch_size: 16
  • gradient_accumulation_steps: 2
  • bf16: True
  • seed: 42
  • optimizer: adamw_torch_fused
  • lr_scheduler_type: linear
  • num_epochs: 5
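The list above maps onto Hugging Face TrainingArguments keyword arguments. The sketch below records them as a plain dict (so it runs without transformers installed) and derives two quantities the list implies: the effective batch size per optimizer step (32 × 2 = 64) and the linear learning-rate decay from 5e-05 toward 0. It assumes a single GPU and zero warmup, since the card lists neither a device count nor warmup; the helper functions are hypothetical.

```python
# Hyperparameters from the card, keyed by transformers.TrainingArguments names.
TRAINING_KWARGS = {
    "learning_rate": 5e-05,
    "per_device_train_batch_size": 32,
    "per_device_eval_batch_size": 16,
    "gradient_accumulation_steps": 2,
    "bf16": True,
    "seed": 42,
    "optim": "adamw_torch_fused",
    "lr_scheduler_type": "linear",
    "num_train_epochs": 5,
}

def effective_batch_size(kwargs: dict, num_devices: int = 1) -> int:
    """Examples per optimizer step (assumes one GPU unless told otherwise)."""
    return (kwargs["per_device_train_batch_size"]
            * kwargs["gradient_accumulation_steps"]
            * num_devices)

def linear_lr(step: int, total_steps: int, base_lr: float,
              warmup_steps: int = 0) -> float:
    """Linear warmup (if any), then linear decay to zero at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

print(effective_batch_size(TRAINING_KWARGS))  # 64
# Halfway through a hypothetical 1000-step run the LR is about half of 5e-05.
print(linear_lr(500, 1000, TRAINING_KWARGS["learning_rate"]))
```

With transformers installed, the dict could be passed as TrainingArguments(output_dir=..., **TRAINING_KWARGS); the output directory and evaluation/saving strategy are not given in the card.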

Training Code

GITHUB URL TO BE ADDED

Training results

Epoch | Validation Loss | F1
1.0   | 0.0296          | 0.9907
2.0   | 0.0327          | 0.9911
3.0   | 0.0474          | 0.9933
4.0   | 0.0563          | 0.9933
5.0   | 0.0554          | 0.9933
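Reading the table programmatically makes the trend explicit: validation loss is lowest after epoch 1 and rises afterwards, while F1 plateaus at 0.9933 from epoch 3 onward. The values below are copied from the table; the selection rules are illustrative and are not necessarily how the released checkpoint was chosen.

```python
# (epoch, validation_loss, f1) copied from the training-results table.
RESULTS = [
    (1, 0.0296, 0.9907),
    (2, 0.0327, 0.9911),
    (3, 0.0474, 0.9933),
    (4, 0.0563, 0.9933),
    (5, 0.0554, 0.9933),
]

# Epoch with the lowest validation loss.
best_loss_epoch = min(RESULTS, key=lambda r: r[1])[0]

# Epoch with the highest F1, ties broken by lower validation loss.
best_f1_epoch = max(RESULTS, key=lambda r: (r[2], -r[1]))[0]

# First epoch at which F1 reached its final value.
final_f1 = RESULTS[-1][2]
plateau_epoch = next(e for e, _, f1 in RESULTS if f1 == final_f1)

print(best_loss_epoch, best_f1_epoch, plateau_epoch)  # 1 3 3
```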
Model size: 150M params (Safetensors, tensor type F32)
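A quick estimate of weight storage: F32 uses 4 bytes per parameter, so roughly 150M parameters occupy about 150e6 × 4 = 600 MB, and casting to bf16 (2 bytes) would halve that. The sketch uses the rounded 150M figure from the card and counts weights only, not activations or optimizer state; the helper name is hypothetical.

```python
BYTES_PER_PARAM = {"F32": 4, "BF16": 2}

def weight_bytes(num_params: int, dtype: str = "F32") -> int:
    """Bytes needed to store the weights alone."""
    return num_params * BYTES_PER_PARAM[dtype]

params = 150_000_000  # rounded "150M params" from the card
for dtype in ("F32", "BF16"):
    b = weight_bytes(params, dtype)
    print(f"{dtype}: {b / 1e6:.0f} MB ({b / 2**30:.2f} GiB)")
# F32: 600 MB (0.56 GiB)
# BF16: 300 MB (0.28 GiB)
```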

Model ID: anthonyivn/ModernBERT-Base-llm-router
