combines reinforcement learning (RL) and large language models (LLMs) to improve exploration using diverse tool generation during inference
Gabriel Bo
gabrielbo
AI & ML interests
NLP, Scaling, Test-time Compute
Organizations
Datasets (9)
gabrielbo/swirl-trajectories-mmlu-pro • 24.8k • 12 • 1
gabrielbo/explore-rl-hotpota-trajectories
gabrielbo/gpqa-llama-3-8b-verifier • 910 • 5
gabrielbo/mmlu-college-llama-3-8b-verifiers • 870 • 6
gabrielbo/mmlu-pro-specific-choice-scored • 870 • 6
gabrielbo/mmlu-pro-baseline-scored • 87 • 3
gabrielbo/mmlu-pro-verifiers-specific-choice • 870 • 5
gabrielbo/mmlu-pro-verifiers-baseline • 87 • 6
gabrielbo/mmlu-pro-justifications-llama-3 • 87 • 3