Charles
Charles is a 3B-parameter multimodal reasoning model built for agentic coding, mathematics, and health.
The model was trained with PPO, using examples based on Open-R1 together with from-scratch dataset generation from the Charles GitHub project and Mindcraft to improve agentic tool use.
The base model is Qwen2.5-VL 3B, trained for 2 epochs on 51,526 examples of pure reasoning data, most of them coding examples.
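The training code itself isn't included here, but for background, the clipped surrogate objective at the core of PPO can be sketched as follows. This is a generic, from-scratch illustration, not the actual training pipeline; the function name and toy values are my own.

```python
import math

def ppo_clipped_objective(new_logps, old_logps, advantages, eps=0.2):
    """PPO clipped surrogate objective (averaged, to be maximized)."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logps, old_logps, advantages):
        # Probability ratio pi_new / pi_old, computed from log-probs
        ratio = math.exp(new_lp - old_lp)
        # Clip the ratio to [1 - eps, 1 + eps]
        clipped = max(1 - eps, min(1 + eps, ratio))
        # Take the pessimistic (minimum) of clipped and unclipped terms
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)

# Toy example: the first sample's ratio exceeds 1 + eps, so it is clipped
value = ppo_clipped_objective([-0.5, -1.0], [-1.0, -1.0], [1.0, -1.0])
```

The clipping keeps each policy update close to the previous policy, which is what makes PPO stable enough for fine-tuning reasoning models.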
This model builds on techniques and dataset formatting learned from the Andy-4 series of models, as well as from Smol-reason2.1.
Charles is an acronym for:
"Conversational Helpful Assistant with Robust Logic and Extensible Skills"
I will be posting the Charles framework web app after I release the Charles LLM and show that it works well with the application, and that it can outperform some larger models that were trained for neither reasoning nor agentic code use.