Shon Fernandez

flexicious

flexicious's activity

commented on Let's talk about LLM evaluation 5 days ago

From developing LLM applications over the past couple of years, I've realized that, regardless of the hype, nothing beats testing LLMs on your own specific use cases with your own evaluation metrics. For example, I compared o3-mini, R1, and Gemini Flash Thinking (https://www.youtube.com/watch?v=iBS_FsLcSN0) and found that for certain use cases they are no better than regular non-reasoning models. I am very curious to learn what people are using reasoning models for and how they are evaluating them!
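To make "your own use cases, your own metrics" concrete, here is a minimal sketch of such a comparison. It is not the setup used in the video: the two model functions are hypothetical stubs standing in for real API calls, and the test cases and the exact-match metric are placeholders you would swap for your own.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-ins for real model calls; replace with your own API clients.
def reasoning_model(prompt: str) -> str:
    return "4" if "2 + 2" in prompt else "unknown"

def baseline_model(prompt: str) -> str:
    return "4" if "2 + 2" in prompt else "Paris"

def exact_match(output: str, expected: str) -> float:
    """Placeholder metric: 1.0 on a case-insensitive exact match, else 0.0."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0

def evaluate(
    models: Dict[str, Callable[[str], str]],
    cases: List[Tuple[str, str]],
    metric: Callable[[str, str], float],
) -> Dict[str, float]:
    """Run every model on every (prompt, expected) case and average the metric."""
    scores = {}
    for name, model in models.items():
        total = sum(metric(model(prompt), expected) for prompt, expected in cases)
        scores[name] = total / len(cases)
    return scores

# Placeholder test cases; in practice these come from your own application.
CASES = [
    ("What is 2 + 2?", "4"),
    ("What is the capital of France?", "Paris"),
]
MODELS = {"reasoning": reasoning_model, "baseline": baseline_model}

scores = evaluate(MODELS, CASES, exact_match)
for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

Keeping the models as plain callables means the same harness works whether a model is a reasoning model or not, so the only thing that changes between runs is the function you plug in.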