openai/gpt-oss-120b · gpt-oss-120b: Exceptional Reasoning, Not Yet AGI Scale

I'm excited to share that gpt-oss-120b is now leading my French LLM reasoning leaderboard (https://huggingface.co/spaces/Deepmama/LLM-FR_Leaderboard).

This model delivers outstanding performance on benchmarks designed to evaluate reasoning, logical inference, and critical thinking in French. On these tasks, it clearly outperforms larger or better-known models (Qwen3, DeepSeek-R1, ...).

That said, it performs less well on semantic puzzle datasets, which are more sensitive to model size and memorization than to deep reasoning.

👉 Conclusion: superb reasoning abilities, but we're still below AGI-scale capabilities, especially for tasks that demand broad semantic compression or massive world knowledge. It is also not good at instruction following: its results on IFEval translated into French are disappointing.