open-llm-leaderboard (Open LLM Leaderboard)

AdinaY

posted an update 3 days ago

Post

260

Inverse IFEval 🔥 New benchmark from ByteDance & MAP

m-a-p/Inverse_IFEval
Inverse IFEval: Can LLMs Unlearn Stubborn Training Conventions to Follow Real Instructions? (2509.04292)

Testing LLMs on their ability to override biases & follow adversarial instructions.
✨ 8 challenge types
✨ 1,012 CN/EN Qs across 23 domains
✨ Human-in-the-loop + LLM-as-a-Judge

AdinaY

posted an update 3 days ago

Post

202

Klear-46B-A2.5🔥 a sparse MoE LLM developed by the Kwai-Klear Team at Kuaishou

Kwai-Klear/klear10-68ba61398a0a4eb392ec6ab1

✨ 46B total / 2.5B active - Apache2.0
✨ Dense-level performance at lower cost
✨ Trained on 22T tokens with progressive curriculum
✨ 64K context length

AdinaY

posted an update 7 days ago

Post

294

🔥 August highlights from the Chinese AI community

zh-ai-community/august-2025-china-open-source-highlights-68a2de5630f406edaf320e88

✨ Efficiency leads the month
- At scale: optimizing compute use in massive MoE models e.g. DeepSeek v3.1
- In small models: lightweight & deployable
e.g. MiniCPM V 4.5, Step Audio 2-mini, Intern S1-mini, Ovis2.5-9B, etc.

✨ Reasoning + Agentic wave 🌊 Not just demos, but real product use cases.
- Meituan, DeepSeek: large-scale models tuned for reasoning & tools
- Qwen, GLM, InternLM: multimodal reasoning + agentic interaction
- CodeAgent, Prover, Baichuan-M2-32B: domain-focused (coding, logic, specialized reasoning)

✨ Open source is exploding across all types of companies!!
- Big tech: Tencent, ByteDance, Xiaomi, Kuaishou, Alibaba/Qwen, Skywork, Ant Group
- Startups: DeepSeek (yes, still a startup!), Zhipu, Baichuan, StepFun, OpenBMB
- New entrants: Meituan, RedNote
- Research labs: Shanghai AI Lab (InternLM, OpenGVLab)

✨ Open source was explicitly mentioned in the State Council’s new guidance on deepening the "AI+" strategy.
- Open-source: support communities, encourage contributions (incl. university credits & recognition), foster new application approaches, and build globally impactful ecosystems 👀

💡 The Chinese community didn’t slow down at all in August 🤯 September, the last month before the Golden Week holiday, may bring even more surprises.

Stay Tuned!

AdinaY

posted an update 7 days ago

Post

269

Hunyuan-MT-7B 🔥 open translation model released by Tencent Hunyuan

tencent/hunyuan-mt-68b42f76d473f82798882597

✨ Supports 33 languages, including 5 ethnic minority languages in China 👀
✨ Includes a translation ensemble model: Chimera-7B
✨ Full pipeline: pretrain > CPT > SFT > enhancement > ensemble refinement, reaching SOTA performance at similar scale

AdinaY

posted an update 7 days ago

Post

240

From food delivery to frontier AI 🚀 Meituan, the leading lifestyle platform, just dropped its first open SoTA LLM: LongCat-Flash 🔥

meituan-longcat/LongCat-Flash-Chat

✨ 560B total / ~27B active MoE — MIT license
✨ 128k context length + advanced reasoning
✨ ScMoE design: 100+ TPS inference
✨ Stable large-scale training + strong agentic performance
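
For a rough sense of what the "total / active" figures in these MoE announcements mean, here is a minimal sketch that computes the share of parameters used per token. It uses only the sizes quoted in the posts (560B / ~27B for LongCat-Flash, 46B / 2.5B for Klear-46B-A2.5); the ~27B figure is approximate, and the helper name `active_fraction` is made up for illustration.

```python
def active_fraction(total_b: float, active_b: float) -> float:
    """Share of parameters activated per token, with both sizes in billions."""
    if total_b <= 0 or not 0 < active_b <= total_b:
        raise ValueError("expected 0 < active_b <= total_b")
    return active_b / total_b


# Sizes as quoted in the posts (billions of parameters): (total, active).
models = {
    "LongCat-Flash": (560.0, 27.0),   # "~27B active" is an approximate figure
    "Klear-46B-A2.5": (46.0, 2.5),
}

for name, (total_b, active_b) in models.items():
    share = active_fraction(total_b, active_b)
    print(f"{name}: {share:.1%} of parameters active per token")
```

Both models activate roughly 5% of their weights per token, which is the basis for the "dense-level performance at lower cost" framing.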

AdinaY

posted an update 10 days ago

Post

521

USO 🎨 Unified customization model released by ByteDance Research

Demo
bytedance-research/USO
Model
bytedance-research/USO
Paper
USO: Unified Style and Subject-Driven Generation via Disentangled and Reward Learning (2508.18966)

✨ Large-scale triplet dataset (content, style, stylized)
✨ Disentangled learning: style alignment + content preservation
✨ Style Reward Learning (SRL) for higher fidelity
✨ USO-Bench: 1st benchmark for style & subject jointly
✨ SOTA results on subject consistency & style similarity

AdinaY

posted an update 10 days ago

Post

409

Step-Audio 2 🔥 New end-to-end multimodal LLM for audio & speech, released by StepFun

stepfun-ai/step-audio-2-68b003c3a47b273fffaf67a8

✨ Direct raw audio: text & speech, no ASR+LLM+TTS pipeline
✨ High-IQ reasoning: RL + CoT for paralinguistic cues
✨ Multimodal RAG + tool calling
✨ Emotion, timbre, dialect & style control
✨ SOTA on ASR, paralinguistic, speech dialog

AdinaY

posted an update 13 days ago

Post

1091

🇨🇳 China’s State Council just released its “AI+” Action Plan (2025)

<The State Council’s Guidance on Deepened Implementation of the ‘AI+’ Strategy>
zh-ai-community/china-ai-policy-research

✨Goal: By 2035, AI will deeply empower all sectors, reshape productivity & society

✨Focus on 6 pillars:
>Science & Tech
>Industry
>Consumption
>Public welfare
>Governance
>Global cooperation

✨Highlights:
>Models: advance theory, efficient training/inference, evaluation system
>Data: high-quality datasets, IP/copyright reform, new incentives
>Compute: boost chips & clusters, improve national network, promote cloud standardization, and ensure inclusive, efficient, green, secure supply.
>Applications: AI-as-a-service, test bases, new standards
>Open-source: support communities, encourage contributions (incl. university credits & recognition), foster new application approaches, and build globally impactful ecosystems 👀
>Talent, policy & safety frameworks to secure sustainable growth

AdinaY

posted an update 13 days ago

Post

4880

MiniCPM-V 4.5 🚀 New MLLM for image, multi-image & video understanding, running even on your phone, released by OpenBMB

openbmb/MiniCPM-V-4_5

✨ SOTA vision language capability
✨ 96× video token compression > high-FPS & long video reasoning
✨ Switchable fast vs deep thinking modes
✨ Strong OCR, document parsing, supports 30+ languages

AdinaY

posted an update 13 days ago

Post

296

InternVL3.5 🔥 New family of multimodal models by Shanghai AI Lab

OpenGVLab/internvl35-68ac87bd52ebe953485927fb

✨ 1B · 2B · 4B · 8B · 14B · 38B ｜ MoE → 20B-A4B · 30B-A3B · 241B-A28B 📄Apache 2.0
✨ +16% reasoning performance, 4.05× speedup vs InternVL3
✨ Cascade RL (offline + online): stronger reasoning
✨ ViR: efficient visual token routing
✨ DvD: scalable vision–language deployment
✨ Supports GUI & embodied agency 🤖

AdinaY

posted an update 18 days ago

Post

607

Excited to see another tech company, OPPO, now sharing papers, models, and datasets on the hub 🔥🚀

PersonalAILab
Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL (2508.13167)

Their work, Chain-of-Agents (CoA), equips a single LLM with multi-agent collaboration, using distillation and RL to solve complex tasks end-to-end.

AdinaY

posted an update 18 days ago

Post

3631

Seed-OSS 🔥 The latest open LLM from the ByteDance Seed team

ByteDance-Seed/seed-oss-68a609f4201e788db05b5dcd

✨ 36B - Base & Instruct
✨ Apache 2.0
✨ Native 512K long context
✨ Strong reasoning & agentic intelligence
✨ 2 Base versions: with & without synthetic data

AdinaY

posted an update 19 days ago

Post

5417

✨ DeepSeek V3.1 just dropped on the hub.
deepseek-ai/DeepSeek-V3.1-Base

AdinaY

posted an update 20 days ago

Post

487

Before my vacation: Qwen releasing.
When I came back: Qwen still releasing
Respect!!🫡

Meet Qwen Image Edit 🔥 the image editing version of Qwen-Image by @Alibaba_Qwen

Qwen/Qwen-Image-Edit

✨ Apache 2.0
✨ Semantic + Appearance Editing: rotate, restyle, add/remove 🎨
✨ Precise Text Editing → edit CN/EN text, keep style

alozowski

in open-llm-leaderboard/results 26 days ago

Create README.md

1

#33 opened 26 days ago by MonsterDo000

alozowski

in open-llm-leaderboard/open_llm_leaderboard 26 days ago

Create app.py

#1148 opened 26 days ago by iamtheabdullah

albertvillanova

posted an update 27 days ago

Post

3299

Latest smolagents release supports GPT-5: build agents that think, plan, and act.
⚡ Upgrade now and put GPT-5 to work!

meg

posted an update 27 days ago

Post

2798

New work from my socially-minded colleagues at Hugging Face, creating some foundations for AI companionship behavior evaluation.
Evaluation Dataset: AI-companionship/INTIMA
Paper: AI-companionship/INTIMA
Work from @giadap, @frimelle, @yjernite.

albertvillanova

posted an update 28 days ago

Post

462

🚀 smolagents v1.21.0 is here!
Now with improved safety in the local Python executor: dunder calls are blocked!
⚠️ Still not fully isolated: for untrusted code, use a remote executor instead (Docker, E2B, Wasm).
✨ Many bug fixes: more reliable code.
👉 https://github.com/huggingface/smolagents/releases/tag/v1.21.0
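
To illustrate what "blocking dunder calls" can look like in principle, here is a minimal sketch using Python's standard `ast` module. It is not the smolagents implementation: the function names `find_dunder_access` and `check_no_dunders` are hypothetical, and the check is deliberately simplistic.

```python
import ast


def _is_dunder(name: str) -> bool:
    """True for names like __class__ or __import__."""
    return len(name) > 4 and name.startswith("__") and name.endswith("__")


def find_dunder_access(source: str) -> list[str]:
    """Return dunder attribute and name references found in `source`."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Attribute) and _is_dunder(node.attr):
            found.append(node.attr)
        elif isinstance(node, ast.Name) and _is_dunder(node.id):
            found.append(node.id)
    return found


def check_no_dunders(source: str) -> None:
    """Raise PermissionError if the code references any dunder (hypothetical policy)."""
    hits = find_dunder_access(source)
    if hits:
        raise PermissionError(f"dunder access is not allowed: {sorted(set(hits))}")


check_no_dunders("total = sum([1, 2, 3])")  # passes silently

try:
    check_no_dunders("().__class__.__bases__[0].__subclasses__()")
except PermissionError as err:
    print(err)
```

A static check like this misses string-based access such as `getattr(obj, "__class__")`, which is one reason in-process filtering alone does not amount to isolation and a remote executor remains the safer choice for untrusted code.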