✨ Efficiency leads the month
- At scale: optimizing compute use in massive MoE models, e.g. DeepSeek-V3.1
- In small models: lightweight & deployable, e.g. MiniCPM-V 4.5, Step Audio 2-mini, Intern-S1-mini, Ovis2.5-9B, etc.
✨ Reasoning + Agentic wave 🌊 Not just demos, but real product use cases.
- Meituan, DeepSeek: large-scale models tuned for reasoning & tools
- Qwen, GLM, InternLM: multimodal reasoning + agentic interaction
- CodeAgent, Prover, Baichuan-M2-32B: domain-focused (coding, logic, specialized reasoning)
✨ Open source is exploding across all types of companies!!
- Big tech: Tencent, ByteDance, Xiaomi, Kuaishou, Alibaba/Qwen, Skywork, Ant Group
- Startups: DeepSeek (yes, still a startup!), Zhipu, Baichuan, StepFun, OpenBMB
- New entrants: Meituan, RedNote
- Research labs: Shanghai AI Lab (InternLM, OpenGVLab)
✨ Open source was explicitly mentioned in the State Council's new guidance on deepening the "AI+" strategy.
- On open source: support communities, encourage contributions (incl. university credits & recognition), foster new application approaches, and build globally impactful ecosystems 👀
💡 The Chinese community didn’t slow down at all in August 🤯 September, the last month before the Golden Week holiday, may bring even more surprises.
✨ Supports 33 languages, including 5 ethnic minority languages in China 👀
✨ Includes a translation ensemble model: Chimera-7B
✨ Full pipeline: pretrain > CPT > SFT > enhancement > ensemble refinement, reaching SOTA performance at similar scale
MiniCPM-V 4.5 🚀 New MLLM for image, multi-image & video understanding, running even on your phone, released by OpenBMB (quick-start sketch below)
openbmb/MiniCPM-V-4_5
✨ SOTA vision-language capability
✨ 96× video token compression > high-FPS & long video reasoning
✨ Switchable fast vs deep thinking modes
✨ Strong OCR, document parsing, supports 30+ languages
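A minimal quick-start sketch, assuming MiniCPM-V 4.5 keeps the `trust_remote_code` loading path and `model.chat` interface of earlier MiniCPM-V releases; the exact call signature and `msgs` format are assumptions here, so check the model card before use:

```python
# Minimal sketch: load MiniCPM-V 4.5 via transformers.
# Assumes the repo ships a custom `chat` method (as earlier
# MiniCPM-V releases do); verify against the model card.
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained(
    "openbmb/MiniCPM-V-4_5",
    trust_remote_code=True,          # custom modeling code from the repo
    torch_dtype=torch.bfloat16,
).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(
    "openbmb/MiniCPM-V-4_5", trust_remote_code=True
)

image = Image.open("example.jpg").convert("RGB")
# Interleaved images + strings, following the earlier MiniCPM-V pattern
msgs = [{"role": "user", "content": [image, "What is in this image?"]}]
answer = model.chat(msgs=msgs, tokenizer=tokenizer)  # assumed call shape
print(answer)
```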
✨ 36B - Base & Instruct
✨ Apache 2.0
✨ Native 512K long context
✨ Strong reasoning & agentic intelligence
✨ 2 Base versions: with & without synthetic data
✨ The multimodal wave 🌊
- GLM-4.1V-Thinking: Image+Text > Text
- Intern-S1: Image+Text > Text
- Wan 2.2: Text+Image > Video
- Skywork-R1V3: Image+Text > Text
- Skywork-UniPic: Text > Image / Image > Text
- Tar-7B: Any-to-Any
- Ming-Lite-Omni-1.5: Any-to-Any
- Step3: Image+Text > Text
- HunyuanWorld-1: Image > 3D
- ThinkSound: Video > Audio
- Neta-Lumina: Text > Image
✨ Big month not only for models, but for policy too 🏛️
- Announced the Global Action Plan for AI Governance
- Proposed setting up a World AI Cooperation Organization in Shanghai
- Released the International AI Open Source Collaboration Initiative
- Published Risk Assessment Guidelines for Endpoint AI Agents
✨ Big event - WAIC
- 355K offline visitors
- 108 new releases in 4 days
- 145 sessions across key domains
I’ve been tracking things closely, but July’s open-source wave still blew me away. Can’t wait to see what’s coming next! 🚀
✨ 321B total / 32B active - Apache 2.0 (see the MoE routing sketch below)
✨ MFA (Multi-Matrix Factorization Attention) + AFD (Attention-FFN Disaggregation): cutting decoding cost by up to 70% vs. DeepSeek-V3
✨ 4T image-text pretraining: strong vision-language grounding
✨ Modular, efficient, deployable: runs on just 8×48GB GPUs
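For context on the total-vs-active split: in an MoE layer, a router sends each token to only a few experts, so only a small fraction of the weights run per token. Here is a generic top-k routing sketch in PyTorch, illustrative only with made-up sizes, not this model's actual router:

```python
# Generic top-k MoE routing sketch (illustrative; not the model's
# actual router). Only k experts run per token, which is why
# "active" params are a small fraction of "total" params.
import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    def __init__(self, dim=64, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(dim, n_experts, bias=False)  # router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(n_experts)
        )

    def forward(self, x):                        # x: [tokens, dim]
        weights, idx = self.gate(x).topk(self.k, dim=-1)
        weights = weights.softmax(dim=-1)        # mix the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e         # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

moe = TopKMoE()
print(moe(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```

Growing n_experts inflates total parameters while per-token compute stays pinned to the k experts actually selected, which is how a 321B-parameter model can decode with a much smaller active footprint.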