InternLM3-8B-instruct 🔥 Trained on just 4T tokens, it outperforms Llama3.1-8B and Qwen2.5-7B on reasoning tasks, at 75% lower training cost! internlm/internlm3-67875827c377690c01a9131d
✨ MiniMax-text-01:
- 456B parameters, with 45.9B activated per token
- Combines Lightning Attention, Softmax Attention, and MoE for optimal performance
- Training context up to 1M tokens; inference handles up to 4M tokens
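To put the MiniMax-text-01 numbers in perspective, here is a minimal arithmetic sketch using only the figures quoted above. The variable names are illustrative, not part of any MiniMax API.

```python
# Figures quoted in the post (billions of parameters, millions of tokens).
TOTAL_PARAMS_B = 456.0
ACTIVE_PARAMS_B = 45.9
TRAIN_CONTEXT_M = 1
INFERENCE_CONTEXT_M = 4

# Share of the MoE model's parameters that are used for each token.
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B

# How far the inference context extends beyond the training context.
context_extrapolation = INFERENCE_CONTEXT_M / TRAIN_CONTEXT_M

print(f"Active per token: {active_fraction:.1%} of all parameters")
print(f"Inference context is {context_extrapolation:.0f}x the training context")
```

In other words, roughly one tenth of the parameters are active for any given token, which is where the MoE design gets its efficiency.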
✨ MiniMax-VL-01:
- ViT-MLP-LLM framework (non-transformer 👀)
- Handles image inputs from 336×336 to 2016×2016
- 694M image-caption pairs + 512B tokens processed across 4 stages
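The resolution range above can be read as a tile budget. The sketch below assumes, purely for illustration, that large images are split into square tiles the size of the smallest supported input (336 px); the post does not say how the model actually partitions images, and `tile_grid` is a hypothetical helper.

```python
MIN_SIDE = 336    # smallest supported input side, from the post
MAX_SIDE = 2016   # largest supported input side, from the post
TILE_SIDE = 336   # assumption: one tile equals the smallest supported input


def tile_grid(width, height, tile=TILE_SIDE):
    """Return (columns, rows) of tiles needed to cover an image, rounding up."""
    cols = -(-width // tile)   # ceiling division
    rows = -(-height // tile)
    return cols, rows


cols, rows = tile_grid(MAX_SIDE, MAX_SIDE)
max_pixels = MAX_SIDE * MAX_SIDE

print(f"Largest input: {cols} x {rows} tiles, {max_pixels:,} pixels")
```

Under that assumption, the largest input covers 36 times the area of the smallest one.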
MiniCPM-o2.6 🔥 an end-side (on-device) multimodal LLM released by OpenBMB, from the Chinese community
Model: openbmb/MiniCPM-o-2_6
✨ Real-time English/Chinese conversation, emotion control, and ASR/STT
✨ Real-time video/audio understanding
✨ Processes up to 1.8M pixels, leads OCRBench & supports 30+ languages
QvQ-72B-Preview 🎄 an open-weight model for visual reasoning, just released by the Alibaba_Qwen team
Qwen/qvq-676448c820912236342b9888
✨ Combines visual understanding & language reasoning
✨ Scores 70.3 on MMMU
✨ Outperforms Qwen2-VL-72B-Instruct in complex problem-solving
Megrez-3B-Omni 🔥 an on-device multimodal LLM by Infinigence AI, another startup emerging from the Tsinghua University ecosystem.
Model: Infinigence/Megrez-3B-Omni
Demo: Infinigence/Megrez-3B-Omni
✨ Supports analysis of image, text, and audio modalities
✨ Leads in bilingual speech (English & Chinese) input, multi-turn conversations, and voice-based queries
✨ Excels in scene understanding and OCR across major benchmarks
Audio models:
✨ Fish Speech 1.5: text-to-speech in 13 languages, trained on 1M+ hours of audio, by FishAudio fishaudio/fish-speech-1.5
✨ ClearVoice: an advanced voice processing framework by Alibaba Tongyi SpeechAI https://huggingface.co/alibabasglab
HunyuanVideo 📹 The new open video generation model by Tencent!
👉 tencent/HunyuanVideo
zh-ai-community/video-models-666afd86cfa4e4dd1473b64c
✨ 13B parameters: probably the largest open video model to date
✨ Unified architecture for image & video generation
✨ Powered by advanced features: MLLM Text Encoder, 3D VAE, and Prompt Rewrite
✨ Delivers stunning visuals, diverse motion, and unparalleled stability
🔓 Fully open with code & weights
Zhipu AI, the Chinese generative AI startup behind CogVideo, just launched their first productized AI Agent - AutoGLM 🔥 👉 https://agent.aminer.cn
With simple text or voice commands, it:
✨ Simulates phone operations effortlessly
✨ Autonomously handles 50+ step tasks
✨ Seamlessly operates across apps
Powered by Zhipu's "Decoupled Interface" and "Self-Evolving Learning Framework" to achieve major performance gains in Phone Use and Web Browser Use!
Meanwhile, GLM4-Edge is now on the Hugging Face Hub 🚀
👉 THUDM/glm-edge-6743283c5809de4a7b9e0b8b
Packed with advanced dialogue + multimodal models:
📱 1.5B / 2B models: built for mobile & in-car systems
💻 4B / 5B models: optimized for PCs
China launched an algorithm governance campaign to ensure algorithms are more positive, transparent, controllable, fair, and accountable🇨🇳📑 zh-ai-community/china-ai-policy-research
Highlights:
✨ Combat "echo chambers" and addictive content: ban forced tags, data misuse, and excessive collection.
✨ Make rankings transparent: explain algorithms, keep logs, and detect fake accounts.
✨ Protect workers: disclose delivery algorithms and provide appeal channels.
✨ Ban unfair pricing: ensure promo transparency and honest explanations for failures.
✨ Support users: improve recommendations for minors and seniors, promote good content, and detect fakes.
✨ Ensure safety: audit algorithms, secure data, fix flaws, and regularly evaluate models.
⏰ Timeline:
- Company Self-Checks: before Dec 31, 2024
- Verification: before Jan 31, 2025
- Effectiveness Review: before Feb 14, 2025
Open reporting channels for algorithm issues during the campaign, monitor complaints, enforce corrections, and provide feedback to users.
✨ Fine-tuned with chain-of-thought (CoT) data (open-source + synthetic).
✨ Expands the solution space with Monte Carlo Tree Search (MCTS), guided by model confidence.
✨ Novel reasoning strategies & self-reflection enhance complex problem-solving.
✨ Pioneers the use of large reasoning models (LRM) in multilingual machine translation.
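As a rough illustration of how "model confidence" could guide a tree search, the sketch below scores candidate reasoning rollouts by the average confidence of their tokens and keeps the best one. This is not a full MCTS and not necessarily the model's actual formula: the confidence definition (the chosen token's probability normalized over a handful of top alternatives), the function names, and the sample log-probabilities are all assumptions made for the example.

```python
import math


def token_confidence(chosen_logprob, alternative_logprobs):
    """Probability of the chosen token, normalized over the top alternatives.

    `alternative_logprobs` is assumed to include the chosen token itself.
    """
    denom = sum(math.exp(lp) for lp in alternative_logprobs)
    return math.exp(chosen_logprob) / denom


def rollout_score(steps):
    """Average token confidence over one rollout; usable as a search reward."""
    confidences = [token_confidence(chosen, alts) for chosen, alts in steps]
    return sum(confidences) / len(confidences)


# Hypothetical rollouts: each step is (chosen log-prob, top-alternative log-probs).
confident = [(-0.1, [-0.1, -3.0, -4.0]), (-0.2, [-0.2, -2.5, -3.5])]
uncertain = [(-1.0, [-1.0, -1.1, -1.2]), (-0.9, [-0.9, -1.0, -1.3])]
candidates = {"confident": confident, "uncertain": uncertain}

# A search would expand the branch whose rollout earns the highest reward.
best = max(candidates, key=lambda name: rollout_score(candidates[name]))
print(best, round(rollout_score(candidates[best]), 3))
```

In a real search, a score like this would be backed up through the tree after each rollout, so that branches where the model is consistently sure of its tokens get explored more often.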
✨ Unified 3D generation & text understanding.
✨ 3D meshes as plain text for seamless LLM integration.
✨ High-quality 3D outputs rivaling specialized models.
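To make "3D meshes as plain text" concrete, here is a minimal sketch that serializes a mesh into Wavefront OBJ-style lines, a common plain-text mesh format. The choice of OBJ, the integer quantization grid of 64, and the helper name `mesh_to_obj_text` are assumptions for illustration; the post does not specify the exact text format used.

```python
def mesh_to_obj_text(vertices, faces, grid=64):
    """Serialize a mesh as OBJ-style text with integer-quantized coordinates.

    vertices: (x, y, z) tuples with each coordinate in [0, 1].
    faces: tuples of 0-based vertex indices.
    """
    lines = []
    for vertex in vertices:
        # Quantize to a small integer grid so each coordinate is a short token.
        quantized = [round(coord * (grid - 1)) for coord in vertex]
        lines.append("v {} {} {}".format(*quantized))
    for face in faces:
        # OBJ face indices are 1-based.
        lines.append("f " + " ".join(str(index + 1) for index in face))
    return "\n".join(lines)


# A tetrahedron as a toy mesh.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
faces = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]

obj_text = mesh_to_obj_text(vertices, faces)
print(obj_text)
```

Text like this can be placed directly in a prompt or a training example, which is what lets a language model read and emit geometry without a separate mesh tokenizer.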