Just explored #MedAgentsBench from @Yale researchers and it's impressive! They've built a benchmark that finally measures the true capabilities of LLMs on complex medical reasoning.
⚡ Key discoveries:
- DeepSeek R1 & OpenAI o3 lead on clinical reasoning tasks
- Agent-based frameworks deliver a strong performance-cost balance
- Open-source alternatives are closing the gap at a fraction of the cost
This work moves past earlier benchmarks that no longer challenge today's advanced models. The future of medical AI is here: https://github.com/gersteinlab/medagents-benchmark #MedicalAI #MachineLearning #AIinHealthcare 🔥
With the open-weight release of CogVideoX-5B from THUDM (the GLM team), the Video Generation Model field (how about calling it VGM?) has officially become the next booming "LLM".
What does the landscape look like? What are the other video generation models? The collection below is all you need.
✅ Easier Integration: Apache 2.0 is compatible with many other licenses, simplifying tasks like merging models released under different licensing terms.
🚫 No Permission Needed: Custom licenses often require explicit permission and extra paperwork such as filling out forms, creating barriers. Apache 2.0 removes this hurdle, letting devs focus on innovation.
DeepSeek V2 is a big deal, and not only because of its significant improvements to both key components of the Transformer: the attention layer and the FFN layer.
It has also completely disrupted the Chinese LLM market, forcing competitors to drop their prices to 1% of the original.
---
There are two key components in the Transformer architecture: the self-attention layer, which captures relationships between tokens in context, and the Feed-Forward Network (FFN) layer, which stores knowledge.
DeepSeek V2 introduces optimizations to both:
The attention layer normally relies on a KV cache to avoid recomputing past keys and values, but the cache consumes significant GPU memory, limiting how many requests can be served concurrently. DeepSeek V2 introduces Multi-head Latent Attention (MLA), which caches only a small latent representation per token and expands it back into keys and values when needed, yielding substantial memory savings.
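To make the caching idea concrete, here is a minimal toy sketch (my own illustration, not DeepSeek's implementation; it ignores details like RoPE handling, and all dimensions are made up):

```python
import torch
import torch.nn as nn

class LatentKVCache(nn.Module):
    """Toy MLA-style cache: store one small latent per token,
    re-expand it to per-head K/V at attention time."""

    def __init__(self, d_model=4096, d_latent=512, n_heads=32):
        super().__init__()
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.down = nn.Linear(d_model, d_latent, bias=False)  # compress
        self.up_k = nn.Linear(d_latent, d_model, bias=False)  # expand to K
        self.up_v = nn.Linear(d_latent, d_model, bias=False)  # expand to V

    def forward(self, x, cache):
        # x: (batch, new_tokens, d_model); cache: (batch, past_tokens, d_latent)
        cache = torch.cat([cache, self.down(x)], dim=1)  # only latents are stored
        b, t, _ = cache.shape
        k = self.up_k(cache).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        v = self.up_v(cache).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        return k, v, cache  # k, v: (batch, heads, total_tokens, d_head)

# Per token per layer, a full K+V cache costs 2 * d_model = 8192 floats here;
# the latent costs d_latent = 512 floats, a 16x saving in this toy setting.
layer = LatentKVCache()
k, v, cache = layer(torch.randn(1, 5, 4096), torch.zeros(1, 0, 512))
```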
For the FFN layer, DeepSeek V2 uses a fine-grained Mixture-of-Experts with 162 experts (160 routed plus 2 shared), versus the usual 8 in Mixtral. Segmenting experts into finer granularity allows higher specialization and more accurate knowledge acquisition, while activating only a small subset of experts per token keeps processing efficient (see the sketch below).
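Here is a naive sketch of that routing idea (my own toy code with assumed hyperparameters and a per-token Python loop for readability; real MoE kernels batch this):

```python
import torch
import torch.nn as nn

class FineGrainedMoE(nn.Module):
    """Toy fine-grained MoE: many small experts, top-k routed per token."""

    def __init__(self, d_model=1024, d_hidden=256, n_experts=160, top_k=6):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden),
                          nn.SiLU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        weights, idx = self.router(x).softmax(dim=-1).topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for t in range(x.size(0)):            # naive loop for clarity
            for w, e in zip(weights[t], idx[t]):
                out[t] += w * self.experts[int(e)](x[t])  # only k experts run
        return out

moe = FineGrainedMoE()
y = moe(torch.randn(4, 1024))  # each token touches only 6 of 160 experts
```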
It disrupted the market by dropping its API price to $0.14 per 1M input tokens. This dramatic reduction forced competitors like GLM, ERNIE, and Qwen to follow suit, cutting their prices to 1% of the original offerings. Users can now access these APIs at roughly 1/35th the cost of GPT-4o.
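(Back-of-the-envelope, assuming GPT-4o's input price of about $5 per 1M tokens at the time: $5 / $0.14 ≈ 36, which is where the ~1/35 figure comes from.)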
Welcome Bunny! A family of lightweight but powerful multimodal models from BAAI. Thanks to detailed work on dataset curation, the Bunny-3B model, built upon SigLIP and Phi-2, achieves performance on par with 13B models.
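If you want to try it, here is a quick sketch of loading the 3B checkpoint via Hugging Face transformers (the repo id "BAAI/Bunny-v1_0-3B" and the trust_remote_code requirement are my assumptions about how the checkpoint is published; check the model card):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id; confirm on BAAI's Hugging Face page.
repo = "BAAI/Bunny-v1_0-3B"

model = AutoModelForCausalLM.from_pretrained(
    repo,
    torch_dtype=torch.float16,
    trust_remote_code=True,  # Bunny ships custom modeling code
)
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
```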