kaizuberbuehler's Collections
LM Capabilities and Scaling
Compression Represents Intelligence Linearly
Paper • 2404.09937 • Published • 28
MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies
Paper • 2404.06395 • Published • 23
Long-context LLMs Struggle with Long In-context Learning
Paper • 2404.02060 • Published • 38
Are large language models superhuman chemists?
Paper • 2404.01475 • Published • 19
FlowMind: Automatic Workflow Generation with LLMs
Paper • 2404.13050 • Published • 35
Capabilities of Gemini Models in Medicine
Paper • 2404.18416 • Published • 25
Imp: Highly Capable Large Multimodal Models for Mobile Devices
Paper • 2405.12107 • Published • 30
On the Planning Abilities of Large Language Models -- A Critical Investigation
Paper • 2305.15771 • Published • 1
Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning
Paper • 2406.09170 • Published • 28
MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding
Paper • 2406.09411 • Published • 20
Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B
Paper • 2406.07394 • Published • 29
GEB-1.3B: Open Lightweight Large Language Model
Paper • 2406.09900 • Published • 21
Mixture of A Million Experts
Paper • 2407.04153 • Published • 5
Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws
Paper • 2404.05405 • Published • 10
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers
Paper • 2408.06195 • Published • 73
Attention Heads of Large Language Models: A Survey
Paper • 2409.03752 • Published • 90
HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models
Paper • 2409.16191 • Published • 43
Making Text Embedders Few-Shot Learners
Paper • 2409.15700 • Published • 31
Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data
Paper • 2406.14546 • Published • 2
Are Your LLMs Capable of Stable Reasoning?
Paper • 2412.13147 • Published • 95
CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings
Paper • 2501.01257 • Published • 53
ProgCo: Program Helps Self-Correction of Large Language Models
Paper • 2501.01264 • Published • 27
Paper • 2412.04315 • Published • 19
Low-Bit Quantization Favors Undertrained LLMs: Scaling Laws for Quantized LLMs with 100T Training Tokens
Paper • 2411.17691 • Published • 13
PokerBench: Training Large Language Models to become Professional Poker Players
Paper • 2501.08328 • Published • 17
Do generative video models learn physical principles from watching videos?
Paper • 2501.09038 • Published • 35
Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models
Paper • 2501.12370 • Published • 11
Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling
Paper • 2501.16975 • Published • 31
Large Language Models Think Too Fast To Explore Effectively
Paper • 2501.18009 • Published • 24
The Stochastic Parrot on LLM's Shoulder: A Summative Assessment of Physical Concept Understanding
Paper • 2502.08946 • Published • 195
Scaling Embedding Layers in Language Models
Paper • 2502.01637 • Published • 24
Great Models Think Alike and this Undermines AI Oversight
Paper • 2502.04313 • Published • 34
Scaling Pre-training to One Hundred Billion Data for Vision Language Models
Paper • 2502.07617 • Published • 29
Gemstones: A Model Suite for Multi-Faceted Scaling Laws
Paper • 2502.06857 • Published • 25
Distillation Scaling Laws
Paper • 2502.08606 • Published • 48
NoLiMa: Long-Context Evaluation Beyond Literal Matching
Paper • 2502.05167 • Published • 15
Cramming 1568 Tokens into a Single Vector and Back Again: Exploring the Limits of Embedding Space Capacity
Paper • 2502.13063 • Published • 70
Unveiling Downstream Performance Scaling of LLMs: A Clustering-Based Perspective
Paper • 2502.17262 • Published • 20
Gemini Robotics: Bringing AI into the Physical World
Paper • 2503.20020 • Published • 25
Implicit Reasoning in Transformers is Reasoning through Shortcuts
Paper • 2503.07604 • Published • 22
Shifting Long-Context LLMs Research from Input to Output
Paper • 2503.04723 • Published • 22
TinyR1-32B-Preview: Boosting Accuracy with Branch-Merge Distillation
Paper • 2503.04872 • Published • 15