sugatoray's Collections
Papers-Fundamentals
updated
RoFormer: Enhanced Transformer with Rotary Position Embedding
Paper
•
2104.09864
•
Published
•
13
Attention Is All You Need
Paper
•
1706.03762
•
Published
•
59
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
Paper
•
2404.03715
•
Published
•
62
Zero-Shot Tokenizer Transfer
Paper
•
2405.07883
•
Published
•
5
Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM
Paper
•
2401.02994
•
Published
•
51
The Prompt Report: A Systematic Survey of Prompting Techniques
Paper
•
2406.06608
•
Published
•
63
Extreme Compression of Large Language Models via Additive Quantization
Paper
•
2401.06118
•
Published
•
13
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Paper
•
2402.03300
•
Published
•
115
HyperZ·Z·W Operator Connects Slow-Fast Networks for Full Context Interaction
Paper
•
2401.17948
•
Published
•
4
Grokfast: Accelerated Grokking by Amplifying Slow Gradients
Paper
•
2405.20233
•
Published
•
6
Stream of Search (SoS): Learning to Search in Language
Paper
•
2404.03683
•
Published
•
32
Xmodel-2 Technical Report
Paper
•
2412.19638
•
Published
•
27
Transformer^2: Self-adaptive LLMs
Paper
•
2501.06252
•
Published
•
55
Foundations of Large Language Models
Paper
•
2501.09223
•
Published
•
3
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper
•
2501.12948
•
Published
•
381
Preference Leakage: A Contamination Problem in LLM-as-a-judge
Paper
•
2502.01534
•
Published
•
40
Levels of AGI: Operationalizing Progress on the Path to AGI
Paper
•
2311.02462
•
Published
•
37
Large Language Diffusion Models
Paper
•
2502.09992
•
Published
•
112
A Survey on Post-training of Large Language Models
Paper
•
2503.06072
•
Published
•
4
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
Paper
•
2503.09573
•
Published
•
68
Transformers without Normalization
Paper
•
2503.10622
•
Published
•
155
Large Language Model Agent: A Survey on Methodology, Applications and Challenges
Paper
•
2503.21460
•
Published
•
73
rasbt/llama-3.2-from-scratch
Updated
•
254