matlok's Collections
Papers - Training
SELF: Language-Driven Self-Evolution for Large Language Model
Paper • 2310.00533 • Published • 2
GrowLength: Accelerating LLMs Pretraining by Progressively Growing Training Length
Paper • 2310.00576 • Published • 2
A Pretrainer's Guide to Training Data: Measuring the Effects of Data Age, Domain Coverage, Quality, & Toxicity
Paper • 2305.13169 • Published • 3
Transformers Can Achieve Length Generalization But Not Robustly
Paper • 2402.09371 • Published • 12
Triple-Encoders: Representations That Fire Together, Wire Together
Paper • 2402.12332 • Published • 2
Veagle: Advancements in Multimodal Representation Learning
Paper • 2403.08773 • Published • 7
Training Compute-Optimal Large Language Models
Paper • 2203.15556 • Published • 10
Hash Layers For Large Sparse Models
Paper • 2106.04426 • Published • 2
Chain-of-Verification Reduces Hallucination in Large Language Models
Paper • 2309.11495 • Published • 38
Contrastive Decoding Improves Reasoning in Large Language Models
Paper • 2309.09117 • Published • 37
Paper • 2407.10671 • Published • 155
Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws
Paper • 2404.05405 • Published • 9
Scaling Laws for Precision
Paper • 2411.04330 • Published • 6
Neural Tangent Kernel: Convergence and Generalization in Neural Networks
Paper • 1806.07572 • Published • 1