zk67
AI & ML interests
None yet
Recent Activity
- liked a model 29 days ago: deepseek-ai/DeepSeek-R1-0528
- liked a model 3 months ago: deepseek-ai/DeepSeek-V3-0324
- liked a model 3 months ago: lmstudio-community/DeepSeek-V3-0324-GGUF
Organizations
Model Architecture
Agent AI
LLM Data
- A Survey on Data Selection for LLM Instruction Tuning (Paper • 2402.05123 • Published • 3)
- Data-Juicer Sandbox: A Comprehensive Suite for Multimodal Data-Model Co-development (Paper • 2407.11784 • Published • 4)
- Data Management For Large Language Models: A Survey (Paper • 2312.01700 • Published)
- Datasets for Large Language Models: A Comprehensive Survey (Paper • 2402.18041 • Published • 2)
Ilya Papers
LLM Tech Report
- Qwen2.5 Technical Report (Paper • 2412.15115 • Published • 368)
- Qwen2.5-Coder Technical Report (Paper • 2409.12186 • Published • 148)
- Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement (Paper • 2409.12122 • Published • 4)
- Qwen2.5-VL Technical Report (Paper • 2502.13923 • Published • 193)
LLM Pre-Train
- Scaling Laws for Neural Language Models (Paper • 2001.08361 • Published • 7)
- Scaling Laws for Autoregressive Generative Modeling (Paper • 2010.14701 • Published)
- Training Compute-Optimal Large Language Models (Paper • 2203.15556 • Published • 10)
- A Survey on Data Selection for Language Models (Paper • 2402.16827 • Published • 4)
Foundation Models and AGI
- Large language models for artificial general intelligence (AGI): A survey of foundational principles and approaches (https://arxiv.org/abs/2501.03151)
Instruction Tuning
- Instruction Mining: High-Quality Instruction Data Selection for Large Language Models (Paper • 2307.06290 • Published • 10)
- Unleashing the Power of Data Tsunami: A Comprehensive Survey on Data Assessment and Selection for Instruction Tuning of Language Models (Paper • 2408.02085 • Published • 19)
- A Survey on Data Selection for LLM Instruction Tuning (Paper • 2402.05123 • Published • 3)
- Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey (Paper • 2403.14608 • Published)
Training
- A Large Batch Optimizer Reality Check: Traditional, Generic Optimizers Suffice Across Batch Sizes (Paper • 2102.06356 • Published)
- Large Batch Optimization for Deep Learning: Training BERT in 76 minutes (Paper • 1904.00962 • Published • 1)
- Decoupled Weight Decay Regularization (Paper • 1711.05101 • Published • 2)
Inference Optimization
- FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness (Paper • 2205.14135 • Published • 13)
- FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning (Paper • 2307.08691 • Published • 8)
- FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision (Paper • 2407.08608 • Published • 1)
- Fast Transformer Decoding: One Write-Head is All You Need (Paper • 1911.02150 • Published • 6)
LLM Reasoning Papers
Papers on improving the reasoning capabilities of LLMs
- Let's Verify Step by Step (Paper • 2305.20050 • Published • 10)
- LLM Critics Help Catch LLM Bugs (Paper • 2407.00215 • Published)
- Large Language Monkeys: Scaling Inference Compute with Repeated Sampling (Paper • 2407.21787 • Published • 13)
- Generative Verifiers: Reward Modeling as Next-Token Prediction (Paper • 2408.15240 • Published • 13)
LLM Post Training
- Instruction Tuning for Large Language Models: A Survey (Paper • 2308.10792 • Published • 1)
- Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey (Paper • 2403.14608 • Published)
- Efficient Large Language Models: A Survey (Paper • 2312.03863 • Published • 4)
- ReFT: Reasoning with Reinforced Fine-Tuning (Paper • 2401.08967 • Published • 32)
LLM Evaluation