ooibp's Collections
LLM Papers
Attention Is All You Need • arXiv:1706.03762 • 44 upvotes
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding • arXiv:1810.04805 • 14 upvotes
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter • arXiv:1910.01108 • 14 upvotes
Language Models are Few-Shot Learners • arXiv:2005.14165 • 11 upvotes
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models • arXiv:2201.11903 • 9 upvotes
Training language models to follow instructions with human feedback • arXiv:2203.02155 • 16 upvotes
PaLM: Scaling Language Modeling with Pathways • arXiv:2204.02311 • 2 upvotes
The Flan Collection: Designing Data and Methods for Effective Instruction Tuning • arXiv:2301.13688 • 8 upvotes
LLaMA: Open and Efficient Foundation Language Models • arXiv:2302.13971 • 13 upvotes
GPT-4 Technical Report • arXiv:2303.08774 • 5 upvotes
PaLM 2 Technical Report • arXiv:2305.10403 • 6 upvotes
Tree of Thoughts: Deliberate Problem Solving with Large Language Models • arXiv:2305.10601 • 10 upvotes
Llama 2: Open Foundation and Fine-Tuned Chat Models • arXiv:2307.09288 • 242 upvotes
Attention Is Not All You Need Anymore • arXiv:2308.07661 • 1 upvote
Mistral 7B • arXiv:2310.06825 • 47 upvotes
Gemini: A Family of Highly Capable Multimodal Models • arXiv:2312.11805 • 45 upvotes
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context • arXiv:2403.05530 • 60 upvotes
Gemma: Open Models Based on Gemini Research and Technology • arXiv:2403.08295 • 47 upvotes
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework • arXiv:2404.14619 • 124 upvotes
Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems • arXiv:2407.01370 • 85 upvotes
OpenDevin: An Open Platform for AI Software Developers as Generalist Agents • arXiv:2407.16741 • 68 upvotes
The Llama 3 Herd of Models • arXiv:2407.21783 • 107 upvotes
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery • arXiv:2408.06292 • 115 upvotes
Qwen2.5-Coder Technical Report • arXiv:2409.12186 • 135 upvotes
GPT-4o System Card • arXiv:2410.21276 • 79 upvotes