Omar Sanseviero
osanseviero
AI & ML interests
Llamas, model merging, massive ASR for data collection, 3D ML, on-device ML, quantization, model judging, ML in browser, healthcare applications, education, intersection of art and ML.🦙
MoEs papers reading list
- Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
  Paper • 1701.06538 • Published • 6
- Sparse Networks from Scratch: Faster Training without Losing Performance
  Paper • 1907.04840 • Published • 3
- ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
  Paper • 1910.02054 • Published • 6
- A Mixture of h-1 Heads is Better than h Heads
  Paper • 2005.06537 • Published • 2
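The first paper in this list introduces the sparsely-gated mixture-of-experts layer. As a companion, here is a minimal PyTorch sketch of the core idea (top-k routing over a set of expert MLPs); it is illustrative only, with made-up sizes and none of the load-balancing or noisy-gating tricks from the papers.

```python
# Minimal top-k sparsely-gated MoE layer, in the spirit of 1701.06538.
# Illustrative sketch only: no load-balancing loss, capacity limits, or gate noise.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseMoE(nn.Module):
    def __init__(self, d_model: int, d_hidden: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, num_experts)  # router scores each expert per token
        self.experts = nn.ModuleList(
            [
                nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
                for _ in range(num_experts)
            ]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        scores = self.gate(x)                                # (tokens, num_experts)
        top_vals, top_idx = scores.topk(self.top_k, dim=-1)  # keep only k experts per token
        weights = F.softmax(top_vals, dim=-1)                # renormalise over the chosen k
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = top_idx[:, slot] == e                 # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(1) * expert(x[mask])
        return out


tokens = torch.randn(16, 64)                 # 16 tokens with hidden size 64
layer = SparseMoE(d_model=64, d_hidden=256)
print(layer(tokens).shape)                   # torch.Size([16, 64])
```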
OS Week Highlights - Oct 16 - 22
OS Week Highlights - Oct 9 - 15
OS Week Highlights - Oct 2 - 8
OS Week Highlights - Sept 25 - Oct 1
OS Week Highlights - Sept 18 - 24
- IllusionDiffusion (Running on Zero • 5.22k)
  👁 Generate stunning high quality illusion artwork
- XTTS (Running on T4 • 2.7k)
  🐸 Create personalized speech using text and audio samples
- Nougat Transformers (Running • 71)
  🍫 Convert PDFs to markup language using OCR
- monster-labs/control_v1p_sd15_qrcode_monster
  Updated • 63.7k • 1.41k
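IllusionDiffusion-style images come from conditioning Stable Diffusion on a control pattern with a QR-code ControlNet such as the monster-labs/control_v1p_sd15_qrcode_monster checkpoint listed above. Below is a hedged diffusers sketch of that general recipe; the base checkpoint, conditioning scale, and input file are assumptions, not the Space's exact setup.

```python
# Sketch: illusion-style generation with a QR-code ControlNet (diffusers).
# Base checkpoint, conditioning scale, and control image are illustrative choices,
# not necessarily what the IllusionDiffusion Space uses.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained(
    "monster-labs/control_v1p_sd15_qrcode_monster", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",       # assumed SD 1.5 base; the Space may use another
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# Any high-contrast pattern works as the "hidden" image (spiral, logo, text, QR code...).
control_image = load_image("pattern.png")   # hypothetical local file

image = pipe(
    prompt="a medieval village seen from above, highly detailed, golden hour",
    image=control_image,
    controlnet_conditioning_scale=1.2,      # higher = the hidden pattern shows more strongly
    num_inference_steps=30,
).images[0]
image.save("illusion.png")
```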
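XTTS is Coqui's voice-cloning TTS model. Assuming the Coqui TTS Python package and the XTTS v2 checkpoint (the Space may pin a different version or settings), a minimal sketch looks like this:

```python
# Sketch: voice cloning with Coqui XTTS via the TTS package (pip install TTS).
# Model id and file paths are assumptions; the XTTS Space may use a different checkpoint.
from TTS.api import TTS

tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")  # downloads the checkpoint on first use

tts.tts_to_file(
    text="Hello! This is a cloned voice speaking.",
    speaker_wav="my_voice_sample.wav",  # short, clean recording of the target speaker
    language="en",
    file_path="output.wav",
)
```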
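Nougat Transformers wraps the Nougat OCR model shipped in transformers. A sketch of running it on a single rasterised PDF page follows; the checkpoint, page image, and generation settings are illustrative rather than the Space's exact configuration.

```python
# Sketch: PDF page -> markup text with Nougat via transformers.
# Checkpoint and generation settings are illustrative; the Space's setup may differ.
from PIL import Image
from transformers import NougatProcessor, VisionEncoderDecoderModel

processor = NougatProcessor.from_pretrained("facebook/nougat-base")
model = VisionEncoderDecoderModel.from_pretrained("facebook/nougat-base")

page = Image.open("page_1.png").convert("RGB")  # one rasterised PDF page (e.g. via pdf2image)
pixel_values = processor(images=page, return_tensors="pt").pixel_values

outputs = model.generate(pixel_values, max_new_tokens=1024)
text = processor.batch_decode(outputs, skip_special_tokens=True)[0]
print(processor.post_process_generation(text, fix_markdown=True))
```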
Mistral Instruct Merges
Merge of Mistral Instruct 1 and 2 using different mergekit techniques
Instruction Pre-Training
- Instruction Pre-Training: Language Models are Supervised Multitask Learners
  Paper • 2406.14491 • Published • 95
- Instruction Synthesizer (Running on Zero • 86)
  🐠 Generate instruction-response pairs from text
- instruction-pretrain/InstructLM-1.3B
  Text Generation • 1B • Updated • 19 • 42
- instruction-pretrain/InstructLM-500M
  Text Generation • 0.6B • Updated • 37 • 34
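The InstructLM checkpoints above are text-generation models on the Hub, so, assuming they load through the standard transformers causal-LM path, they can be tried in a few lines. The prompt format and sampling settings below are illustrative, not the authors' recommended recipe.

```python
# Sketch: sampling from instruction-pretrain/InstructLM-1.3B with transformers.
# Assumes the checkpoint loads via the standard text-generation pipeline;
# prompt format and generation settings are illustrative only.
from transformers import pipeline

generator = pipeline("text-generation", model="instruction-pretrain/InstructLM-1.3B")

prompt = "Explain instruction pre-training in one paragraph.\nAnswer:"
out = generator(prompt, max_new_tokens=128, do_sample=True, temperature=0.7)
print(out[0]["generated_text"])
```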
Model Merging
Model merging is a very popular technique in the LLM space these days. Here is a chronological list of papers that will help you get started with it! (A minimal weight-averaging sketch follows the list.)
- Qualitatively characterizing neural network optimization problems
  Paper • 1412.6544 • Published • 4
- Convergent Learning: Do different neural networks learn the same representations?
  Paper • 1511.07543 • Published • 2
- Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models
  Paper • 1909.11299 • Published • 2
- Model Fusion via Optimal Transport
  Paper • 1910.05653 • Published • 1
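As promised above, here is a minimal sketch of the simplest merging technique: uniform weight averaging of two fine-tunes that share an architecture. Tools like mergekit build fancier methods (SLERP, TIES, DARE) on top of this idea; the model names below are placeholders.

```python
# Sketch: naive model merging by averaging the weights of two same-architecture checkpoints.
# "model_a" and "model_b" are placeholders for any two fine-tunes of the same base model.
import torch
from transformers import AutoModelForCausalLM

model_a = AutoModelForCausalLM.from_pretrained("model_a", torch_dtype=torch.float32)
model_b = AutoModelForCausalLM.from_pretrained("model_b", torch_dtype=torch.float32)

state_a, state_b = model_a.state_dict(), model_b.state_dict()
merged = {k: 0.5 * state_a[k] + 0.5 * state_b[k] for k in state_a}  # linear interpolation, alpha = 0.5

model_a.load_state_dict(merged)          # reuse model_a's skeleton to hold the merged weights
model_a.save_pretrained("merged-model")  # the tokenizer is shared, so copy it from either parent
```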
ML for Tools
A collection of papers about ML models that use external tools! (A toy sketch of the tool-use loop follows the list.)
- Internet-Augmented Dialogue Generation
  Paper • 2107.07566 • Published • 2
- Multi-hop Question Answering via Reasoning Chains
  Paper • 1910.02610 • Published • 2
- LaMDA: Language Models for Dialog Applications
  Paper • 2201.08239 • Published • 4
- WebGPT: Browser-assisted question-answering with human feedback
  Paper • 2112.09332 • Published • 2
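To make the theme concrete, here is the toy tool-use loop mentioned in the description: the model emits a tool call, the runtime executes it, and the result is fed back into the context. The call syntax, the SEARCH tool, and the fake_lm stand-in are all hypothetical scaffolding, not any paper's exact protocol.

```python
# Toy tool-use loop: a "model" emits SEARCH[...] calls, the runtime answers them.
# Everything here (call syntax, tools, the stand-in model) is hypothetical scaffolding.
import re


def web_search(query: str) -> str:
    # Placeholder: a real system would call a search API or a browser here.
    return f"(top search results for: {query})"


TOOLS = {"SEARCH": web_search}


def fake_lm(context: str) -> str:
    # Stand-in for an actual language model: first ask for a tool, then answer.
    if "(top search results" not in context:
        return "SEARCH[who created the LaMDA dialog model]"
    return "Final answer: LaMDA was developed by Google."


def run(question: str, max_turns: int = 3) -> str:
    context = f"Question: {question}\n"
    for _ in range(max_turns):
        action = fake_lm(context)
        match = re.match(r"(\w+)\[(.*)\]", action)
        if match and match.group(1) in TOOLS:      # tool call: execute it and feed back the result
            result = TOOLS[match.group(1)](match.group(2))
            context += f"{action}\n{result}\n"
        else:                                      # no tool call: treat the output as the answer
            return action
    return "No answer within the turn budget."


print(run("Who created LaMDA?"))
```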
Historical - Spaces of the Week
All Spaces of the Week...from all weeks
Papers I want to read
Papers in my to-read list
- RLHF Workflow: From Reward Modeling to Online RLHF
  Paper • 2405.07863 • Published • 72
- Chameleon: Mixed-Modal Early-Fusion Foundation Models
  Paper • 2405.09818 • Published • 131
- Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
  Paper • 2405.15574 • Published • 56
- An Introduction to Vision-Language Modeling
  Paper • 2405.17247 • Published • 90
Papers I've read
- Chain-of-Thought Reasoning Without Prompting
  Paper • 2402.10200 • Published • 110
- Large Language Models Cannot Self-Correct Reasoning Yet
  Paper • 2310.01798 • Published • 36
- Premise Order Matters in Reasoning with Large Language Models
  Paper • 2402.08939 • Published • 29
- Chain of Thought Empowers Transformers to Solve Inherently Serial Problems
  Paper • 2402.12875 • Published • 13