Questions?
pinned
1
4
#84 opened 2 months ago
by
nouamanetazi

More resources
pinned
5
#73 opened 2 months ago
by
eliebak

How to understand the graph "Tensor parallelism with column linear + row Linear"
#109 opened about 1 hour ago
by
Yihel
Incorrect link in Data Parallelism?
#108 opened 4 days ago
by
joaogante

Thoughts on adding Hybrid Sharded Data Parallel to the guide
#107 opened 10 days ago
by
mattmcclean
Typo in Sequence Parallelism TO -> TP
#106 opened 12 days ago
by
JulienVig
Wrong section title for FSDP?
#105 opened about 1 month ago
by
amitness

A mistake? Weights/grads/optimizer states memory for mixed precision
#104 opened about 1 month ago
by
donglongfei
Questions about pipeline parallelism
2
#103 opened about 2 months ago
by
ink0215
update
#102 opened about 2 months ago
by
nouamanetazi

Widget does not take TP into account for Parameter / Gradient / Optimizer State Sharding
#98 opened about 2 months ago
by
Turakar

Am I misunderstanding ZeRO-1 and ZeRO-2?
4
#94 opened about 2 months ago
by
Guanghua
Fix description of ZeRO-1
1
#93 opened about 2 months ago
by
Guanghua
Few Errors
2
3
#86 opened 2 months ago
by
gordicaleksa

How can the following figure be obtained, and is there a way to tag the name of each tensor during profiling?
1
#83 opened 2 months ago
by
ll922

dark-mode
1
10
#82 opened 2 months ago
by
serhany

Thanks for sharing. Was looking for similar research to learn about compute (AI+GPU)
4
1
#79 opened 2 months ago
by
pknayak
Make it easier to import into reader applications
7
#77 opened 2 months ago
by
pascalwhoop