Typo in Sequence Parallelism TO -> TP
#106, opened by JulienVig
Thank you so much for the great blogpost.
In the sentence "Here again, like vanilla TO, TP+SP is usually done only within a node (keeping the TP degree under the number of GPU per nodes", I assume "TO" is supposed to be "TP".