After today, you NEED to know how DeepSeek worked its magic. Check out this thread breaking down the paper: https://x.com/AlexBodner_/status/1883602267317927965
Yeah, that's why I built this calculator. It takes the model architecture, input shape, batch size, and a few other factors as input and does the tedious calculations for you!