I'm wondering about the following statement:
"achieving a throughput of 15 generations per hour per H100"
Since DeepSeek-R1 can't fit on a single H100 (and, per Update #2, the model fits on 8xH100), how is per-H100 throughput measured? Should this be read as 15 x 8 = 120 generations per hour on an 8xH100 node?
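If the reported figure really is per GPU of an 8xH100 node, the conversion is plain arithmetic. A minimal sketch under that assumption (the variable names are mine, not from the article):

```python
# Hypothetical conversion, assuming "15 generations per hour per H100"
# is a per-GPU figure on an 8xH100 node rather than a per-node figure.
gpus_per_node = 8
throughput_per_gpu = 15  # generations per hour per H100, as reported

# Per-node throughput implied by that assumption.
throughput_per_node = throughput_per_gpu * gpus_per_node
print(throughput_per_node)  # 120 generations per hour per 8xH100 node
```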
A somewhat ambiguous point: the article claims the following:
"By combining rule-based verification (Math Verify) with LLM-based evaluation, we improve dataset quality while maintaining scale. The final dataset consists of 220k problems with verified reasoning traces, making it a valuable resource for training reasoning models."
Shouldn't the size be 248k? Otherwise, it seems the problems recovered by the LLM-based evaluation were not included in the final dataset.