Can you provide machine specs?

by kingabzpro - opened about 1 month ago

Discussion

kingabzpro

about 1 month ago

How many H100s are required to run this model locally, and what other parameters matter for hardware optimization?

aaron-newsome

about 1 month ago

From the deployment guide:

The smallest deployment unit for Kimi-K2 FP8 weights with 128k seqlen on mainstream H200 or H20 platform is a cluster with 16 GPUs with either Tensor Parallel (TP) or "data parallel + expert parallel" (DP+EP).

https://github.com/MoonshotAI/Kimi-K2/blob/main/docs/deploy_guidance.md

lsw825

Moonshot AI org about 1 month ago

At least 16 H100s are needed, and that only works with a very short sequence length (for simple testing only). For a normal experience, 32 H100s are required.
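
A rough back-of-the-envelope check of these numbers, as a sketch rather than an official sizing. It assumes Kimi-K2 has about 1 trillion total parameters stored at 1 byte each in FP8, and uses nominal memory sizes of 80 GB per H100 and 141 GB per H200. None of these figures come from this thread, and the helper name `headroom_gb` is made up for illustration. Whatever is left after the weights has to cover KV cache, activations, and runtime overhead.

```python
# Assumptions (not from this thread): ~1e12 total parameters, 1 byte per
# parameter in FP8, nominal HBM of 80 GB (H100) and 141 GB (H200).
TOTAL_PARAMS = 1.0e12
BYTES_PER_PARAM_FP8 = 1
GPU_MEMORY_GB = {"H100": 80, "H200": 141}


def headroom_gb(gpu: str, num_gpus: int) -> float:
    """Aggregate GPU memory left after loading the FP8 weights, in GB."""
    weights_gb = TOTAL_PARAMS * BYTES_PER_PARAM_FP8 / 1e9
    return GPU_MEMORY_GB[gpu] * num_gpus - weights_gb


for gpu, n in [("H100", 16), ("H100", 32), ("H200", 16)]:
    print(f"{n} x {gpu}: about {headroom_gb(gpu, n):.0f} GB left for KV cache and overhead")
```

Under these assumptions, 16 H100s leave roughly 280 GB across the whole cluster, which is consistent with only very short sequences fitting, while 32 H100s or 16 H200s leave more than 1 TB.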

vpakarinen

30 days ago

If someone can actually test this model, tell me if it's good.

xxr3376

Moonshot AI org 30 days ago

@vpakarinen it's really good, you should try it!

halldorj

30 days ago

At least 16 H100s are needed, and that only works with a very short sequence length (for simple testing only). For a normal experience, 32 H100s are required.

Can you provide an sglang example with 32 H100s? :)

lsw825

Moonshot AI org 29 days ago

Can you provide an sglang example with 32 H100s? :)

In SGLang, we recommend deploying K2 with prefill-decode (P-D) disaggregation plus DP+EP. This needs at least 2 prefill nodes and 4 decode nodes. In our simple testing, a DP+EP deployment on only 32 H100s without P-D disaggregation had some problems (though I may be wrong). You can also ask the SGLang community for suggestions.

ersintarhan

29 days ago

Can I deploy this setup on 4 nodes, each with an RTX 4000 Ada + 64 GB RAM + an ultra-low-latency 10 Gbps network?

kingabzpro

28 days ago

Can I deploy this setup on 4 nodes, each with an RTX 4000 Ada + 64 GB RAM + an ultra-low-latency 10 Gbps network?

I don't think so. Wait for the quantized version of the model.
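
To illustrate why this is unlikely to work, here is a rough capacity check. It is a sketch under stated assumptions: 20 GB of VRAM per RTX 4000 Ada and roughly 1 TB of FP8 weights (about 1 trillion parameters at 1 byte each). Neither figure is from this thread. It ignores KV cache and the 10 Gbps interconnect, both of which would only make things worse.

```python
# Assumptions (not from this thread): 20 GB VRAM per RTX 4000 Ada and
# ~1000 GB of FP8 weights. The 64 GB of system RAM per node is from the question.
NODES = 4
VRAM_GB_PER_NODE = 20
RAM_GB_PER_NODE = 64
FP8_WEIGHTS_GB = 1000

total_vram_gb = NODES * VRAM_GB_PER_NODE
total_with_ram_gb = NODES * (VRAM_GB_PER_NODE + RAM_GB_PER_NODE)

print(f"GPU memory only: {total_vram_gb} GB ({total_vram_gb / FP8_WEIGHTS_GB:.0%} of the weights)")
print(f"GPU + system RAM: {total_with_ram_gb} GB ({total_with_ram_gb / FP8_WEIGHTS_GB:.0%} of the weights)")
```

Even counting all system RAM, the four nodes would hold only about a third of the FP8 weights under these assumptions, so a much smaller quantized build would be needed.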

shujian2025

22 days ago

Can you provide an sglang example with 32 H100s? :)

In SGLang, we recommend deploying K2 with prefill-decode (P-D) disaggregation plus DP+EP. This needs at least 2 prefill nodes and 4 decode nodes. In our simple testing, a DP+EP deployment on only 32 H100s without P-D disaggregation had some problems (though I may be wrong). You can also ask the SGLang community for suggestions.

Would you recommend other packages for inference with H100 nodes?

berageo

13 days ago

Approximately how many tokens per minute can the recommended minimum setup (16x H200) process?
