Deploy gpt-oss models in your own AWS account using vLLM and Tensorfuse

#36 opened by agam30

Hi all,

We've released a guide to deploying OpenAI's latest open-weight gpt-oss models in your own AWS account. What's included:

  1. An optimized Dockerfile using the latest vllm/vllm-openai:gptoss image, for both the 20B and 120B models (see the sketch after this list)
  2. Throughput of 240 tokens/sec on 1×H100 with the 20B model, and 200 tokens/sec on 8×H100 with the 120B model
  3. Both models served at their full ~130k-token context length
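For anyone curious what the Dockerfile boils down to, here's a minimal sketch (not the actual file from the guide). It assumes the vllm/vllm-openai:gptoss image keeps vLLM's OpenAI-compatible API server as its entrypoint, as the standard vLLM images do, so the Dockerfile only needs to supply arguments:

```dockerfile
# Minimal sketch, not the guide's actual Dockerfile.
# Assumes the base image's entrypoint is vLLM's OpenAI-compatible
# API server, so CMD only passes its arguments.
FROM vllm/vllm-openai:gptoss

# Serve gpt-oss-20b at its full ~130k (131072-token) context window
# on the default API port 8000.
CMD ["--model", "openai/gpt-oss-20b", \
     "--max-model-len", "131072", \
     "--port", "8000"]
```

You'd then run it with something like `docker run --gpus all -p 8000:8000 <image>` and point any OpenAI-compatible client at `http://localhost:8000/v1`. For the 120B model on 8×H100, swap in `openai/gpt-oss-120b` and add `--tensor-parallel-size 8`.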

Follow the guide to run it in your AWS account: https://tensorfuse.io/docs/guides/modality/text/openai_oss

Get started with Tensorfuse here: https://app.tensorfuse.io/

It would be awesome to release metrics on consumer hardware too.

Hey @agam30, thanks for writing the guide! Feel free to open a PR to add it here: https://github.com/openai/gpt-oss/blob/main/awesome-gpt-oss.md

Hi @dkundel-openai,

Thanks for reaching out!

Just opened the PR: https://github.com/openai/gpt-oss/pull/118

Could you get it approved?
