Deploy gpt-oss models in your own AWS account using vLLM and Tensorfuse
by agam30
Hi all,
We have released a guide to deploy OpenAI's latest gpt-oss models in your own AWS account. What's included:
- An optimized Dockerfile based on the latest vllm-openai:gptoss image, covering both the 20b and 120b models
- Throughput we measured: 240 tokens/sec on 1x H100 with the 20b model, and 200 tokens/sec on 8x H100 with the 120b model
- Both are served with the full 130k-token context length (a minimal client sketch follows this list)
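Once deployed, the vLLM server exposes an OpenAI-compatible API, so you can query it with the standard `openai` Python client. A minimal sketch, assuming a placeholder Tensorfuse endpoint URL and the 20b deployment (substitute your own endpoint and model name):

```python
from openai import OpenAI

# The base URL below is a placeholder -- use the endpoint URL that
# Tensorfuse gives you after the deployment finishes.
client = OpenAI(
    base_url="https://<your-tensorfuse-endpoint>/v1",
    api_key="EMPTY",  # vLLM accepts any key unless you configure auth
)

response = client.chat.completions.create(
    model="openai/gpt-oss-20b",  # or openai/gpt-oss-120b, matching your deployment
    messages=[{"role": "user", "content": "Explain KV caching in one paragraph."}],
    max_tokens=256,
)
print(response.choices[0].message.content)
```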
Follow the guide to run it in your AWS account: https://tensorfuse.io/docs/guides/modality/text/openai_oss
Get started with Tensorfuse here: https://app.tensorfuse.io/
It would be awesome to see metrics on consumer hardware as well.
Hey @agam30, thanks for writing the guide! Feel free to open a PR to add it here: https://github.com/openai/gpt-oss/blob/main/awesome-gpt-oss.md
Thanks for reaching out!
Just opened the PR: https://github.com/openai/gpt-oss/pull/118
Could you get it approved?