Qwen3-1.7B-oga / README.md
metadata
license: mit
base_model:
  - Qwen/Qwen3-1.7B
pipeline_tag: text-generation
tags:
  - onnx
  - onnxruntime-genai
  - oga

My Tests (Tesla P4)

  • CUDA int4: 2179 MiB, 6 TPS (tokens per second)
  • CUDA fp16: 4221 MiB, 21 TPS
  • CUDA fp32: did not finish (out of memory)
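
As a rough sanity check on the memory figures above, here is a back-of-the-envelope sketch of the weight-only footprint per precision. It is not how the numbers were measured. It assumes a nominal 1.7e9 parameters (taken from the model name, not counted from the ONNX files), and the helper `weight_mib` and the `BYTES_PER_PARAM` table are illustrative names, not part of onnxruntime-genai.

```python
# Bytes needed to store one weight at each precision.
# int4 packs two weights into one byte, hence 0.5.
BYTES_PER_PARAM = {"int4": 0.5, "fp16": 2.0, "fp32": 4.0}

# Assumption: nominal parameter count implied by the name "Qwen3-1.7B".
NOMINAL_PARAMS = 1.7e9


def weight_mib(n_params: float, precision: str) -> float:
    """Estimate the weight-only footprint in MiB (no KV cache, no runtime overhead)."""
    return n_params * BYTES_PER_PARAM[precision] / (1024 ** 2)


for precision in ("int4", "fp16", "fp32"):
    print(f"{precision}: ~{weight_mib(NOMINAL_PARAMS, precision):.0f} MiB of weights")
```

The measured values are higher than these estimates, which is expected: the runtime also needs memory for the KV cache, activations, and the CUDA context, and quantized exports may keep some tensors at higher precision. Under the same assumption, fp32 weights alone come to roughly 6.5 GiB, which leaves little headroom on the Tesla P4's 8 GB and is consistent with the fp32 run failing.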