nathangoulding committed · Commit 9433a39 · verified · Parent(s): 761e6de

Create README.md

Files changed (1): README.md (+36, -0)
# Model Information

The `vultr/Meta-Llama-3.1-70B-Instruct-AWQ-INT4-Dequantized-FP32` model is a quantized version of `Meta-Llama-3.1-70B-Instruct`. It was dequantized from Hugging Face's AWQ INT4 model to FP32, then requantized and optimized to run on AMD GPUs. It is a drop-in replacement for [hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4](https://huggingface.co/hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4).
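Because the model is a drop-in replacement, serving it should only require swapping the model id. A minimal sketch, assuming vLLM installed with ROCm support (the exact flags are illustrative assumptions, not taken from the original card):

```shell
# Serve the requantized model with vLLM on an AMD GPU (assumes a ROCm
# build of vLLM); only the model id changes versus the original
# hugging-quants AWQ checkpoint.
vllm serve vultr/Meta-Llama-3.1-70B-Instruct-AWQ-INT4-Dequantized-FP32 \
    --quantization awq \
    --port 8000
```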

```
Throughput: 68.74 requests/s, 43994.71 total tokens/s, 8798.94 output tokens/s
```
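The benchmark line above also implies the average request size. A small sketch deriving per-request token counts from the reported figures (the parsing format is an assumption based on that single line):

```python
# Derive average tokens per request from the reported benchmark line.
line = "Throughput: 68.74 requests/s, 43994.71 total tokens/s, 8798.94 output tokens/s"

# Split off the "Throughput: " prefix, then take the leading number of
# each comma-separated field.
requests_s, total_tok_s, output_tok_s = (
    float(part.split()[0]) for part in line.split(": ", 1)[1].split(", ")
)

total_per_req = total_tok_s / requests_s    # average total tokens per request
output_per_req = output_tok_s / requests_s  # average output tokens per request
print(f"{total_per_req:.0f} total / {output_per_req:.0f} output tokens per request")
```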

## Model Details

### Model Description

- **Developed by:** Meta
- **Model type:** Quantized Large Language Model
- **Language(s) (NLP):** English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai
- **License:** Llama 3.1
- **Dequantized from:** [hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4](https://huggingface.co/hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4)

## Technical Specifications

### Compute Infrastructure

- Vultr

#### Hardware

- AMD MI300X

#### Software

- ROCm

## Model Card Authors

- [biondizzle](https://huggingface.co/biondizzle)