nathangoulding committed · Commit 9433a39 · verified · Parent(s): 761e6de

Create README.md

Files changed (1): README.md (+36, -0)
# Model Information

The `vultr/Meta-Llama-3.1-70B-Instruct-AWQ-INT4-Dequantized-FP32` model is a quantized version of `Meta-Llama-3.1-70B-Instruct`. It was dequantized from Hugging Face's AWQ INT4 model to FP32, then requantized and optimized to run on AMD GPUs. It is a drop-in replacement for [hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4](https://huggingface.co/hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4).
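Because the model is a drop-in replacement, serving it should only require swapping the model id. A minimal sketch, assuming vLLM installed with ROCm support (the exact flags are illustrative assumptions, not taken from the original card):

```shell
# Serve the requantized model with vLLM on an AMD GPU (assumes a ROCm
# build of vLLM); only the model id changes versus the original
# hugging-quants AWQ checkpoint.
vllm serve vultr/Meta-Llama-3.1-70B-Instruct-AWQ-INT4-Dequantized-FP32 \
    --quantization awq \
    --port 8000
```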

```
Throughput: 68.74 requests/s, 43994.71 total tokens/s, 8798.94 output tokens/s
```
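The benchmark line above also implies the average request size. A small sketch deriving per-request token counts from the reported figures (the parsing format is an assumption based on that single line):

```python
# Derive average tokens per request from the reported benchmark line.
line = "Throughput: 68.74 requests/s, 43994.71 total tokens/s, 8798.94 output tokens/s"

# Split off the "Throughput: " prefix, then take the leading number of
# each comma-separated field.
requests_s, total_tok_s, output_tok_s = (
    float(part.split()[0]) for part in line.split(": ", 1)[1].split(", ")
)

total_per_req = total_tok_s / requests_s    # average total tokens per request
output_per_req = output_tok_s / requests_s  # average output tokens per request
print(f"{total_per_req:.0f} total / {output_per_req:.0f} output tokens per request")
```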

## Model Details

### Model Description

- **Developed by:** Meta
- **Model type:** Quantized Large Language Model
- **Language(s) (NLP):** English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai
- **License:** Llama 3.1
- **Dequantized from:** [hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4](https://huggingface.co/hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4)

## Technical Specifications

### Compute Infrastructure

- Vultr

#### Hardware

- AMD MI300X

#### Software

- ROCm

## Model Card Authors

- [biondizzle](https://huggingface.co/biondizzle)