|
|
--- |
|
|
language: |
|
|
- en |
|
|
- de |
|
|
- fr |
|
|
- it |
|
|
- pt |
|
|
- hi |
|
|
- es |
|
|
- th |
|
|
license: llama3.1 |
|
|
pipeline_tag: text-generation |
|
|
library_name: transformers |
|
|
base_model: meta-llama/Meta-Llama-3.1-70B |
|
|
tags: |
|
|
- pytorch |
|
|
- llama |
|
|
- llama-3 |
|
|
- vultr |
|
|
--- |
|
|
|
|
|
# Model Information |
|
|
|
|
|
The `vultr/Meta-Llama-3.1-70B-Instruct-AWQ-INT4-Dequantized-FP32` model is a quantized version of `Meta-Llama-3.1-70B-Instruct`. It was produced by dequantizing Hugging Face's AWQ INT4 model to FP32, then requantizing and optimizing it to run on AMD GPUs. It is a drop-in replacement for [hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4](https://huggingface.co/hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4).
|
|
|
|
|
Measured throughput on the hardware listed under Technical Specifications below:

```
|
|
Throughput: 68.74 requests/s, 43994.71 total tokens/s, 8798.94 output tokens/s |
|
|
``` |
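Since the model is a drop-in replacement, it can be loaded by model id anywhere the original AWQ checkpoint was used. The sketch below shows this with the Transformers `pipeline` API; the model id comes from this card, but the lazy import, `device_map` choice, and memory assumptions are illustrative, not a tested recipe.

```python
# Hypothetical usage sketch for this card's checkpoint.
MODEL_ID = "vultr/Meta-Llama-3.1-70B-Instruct-AWQ-INT4-Dequantized-FP32"


def build_generator():
    # Imported lazily so the sketch can be read without transformers installed.
    from transformers import pipeline

    # First call downloads the weights; a large-memory accelerator such as
    # the AMD MI300X this card targets is assumed. device_map="auto" lets
    # accelerate shard the model across available devices.
    return pipeline("text-generation", model=MODEL_ID, device_map="auto")


if __name__ == "__main__":
    generator = build_generator()
    print(generator("Hello, world", max_new_tokens=32)[0]["generated_text"])
```

Swapping in this checkpoint only requires changing the model id string; no other code changes are needed relative to the original AWQ INT4 model.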
|
|
|
|
|
## Model Details |
|
|
|
|
|
### Model Description |
|
|
|
|
|
- **Developed by:** Meta |
|
|
- **Model type:** Quantized Large Language Model |
|
|
- **Language(s) (NLP):** English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. |
|
|
- **License:** Llama 3.1 |
|
|
- **Dequantized From:** [hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4](https://huggingface.co/hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4) |
|
|
|
|
|
## Technical Specifications
|
|
|
|
|
### Compute Infrastructure |
|
|
|
|
|
- Vultr |
|
|
|
|
|
#### Hardware |
|
|
|
|
|
- AMD MI300X |
|
|
|
|
|
#### Software |
|
|
|
|
|
- ROCm |
|
|
|
|
|
## Model Author |
|
|
|
|
|
- [biondizzle](https://huggingface.co/biondizzle) |
|
|
|