---
language:
- en
- de
- fr
- it
- pt
- hi
- es
- th
license: llama3.1
pipeline_tag: text-generation
library_name: transformers
base_model: meta-llama/Meta-Llama-3.1-70B
tags:
- pytorch
- llama
- llama-3
- vultr
---
# Model Information
The `vultr/Meta-Llama-3.1-70B-Instruct-AWQ-INT4-Dequantized-FP32` model is a quantized version of `Meta-Llama-3.1-70B-Instruct`. It was dequantized from the AWQ INT4 model published on Hugging Face, then requantized and optimized to run on AMD GPUs. It is a drop-in replacement for [hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4](https://huggingface.co/hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4).
```
Throughput: 68.74 requests/s, 43994.71 total tokens/s, 8798.94 output tokens/s
```
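Because the model is a drop-in replacement, serving it should only require swapping the model ID. A minimal sketch using vLLM (an assumption on my part; the card does not state the serving stack, and `tensor_parallel_size` must be set to match your actual MI300X count):

```python
# Hedged sketch: load and query the model with vLLM on a ROCm host.
# Assumes vLLM is installed with ROCm support and enough GPU memory
# for the 70B checkpoint; tensor_parallel_size=1 is a placeholder.
from vllm import LLM, SamplingParams

llm = LLM(
    model="vultr/Meta-Llama-3.1-70B-Instruct-AWQ-INT4-Dequantized-FP32",
    tensor_parallel_size=1,
)

params = SamplingParams(max_tokens=64, temperature=0.7)
outputs = llm.generate(["Explain AWQ quantization in one sentence."], params)
print(outputs[0].outputs[0].text)
```

To migrate from the original AWQ INT4 model, only the `model=` string changes; prompts, sampling parameters, and the rest of the serving configuration stay the same.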
## Model Details
### Model Description
- **Developed by:** Meta
- **Model type:** Quantized Large Language Model
- **Language(s) (NLP):** English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
- **License:** Llama 3.1
- **Dequantized From:** [hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4](https://huggingface.co/hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4)
## Technical Specifications
### Compute Infrastructure
- Vultr
#### Hardware
- AMD MI300X
#### Software
- ROCm
## Model Author
- [biondizzle](https://huggingface.co/biondizzle)