|
|
--- |
|
|
license: llama2 |
|
|
metrics: |
|
|
- rouge |
|
|
base_model: |
|
|
- meta-llama/Llama-2-70b-chat-hf |
|
|
--- |
|
|
# Quark Team FP8 Llama-2-70b-chat-hf Model Overview |
|
|
|
|
|
## Model Information For MLPerf |
|
|
- **Model Name**: Llama-2-70b-chat-hf |
|
|
- **Version**: MLPerf v5.0 |
|
|
- **Commit**: Closed Division Commit
|
|
|
|
|
## Calibration Dataset |
|
|
The calibration dataset consists of the **1,000 OpenOrca samples** provided by MLPerf.
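The calibration subset itself is fixed by MLPerf. Purely as an illustration of drawing a deterministic subset of that size (function name, seed, and the in-memory sample list are assumptions, not part of the MLPerf harness):

```python
import random

def pick_calibration_samples(samples, n=1000, seed=42):
    """Hypothetical sketch: deterministically draw n distinct calibration
    samples from a pre-loaded list of OpenOrca prompts."""
    rng = random.Random(seed)
    return rng.sample(samples, min(n, len(samples)))
```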
|
|
|
|
|
## Quantized Tensors |
|
|
The following tensors are quantized in each decoder: |
|
|
|
|
|
- **MLP Layer inputs and weights** |
|
|
- **Linear (including QKVO linear) layer inputs and weights** |
|
|
- **KV Cache Entries** |
|
|
|
|
|
## Ignored Layers |
|
|
The following layers are ignored during quantization: |
|
|
- `lm_head` |
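The per-tensor FP8 scheme described above can be sketched in plain Python. This is a conceptual illustration only, not Quark's actual implementation: it scales the tensor so its observed maximum maps onto the FP8 E4M3 dynamic range (largest finite value 448) and clamps, but skips bit-exact E4M3 mantissa rounding.

```python
FP8_E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3

def fp8_quant_dequant(values, amax=None):
    """Simulate per-tensor FP8 quantize/dequantize: pick a scale so the
    observed absolute maximum maps to the FP8 range, clamp each value to
    that range, then rescale back to the original domain."""
    if amax is None:
        amax = max(abs(v) for v in values)
    scale = amax / FP8_E4M3_MAX if amax > 0 else 1.0
    out = []
    for v in values:
        q = max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, v / scale))
        out.append(q * scale)
    return out, scale
```

Layers listed under "Ignored Layers" (such as `lm_head`) would simply bypass this transform and keep their original precision.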
|
|
|
|
|
# Model Performance Comparison |
|
|
|
|
|
| Metric               | Baseline Accuracy Target (%) | FP8 Quant Accuracy (%) |
|----------------------|------------------------------|------------------------|
| **OpenOrca (Chat)**  |                              |                        |
| - Rouge1             | 44.4312                      | 44.6369                |
| - Rouge2             | 22.0352                      | 22.1798                |
| - RougeL             | 28.6162                      | 28.8249                |
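Scores like those in the table are typically computed with the `rouge-score` package. To illustrate what the Rouge1 row measures, here is a minimal pure-Python ROUGE-1 F-measure (unigram overlap; no stemming, unlike the full package):

```python
from collections import Counter

def rouge1_f(reference: str, candidate: str) -> float:
    """ROUGE-1 F-measure: harmonic mean of unigram precision and recall,
    where overlap counts each shared token up to its minimum frequency."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```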
|
|
|
|
|
## License


Modifications Copyright (c) 2024 Advanced Micro Devices, Inc. All rights reserved.