amd/Llama-2-70b-chat-hf_FP8_MLPerf_V2

license: llama2
metrics:
  - rouge
base_model:
  - meta-llama/Llama-2-70b-chat-hf

Quark Team FP8 Llama-2-70b-chat-hf Model Overview

Model Information For MLPerf

Model Name: Llama-2-70b-chat-hf
Version: MLPerf v5.0
Commit: Closed Division Commit

Calibration Dataset

The calibration dataset consists of 1000 OpenOrca samples provided by MLPerf.
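
One common way a calibration set like this is used is to record the largest absolute value seen at each quantized tensor and derive a static scale from it. The sketch below illustrates that idea in plain Python. The `AbsMaxObserver` class, the E4M3 maximum of 448, and the toy activation values are assumptions for illustration; they are not taken from the Quark configuration used for this model.

```python
FP8_E4M3_MAX = 448.0  # largest finite value in FP8 E4M3 (assumed format)


class AbsMaxObserver:
    """Tracks the running absolute maximum of a tensor across calibration samples."""

    def __init__(self):
        self.amax = 0.0

    def update(self, values):
        # values: a flat iterable of floats standing in for one activation tensor
        for v in values:
            self.amax = max(self.amax, abs(v))

    def scale(self):
        # Map the observed range onto the FP8 range; guard against an all-zero tensor.
        return self.amax / FP8_E4M3_MAX if self.amax > 0 else 1.0


# Toy stand-in for activations collected from calibration samples (hypothetical values).
calibration_batches = [[0.1, -3.5, 2.0], [7.0, -0.2], [1.5, -6.25]]

observer = AbsMaxObserver()
for batch in calibration_batches:
    observer.update(batch)

print(observer.amax, observer.scale())
```

In a real flow, one such observer would typically be attached per quantized tensor, and the batches would come from running the calibration samples through the model.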

Quantized Tensors

The following tensors are quantized in each decoder layer:

MLP Layer inputs and weights
Linear (including QKVO linear) layer inputs and weights
KV Cache Entries
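
To make concrete what quantizing a tensor to FP8 does to its values, here is a minimal quantize-then-dequantize sketch in plain Python. It assumes the E4M3 variant of FP8 (4 exponent bits, 3 mantissa bits, maximum 448) and a single per-tensor scale; neither choice is stated in this card, and the function names are hypothetical rather than part of Quark.

```python
import math

FP8_E4M3_MAX = 448.0   # largest finite E4M3 value (assumed format)
MIN_NORMAL_EXP = -6    # exponent of the smallest normal E4M3 value (2**-6)
MANTISSA_BITS = 3


def round_to_e4m3(x):
    """Round a float to the nearest value representable in FP8 E4M3 (saturating)."""
    if x == 0.0:
        return 0.0
    x = max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, x))  # saturate instead of overflowing
    _, e = math.frexp(abs(x))                      # abs(x) = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, MIN_NORMAL_EXP)               # subnormals share the smallest exponent
    step = 2.0 ** (exp - MANTISSA_BITS)            # spacing of representable values here
    return round(x / step) * step                  # round() breaks ties to even


def fake_quantize(values):
    """Quantize to FP8 with one per-tensor scale, then dequantize back to float."""
    amax = max(abs(v) for v in values)
    scale = amax / FP8_E4M3_MAX if amax > 0 else 1.0
    return [round_to_e4m3(v / scale) * scale for v in values]


weights = [0.5, -2.0, 1.0, 0.013]  # toy values, not real model weights
print(fake_quantize(weights))
```

With three mantissa bits, the relative rounding error for values in the normal range stays within about 1/16, which is why a well-chosen scale matters more than the rounding itself.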

Ignored Layers

The following layers are ignored during quantization:

lm_head
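
An ignore list like this is usually applied by matching module names before quantization. The sketch below shows the idea with glob-style patterns from the Python standard library. The `should_quantize` helper is hypothetical, and the module names are assumed to follow the Hugging Face Llama naming scheme; this is not the Quark API.

```python
from fnmatch import fnmatch

# Layers to leave unquantized; "lm_head" is the one listed above.
IGNORED_PATTERNS = ["lm_head"]


def should_quantize(module_name, ignored_patterns=IGNORED_PATTERNS):
    """Return True unless the module name matches one of the ignore patterns."""
    return not any(fnmatch(module_name, pattern) for pattern in ignored_patterns)


# Assumed module names for illustration.
module_names = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.self_attn.o_proj",
    "model.layers.0.mlp.down_proj",
    "lm_head",
]

quantized = [name for name in module_names if should_quantize(name)]
print(quantized)
```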

Model Performance Comparison

Metric	Baseline Accuracy Target (%)	FP8 Quant Accuracy (%)
Open Orca (Chat)
- Rouge1	44.4312	44.6369
- Rouge2	22.0352	22.1798
- RougeL	28.6162	28.8249
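
As a quick check of the table, the snippet below computes the difference and the ratio between each FP8 score and its baseline target. The numbers are copied from the table above; the variable names are illustrative only.

```python
# ROUGE scores from the table above: (baseline target, FP8 result)
scores = {
    "Rouge1": (44.4312, 44.6369),
    "Rouge2": (22.0352, 22.1798),
    "RougeL": (28.6162, 28.8249),
}

ratios = {}
for metric, (baseline, fp8) in scores.items():
    ratios[metric] = fp8 / baseline
    print(f"{metric}: delta={fp8 - baseline:+.4f}, ratio={ratios[metric]:.2%}")
```

All three FP8 scores come out slightly above their baseline targets, each by less than one percent.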

License

Modifications copyright (c) 2024 Advanced Micro Devices, Inc. All rights reserved.