|
|
--- |
|
|
license: llama2 |
|
|
metrics: |
|
|
- rouge |
|
|
base_model: |
|
|
- meta-llama/Llama-2-70b-chat-hf |
|
|
--- |
|
|
# Quark Team FP8 Llama-2-70b-chat-hf Model Overview |
|
|
|
|
|
## Model Information For MLPerf |
|
|
- **Model Name**: Llama-2-70b-chat-hf |
|
|
- **Version**: MLPerf v5.0 |
|
|
- **Commit**: Closed Division Commit
|
|
|
|
|
## Calibration Dataset |
|
|
The calibration dataset consists of the **1,000 OpenOrca samples** provided by MLPerf.
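The calibration subset itself is fixed by MLPerf. Purely as an illustration of drawing a deterministic subset of that size (function name, seed, and the in-memory sample list are assumptions, not part of the MLPerf harness):

```python
import random

def pick_calibration_samples(samples, n=1000, seed=42):
    """Hypothetical sketch: deterministically draw n distinct calibration
    samples from a pre-loaded list of OpenOrca prompts."""
    rng = random.Random(seed)
    return rng.sample(samples, min(n, len(samples)))
```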
|
|
|
|
|
## Quantized Tensors |
|
|
The following tensors are quantized in each decoder: |
|
|
|
|
|
- **MLP Layer inputs and weights** |
|
|
- **Linear (including QKVO linear) layer inputs and weights** |
|
|
- **KV Cache Entries** |
|
|
|
|
|
## Ignored Layers |
|
|
The following layers are ignored during quantization: |
|
|
- `lm_head` |
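The per-tensor FP8 scheme described above can be sketched in plain Python. This is a conceptual illustration only, not Quark's actual implementation: it scales the tensor so its observed maximum maps onto the FP8 E4M3 dynamic range (largest finite value 448) and clamps, but skips bit-exact E4M3 mantissa rounding.

```python
FP8_E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3

def fp8_quant_dequant(values, amax=None):
    """Simulate per-tensor FP8 quantize/dequantize: pick a scale so the
    observed absolute maximum maps to the FP8 range, clamp each value to
    that range, then rescale back to the original domain."""
    if amax is None:
        amax = max(abs(v) for v in values)
    scale = amax / FP8_E4M3_MAX if amax > 0 else 1.0
    out = []
    for v in values:
        q = max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, v / scale))
        out.append(q * scale)
    return out, scale
```

Layers listed under "Ignored Layers" (such as `lm_head`) would simply bypass this transform and keep their original precision.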
|
|
|
|
|
# Model Performance Comparison |
|
|
|
|
|
| Metric               | Baseline Accuracy Target (%) | FP8 Quant Accuracy (%) |
|----------------------|------------------------------|------------------------|
| **OpenOrca (Chat)**  |                              |                        |
| - Rouge1             | 44.4312                      | 44.6369                |
| - Rouge2             | 22.0352                      | 22.1798                |
| - RougeL             | 28.6162                      | 28.8249                |
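Scores like those in the table are typically computed with the `rouge-score` package. To illustrate what the Rouge1 row measures, here is a minimal pure-Python ROUGE-1 F-measure (unigram overlap; no stemming, unlike the full package):

```python
from collections import Counter

def rouge1_f(reference: str, candidate: str) -> float:
    """ROUGE-1 F-measure: harmonic mean of unigram precision and recall,
    where overlap counts each shared token up to its minimum frequency."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```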
|
|
|
|
|
## License


Modifications Copyright (c) 2024 Advanced Micro Devices, Inc. All rights reserved.