---
license: llama2
metrics:
- rouge
base_model:
- meta-llama/Llama-2-70b-chat-hf
---
# Quark Team FP8 Llama-2-70b-chat-hf Model Overview
## Model Information For MLPerf
- **Model Name**: Llama-2-70b-chat-hf
- **Version**: MLPerf v5.0
- **Commit**: Closed Division Commit
## Calibration Dataset
The calibration dataset consists of **1000 OpenOrca samples** provided by MLPerf.
## Quantized Tensors
The following tensors are quantized in each decoder layer:
- **MLP Layer inputs and weights**
- **Linear (including QKVO linear) layer inputs and weights**
- **KV Cache Entries**
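
As a rough illustration of what calibrating and quantizing a tensor to FP8 involves, the sketch below derives a scale from calibration samples and simulates a quantize/dequantize round trip. It is not the Quark implementation: the E4M3 format, the single per-tensor scale, the max-absolute-value calibration rule, and all function names are assumptions made for this example.

```python
import math

# Assumption: FP8 E4M3 (4 exponent bits, 3 mantissa bits), largest finite value 448.
FP8_E4M3_MAX = 448.0


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest E4M3-representable value, saturating at +/-448."""
    if x == 0.0:
        return 0.0
    a = min(abs(x), FP8_E4M3_MAX)
    # floor(log2(a)), clamped to the smallest normal exponent (-6) so that
    # smaller magnitudes fall on the subnormal grid.
    exp = max(math.frexp(a)[1] - 1, -6)
    step = 2.0 ** (exp - 3)  # spacing between neighbours with 3 mantissa bits
    q = min(round(a / step) * step, FP8_E4M3_MAX)
    return math.copysign(q, x)


def calibrate_scale(samples):
    """Hypothetical per-tensor calibration: map the largest observed magnitude to 448."""
    amax = max(abs(v) for sample in samples for v in sample)
    return amax / FP8_E4M3_MAX


def quantize_dequantize(values, scale):
    """Simulate FP8 storage: divide by the scale, round to E4M3, multiply back."""
    return [round_to_e4m3(v / scale) * scale for v in values]


if __name__ == "__main__":
    calibration = [[0.5, -2.0, 1.25], [3.0, -0.75, 4.0]]  # toy stand-in for activations
    scale = calibrate_scale(calibration)
    print(scale, quantize_dequantize([0.1, -1.3, 4.0], scale))
```

In this sketch the calibration data only determines the scale; weights would typically get their scale from the weight tensor itself rather than from calibration samples.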
## Ignored Layers
The following layers are ignored during quantization:
- `lm_head`
## Model Performance Comparison
| Metric | Baseline Accuracy Target (%) | FP8 Quant Accuracy (%) |
|-----------------------|--------------------|-----------------------|
| **Open Orca (Chat)** | | |
| - Rouge1 | 44.4312 | 44.6369 |
| - Rouge2 | 22.0352 | 22.1798 |
| - RougeL | 28.6162 | 28.8249 |
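
To make the comparison easier to read, the short sketch below expresses each FP8 score as a percentage of the corresponding baseline target, using the numbers from the table. The helper names and the default 100% threshold are assumptions for illustration, not part of the MLPerf tooling.

```python
# Scores copied from the table above.
BASELINE_TARGET = {"rouge1": 44.4312, "rouge2": 22.0352, "rougeL": 28.6162}
FP8_QUANT = {"rouge1": 44.6369, "rouge2": 22.1798, "rougeL": 28.8249}


def relative_accuracy(baseline, quantized):
    """Return each quantized score as a percentage of its baseline target."""
    return {name: 100.0 * quantized[name] / baseline[name] for name in baseline}


def meets_target(baseline, quantized, threshold_pct=100.0):
    """True if every metric reaches threshold_pct of its target (hypothetical check)."""
    ratios = relative_accuracy(baseline, quantized)
    return all(ratio >= threshold_pct for ratio in ratios.values())


if __name__ == "__main__":
    for name, ratio in relative_accuracy(BASELINE_TARGET, FP8_QUANT).items():
        print(f"{name}: {ratio:.2f}% of target")
    print("meets target:", meets_target(BASELINE_TARGET, FP8_QUANT))
```

With the table's values, every ratio comes out slightly above 100%, so the FP8 model meets the target on all three ROUGE metrics.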
## License
Modifications Copyright (c) 2024 Advanced Micro Devices, Inc. All rights reserved.