metadata
license: llama2
metrics:
- rouge
base_model:
- meta-llama/Llama-2-70b-chat-hf
Quark Team FP8 Llama-2-70b-chat-hf Model Overview
Model Information For MLPerf
- Model Name: Llama-2-70b-chat-hf
- Version: MLPerf v5.0
- Commit: Closed Division Commit
Calibration Dataset
The calibration dataset consists of 1000 OpenOrca samples provided by MLPerf.
Quantized Tensors
The following tensors are quantized in each decoder layer:
- MLP layer inputs and weights
- Linear layer (including the QKVO linear layers) inputs and weights
- KV Cache Entries
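As a rough illustration of what FP8 quantization does to a tensor, the sketch below simulates a quantize/dequantize round trip in plain Python. It assumes the E4M3 format (maximum finite value 448) and a single per-tensor scale derived from the largest absolute value; this card does not state the FP8 variant or the scaling granularity, so both are assumptions, and the function names `round_to_e4m3` and `fake_quant_fp8` are hypothetical.

```python
import math

E4M3_MAX = 448.0  # assumption: FP8 E4M3 format, largest finite value is 448


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest value representable in E4M3 (saturating)."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    a = min(abs(x), E4M3_MAX)      # saturate instead of overflowing
    _, e = math.frexp(a)           # a = m * 2**e with 0.5 <= m < 1
    e = max(e, -5)                 # below 2**-6 the grid is subnormal
    step = 2.0 ** (e - 4)          # 3 mantissa bits -> 4 significant bits
    return sign * min(round(a / step) * step, E4M3_MAX)


def fake_quant_fp8(values):
    """Quantize to FP8 with one per-tensor scale, then dequantize."""
    amax = max((abs(v) for v in values), default=0.0)
    if amax == 0.0:
        return [0.0 for _ in values]
    scale = amax / E4M3_MAX        # assumption: per-tensor, max-based scale
    return [round_to_e4m3(v / scale) * scale for v in values]


if __name__ == "__main__":
    sample = [0.1, -0.5, 2.0]      # illustrative values, not model data
    print(fake_quant_fp8(sample))
```

With three mantissa bits, the relative rounding error for values in the normal range stays within about 6.25%, which is why the scale is chosen so that the largest observed magnitude maps to the top of the FP8 range.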
Ignored Layers
The following layers are ignored during quantization:
lm_head
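A minimal sketch of how an ignore list like this might be applied when selecting modules to quantize. The helper `should_quantize` is hypothetical and is not Quark's API; the module names follow the usual Hugging Face Llama naming and are used here only for illustration.

```python
from fnmatch import fnmatch

# Taken from this card: only lm_head is excluded from quantization.
IGNORED_PATTERNS = ["lm_head"]


def should_quantize(module_name: str, ignored=IGNORED_PATTERNS) -> bool:
    """Return False if the module name matches any ignore pattern.

    A pattern matches either the full dotted name (glob syntax allowed)
    or the last component of the name.
    """
    leaf = module_name.rsplit(".", 1)[-1]
    return not any(fnmatch(module_name, p) or fnmatch(leaf, p) for p in ignored)


if __name__ == "__main__":
    # Illustrative module names (assumption, not read from the checkpoint).
    names = [
        "model.layers.0.self_attn.q_proj",
        "model.layers.0.mlp.down_proj",
        "lm_head",
    ]
    for name in names:
        print(name, should_quantize(name))
```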
Model Performance Comparison
| Metric | Baseline Accuracy Target (%) | FP8 Quant Accuracy (%) |
|---|---|---|
| Open Orca (Chat) | | |
| - Rouge1 | 44.4312 | 44.6369 |
| - Rouge2 | 22.0352 | 22.1798 |
| - RougeL | 28.6162 | 28.8249 |
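To make the comparison easier to read, the short sketch below expresses each FP8 score as a percentage of the corresponding baseline target, using only the numbers from the table above. The variable and function names are illustrative.

```python
# (baseline target, FP8 result) pairs copied from the table above.
SCORES = {
    "Rouge1": (44.4312, 44.6369),
    "Rouge2": (22.0352, 22.1798),
    "RougeL": (28.6162, 28.8249),
}


def relative_to_baseline(scores):
    """Return each FP8 score as a percentage of its baseline target."""
    return {name: 100.0 * fp8 / base for name, (base, fp8) in scores.items()}


if __name__ == "__main__":
    for metric, pct in relative_to_baseline(SCORES).items():
        print(f"{metric}: {pct:.2f}% of baseline target")
```

All three ratios come out slightly above 100%, meaning the FP8 model meets or exceeds each listed baseline target.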
License
Modifications Copyright (c) 2024 Advanced Micro Devices, Inc. All rights reserved.