metadata
license: llama2
metrics:
  - rouge
base_model:
  - meta-llama/Llama-2-70b-chat-hf

Quark Team FP8 Llama-2-70b-chat-hf Model Overview

Model Information For MLPerf

  • Model Name: Llama-2-70b-chat-hf
  • Version: MLPerf v5.0
  • Commit: Closed Division Commit

Calibration Dataset

The calibration dataset consists of 1000 OpenOrca samples provided by MLPerf.

Quantized Tensors

The following tensors are quantized in each decoder:

  • MLP Layer inputs and weights
  • Linear (including QKVO linear) layer inputs and weights
  • KV Cache Entries
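
As a rough illustration of what quantizing a tensor to FP8 involves, the sketch below scales values into the FP8 range and rounds them to the nearest representable number. It assumes the E4M3 format (maximum finite value 448) and a single per-tensor scale; this card does not state which FP8 format or scale granularity the model actually uses, and the function names are hypothetical, not part of Quark.

```python
import math

E4M3_MAX = 448.0  # largest finite value of FP8 E4M3 (assumed format)


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest E4M3-representable value, saturating at +/-448."""
    if x == 0.0:
        return 0.0
    mag = min(abs(x), E4M3_MAX)
    _, exp = math.frexp(mag)   # mag = m * 2**exp with 0.5 <= m < 1
    e = max(exp - 1, -6)       # exponents below -6 fall in the subnormal range
    step = 2.0 ** (e - 3)      # 3 mantissa bits: 8 steps per power of two
    q = min(round(mag / step) * step, E4M3_MAX)
    return math.copysign(q, x)


def quantize_per_tensor(values):
    """Return FP8-rounded values and the (assumed) per-tensor scale."""
    amax = max(abs(v) for v in values)
    scale = amax / E4M3_MAX if amax > 0 else 1.0
    codes = [round_to_e4m3(v / scale) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map FP8-rounded values back to the original range."""
    return [c * scale for c in codes]


# Hypothetical toy tensor, for illustration only.
weights = [0.02, -0.75, 0.31, 1.5]
codes, scale = quantize_per_tensor(weights)
restored = dequantize(codes, scale)
```

With three mantissa bits, the rounding error of each restored value stays within about 1/16 of its magnitude, which is why a scale derived from calibration data matters: it keeps the tensor's values inside the range where FP8 has useful resolution.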

Ignored Layers

The following layers are ignored during quantization:

  • lm_head
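
The exclusion above can be pictured as a name filter applied before quantization. The following is a minimal sketch in plain Python, not the Quark configuration API; `EXCLUDE_PATTERNS` and `layers_to_quantize` are hypothetical names, and the layer names follow the usual Hugging Face Llama module naming.

```python
from fnmatch import fnmatch

# Glob patterns for layers left in their original precision (hypothetical name).
EXCLUDE_PATTERNS = ["lm_head"]


def layers_to_quantize(layer_names, exclude=EXCLUDE_PATTERNS):
    """Keep only the layer names that match none of the exclude patterns."""
    return [
        name
        for name in layer_names
        if not any(fnmatch(name, pattern) for pattern in exclude)
    ]


names = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.mlp.down_proj",
    "lm_head",
]
selected = layers_to_quantize(names)
```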

Model Performance Comparison

Metric              Baseline Accuracy Target (%)   FP8 Quant Accuracy (%)
Open Orca (Chat)
  - Rouge1          44.4312                        44.6369
  - Rouge2          22.0352                        22.1798
  - RougeL          28.6162                        28.8249
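
To read the comparison at a glance, the scores above can be expressed as a ratio of the FP8 result to the baseline target. This is a small sketch using only the numbers reported in this card; the variable names are illustrative.

```python
# Scores copied from the table in this card: (baseline target, FP8 quantized).
scores = {
    "Rouge1": (44.4312, 44.6369),
    "Rouge2": (22.0352, 22.1798),
    "RougeL": (28.6162, 28.8249),
}

# Ratio above 1.0 means the FP8 model meets or exceeds the baseline target.
ratios = {name: fp8 / baseline for name, (baseline, fp8) in scores.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio * 100:.2f}% of baseline target")
```

All three ratios come out slightly above 1.0, so the FP8 model meets each baseline target listed here.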

License

Modifications Copyright (c) 2024 Advanced Micro Devices, Inc. All rights reserved.