Collection: Quark Quantized OCP FP8 Models (27 items)
The calibration dataset consists of 1000 OpenOrca samples provided by MLPerf.
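As a rough illustration of what calibration samples are typically used for, the sketch below derives a per-tensor scale from observed values. It assumes the E4M3 variant of OCP FP8 (largest finite value 448) and simple max-abs calibration; the card does not state which FP8 variant or calibration scheme was used, and the function names and sample values are hypothetical.

```python
# Sketch only: per-tensor FP8 scale from calibration values.
# Assumptions (not stated in the model card): E4M3 format, max-abs calibration.

E4M3_MAX = 448.0  # largest finite value representable in OCP FP8 E4M3


def calibrate_scale(batches):
    """Return a scale that maps the largest observed |x| onto E4M3_MAX."""
    amax = 0.0
    for batch in batches:
        for x in batch:
            amax = max(amax, abs(x))
    # Guard against an all-zero tensor.
    return amax / E4M3_MAX if amax > 0 else 1.0


def fake_quantize(x, scale):
    """Scale into FP8 range, clamp, and scale back (mantissa rounding omitted)."""
    q = max(-E4M3_MAX, min(E4M3_MAX, x / scale))
    return q * scale


# Hypothetical activation values standing in for real calibration data.
calibration_batches = [[0.5, -2.0, 1.25], [3.5, -0.75, 0.1]]
scale = calibrate_scale(calibration_batches)
```

Values larger than anything seen during calibration are clamped to the calibrated maximum, which is why the calibration set should resemble the evaluation data.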
The following tensors are quantized in each decoder:
The following layers are ignored during quantization:
- lm_head
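The idea of an ignore list can be sketched as a simple name filter. This is not Quark's API; `select_quantizable` and the module names are hypothetical, chosen only to show `lm_head` being left in its original precision.

```python
from fnmatch import fnmatch


def select_quantizable(module_names, exclude_patterns):
    """Keep only modules whose names match none of the exclude patterns."""
    return [
        name
        for name in module_names
        if not any(fnmatch(name, pattern) for pattern in exclude_patterns)
    ]


# Hypothetical module names for illustration.
names = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.mlp.down_proj",
    "lm_head",
]
exclude = ["lm_head"]  # the ignored layer listed above

selected = select_quantizable(names, exclude)
```

Using glob-style patterns rather than exact names is a design choice of this sketch; it lets one entry such as `*.mlp.*` cover a whole family of layers.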
| Metric | Baseline Accuracy Target (%) | FP8 Quant Accuracy (%) |
|---|---|---|
| Open Orca (Chat) | | |
| - Rouge1 | 44.4312 | 44.6369 |
| - Rouge2 | 22.0352 | 22.1798 |
| - RougeL | 28.6162 | 28.8249 |
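To read the table as a ratio against the baseline target, a short calculation helps. The scores are copied from the table; the 99% pass threshold is an assumption made for illustration and is not stated in this card.

```python
# Scores copied from the table above.
baseline = {"Rouge1": 44.4312, "Rouge2": 22.0352, "RougeL": 28.6162}
fp8 = {"Rouge1": 44.6369, "Rouge2": 22.1798, "RougeL": 28.8249}

THRESHOLD = 0.99  # hypothetical pass criterion, not taken from the card


def relative_scores(reference, quantized):
    """Return quantized / reference for each metric."""
    return {metric: quantized[metric] / reference[metric] for metric in reference}


ratios = relative_scores(baseline, fp8)
for metric, ratio in ratios.items():
    print(f"{metric}: {ratio:.2%} of baseline target")

passes = all(ratio >= THRESHOLD for ratio in ratios.values())
```

All three FP8 scores sit slightly above their baseline targets, so every ratio exceeds 1 and the check passes for any threshold at or below 100%.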
Modifications Copyright (c) 2024 Advanced Micro Devices, Inc. All rights reserved.
Base model: meta-llama/Llama-2-70b-chat-hf