amd/Llama-2-70b-chat-hf_FP8_MLPerf_V2


Quark Team FP8 Llama-2-70b-chat-hf Model Overview

Model Information For MLPerf

Model Name: Llama-2-70b-chat-hf
Version: MLPerf v5.0
Commit: Closed Division Commit

Calibration Dataset

The calibration dataset consists of 1000 OpenOrca samples provided by MLPerf.

Quantized Tensors

The following tensors are quantized in each decoder layer:

MLP Layer inputs and weights
Linear (including QKVO linear) layer inputs and weights
KV Cache Entries
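
To illustrate what quantizing a tensor to FP8 involves, here is a minimal pure-Python sketch of symmetric scaling onto the OCP E4M3 grid (4 exponent bits, 3 mantissa bits, largest finite value 448). It is an illustration only, not the Quark implementation: the per-tensor max-based scale and the helper names `round_to_e4m3` and `quantize_per_tensor` are assumptions, since this card does not state the scaling granularity or the calibration algorithm.

```python
import math

E4M3_MAX = 448.0        # largest finite value in OCP FP8 E4M3
E4M3_MIN_EXP = -6       # exponent of the smallest normal binade
E4M3_MANT_BITS = 3


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest value representable in E4M3, saturating at +/-448."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), E4M3_MAX)
    # frexp gives mag = m * 2**e with 0.5 <= m < 1, so the binade exponent is e - 1.
    _, e = math.frexp(mag)
    exp = max(e - 1, E4M3_MIN_EXP)          # subnormals share the minimum exponent
    step = 2.0 ** (exp - E4M3_MANT_BITS)    # spacing of representable values here
    # Python's round() rounds halves to even, matching round-to-nearest-even.
    return sign * min(round(mag / step) * step, E4M3_MAX)


def quantize_per_tensor(values):
    """Assumed scheme: one scale per tensor, chosen so the max magnitude maps to 448."""
    amax = max(abs(v) for v in values)
    scale = amax / E4M3_MAX if amax > 0 else 1.0
    quantized = [round_to_e4m3(v / scale) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    return [q * scale for q in quantized]


weights = [0.1, -0.5, 2.0, -3.5]
q, scale = quantize_per_tensor(weights)
restored = dequantize(q, scale)
```

In this sketch the values that land exactly on the E4M3 grid after scaling are restored without error, while the others pick up a small rounding error. For activations and KV cache entries, a static scale would typically come from statistics gathered on the calibration samples rather than from the tensor itself.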

Ignored Layers

The following layers are ignored during quantization:

lm_head
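
As a rough sketch of how such an exclusion list can be applied, the snippet below splits layer names into quantized and skipped groups by pattern matching. The helper `split_layers` and the `IGNORED_PATTERNS` list are hypothetical names used for illustration and are not part of any Quark API. The example layer names follow the usual Hugging Face Llama module naming.

```python
from fnmatch import fnmatchcase

# Hypothetical exclusion list mirroring the "Ignored Layers" section above.
IGNORED_PATTERNS = ["lm_head"]


def split_layers(layer_names, ignored_patterns=IGNORED_PATTERNS):
    """Return (quantized, skipped) layer names based on the exclusion patterns."""
    quantized, skipped = [], []
    for name in layer_names:
        if any(fnmatchcase(name, pattern) for pattern in ignored_patterns):
            skipped.append(name)
        else:
            quantized.append(name)
    return quantized, skipped


names = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.mlp.down_proj",
    "lm_head",
]
quantized, skipped = split_layers(names)
```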

Model Performance Comparison

Metric	Baseline Accuracy Target (%)	FP8 Quant Accuracy (%)
Open Orca (Chat)
- Rouge1	44.4312	44.6369
- Rouge2	22.0352	22.1798
- RougeL	28.6162	28.8249
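
The comparison above can be expressed as relative accuracy, that is, each FP8 score as a percentage of its baseline target. The following sketch uses only the numbers from the table. The `threshold_pct` parameter is a hypothetical knob for illustration and is not taken from this card.

```python
# Scores copied from the table above.
BASELINE = {"rouge1": 44.4312, "rouge2": 22.0352, "rougeL": 28.6162}
FP8 = {"rouge1": 44.6369, "rouge2": 22.1798, "rougeL": 28.8249}


def relative_accuracy(baseline, quantized):
    """Quantized score as a percentage of the baseline, per metric."""
    return {name: 100.0 * quantized[name] / baseline[name] for name in baseline}


def meets_target(baseline, quantized, threshold_pct=100.0):
    """True if every metric reaches threshold_pct of its baseline (assumed threshold)."""
    ratios = relative_accuracy(baseline, quantized)
    return all(ratio >= threshold_pct for ratio in ratios.values())


ratios = relative_accuracy(BASELINE, FP8)
ok = meets_target(BASELINE, FP8)
```

With these values, all three ROUGE metrics come out slightly above 100% of the baseline target.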

License

Modifications copyright (c) 2024 Advanced Micro Devices, Inc. All rights reserved.


Safetensors

Model size: 69B params
Tensor type: F16 · F8_E4M3


Base model: meta-llama/Llama-2-70b-chat-hf
