This is an FP8-Dynamic quantization of microsoft/phi-4.

The phi-4 model is a cutting-edge open-source LLM developed using a diverse mix of synthetic datasets, curated public domain web content, and acquired academic resources, including books and Q&A datasets. This deliberate data selection ensures the training of compact yet highly capable models with an emphasis on quality and advanced reasoning. To further enhance its performance, phi-4 underwent a rigorous alignment process that included supervised fine-tuning and direct preference optimization, resulting in precise instruction adherence and robust safety measures.

Evaluations

This model recovers 99.68% of the accuracy of the unquantized phi-4, measured across the benchmarks below.

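| English | phi-4 | phi-4-FP8-Dynamic (this) |
|---|---|---|
| Avg. | 70.75 | 70.7 |
| Arc | 68.7 | 68.7 |
| Hellaswag | 72.8 | 72.7 |

| French | phi-4 | phi-4-FP8-Dynamic (this) |
|---|---|---|
| Avg. | 68.67 | 68.87 |
| Arc | 59.4 | 59.5 |
| Hellaswag | 72.0 | 72.0 |
| MMLU | 74.6 | 75.1 |

| German | phi-4 | phi-4-FP8-Dynamic (this) |
|---|---|---|
| Avg. | 68.73 | 68.33 |
| Arc | 60.2 | 60.0 |
| Hellaswag | 69.8 | 69.6 |
| MMLU | 76.2 | 75.4 |

| Italian | phi-4 | phi-4-FP8-Dynamic (this) |
|---|---|---|
| Avg. | 69.3 | 69.07 |
| Arc | 61.1 | 61.3 |
| Hellaswag | 73.1 | 72.5 |
| MMLU | 73.7 | 73.4 |

| Spanish | phi-4 | phi-4-FP8-Dynamic (this) |
|---|---|---|
| Avg. | 70.6 | 70.03 |
| Arc | 61.6 | 61.0 |
| Hellaswag | 75.3 | 74.6 |
| MMLU | 74.9 | 74.5 |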

We did not check for data contamination. Evaluation was done using the Eval. Harness with limit=1000.
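
The accuracy recovery figure can be cross-checked against the per-task scores. The sketch below assumes recovery is the ratio of the summed task scores of the quantized model to those of phi-4 over all 14 language/task pairs. That definition is inferred from the numbers and is not stated in this card. The names `SCORES` and `accuracy_recovery` are illustrative.

```python
# Per-task scores copied from the evaluation tables: (phi-4, phi-4-FP8-Dynamic).
# The English table lists no MMLU row, so English contributes only two tasks.
SCORES = {
    "English": {"Arc": (68.7, 68.7), "Hellaswag": (72.8, 72.7)},
    "French": {"Arc": (59.4, 59.5), "Hellaswag": (72.0, 72.0), "MMLU": (74.6, 75.1)},
    "German": {"Arc": (60.2, 60.0), "Hellaswag": (69.8, 69.6), "MMLU": (76.2, 75.4)},
    "Italian": {"Arc": (61.1, 61.3), "Hellaswag": (73.1, 72.5), "MMLU": (73.7, 73.4)},
    "Spanish": {"Arc": (61.6, 61.0), "Hellaswag": (75.3, 74.6), "MMLU": (74.9, 74.5)},
}


def accuracy_recovery(scores):
    """Return quantized/base score ratio in percent (assumed definition)."""
    pairs = [pair for tasks in scores.values() for pair in tasks.values()]
    base_total = sum(base for base, _ in pairs)
    quant_total = sum(quant for _, quant in pairs)
    return 100.0 * quant_total / base_total


print(f"{accuracy_recovery(SCORES):.2f}%")  # prints 99.68%
```

Under this assumed definition the result matches the reported 99.68%. Averaging the five per-language "Avg." rows instead gives a slightly different value, because English has only two tasks.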

Usage

Install vLLM and run the server:

python -m vllm.entrypoints.openai.api_server --model cortecs/phi-4-FP8-Dynamic --max-model-len 16384

Access the model:

curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "cortecs/phi-4-FP8-Dynamic",
        "prompt": "San Francisco is a"
    }'
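
The same request can be sent from Python. This is a minimal sketch using only the standard library, assuming the server started above is listening on localhost:8000. The helper names `build_request` and `complete` and the `max_tokens` default are illustrative choices, not part of this model card.

```python
import json
import urllib.request

# Assumptions: the vLLM server from the Usage section is running locally on
# port 8000; the helper names and the max_tokens default are illustrative.
BASE_URL = "http://localhost:8000/v1"
MODEL = "cortecs/phi-4-FP8-Dynamic"


def build_request(prompt: str, max_tokens: int = 32) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible /v1/completions route."""
    payload = {"model": MODEL, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        f"{BASE_URL}/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str, max_tokens: int = 32) -> str:
    """Send the prompt and return the text of the first completion choice."""
    with urllib.request.urlopen(build_request(prompt, max_tokens)) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]
```

With the server running, `print(complete("San Francisco is a"))` prints the generated continuation. No request is sent until `complete` is called.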

⚡ This model is optimized to handle heavy workloads, providing a total throughput of 4,623 tokens per second on one NVIDIA L40S ⚡

Model size: 14.7B params (Safetensors). Tensor types: BF16, F8_E4M3.

Base model: microsoft/phi-4