Introduction

For model inference, please download our release package from this url https://github.com/im0qianqian/llama.cpp/releases .

Quick start

# Use a local model file
llama-cli -m my_model.gguf

# Launch OpenAI-compatible API server
llama-server -m my_model.gguf

Let's look forward to the following PR being merged:

GGUF

Model size

16.3B params

Architecture

bailingmoe2

Hardware compatibility

2-bit

4-bit

6-bit

8-bit

16-bit

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Base model

Finetuned

Quantized

(18)

this model