Edit model card

QuantFactory Banner

QuantFactory/MiniCPM3-4B-GGUF

This is quantized version of openbmb/MiniCPM3-4B created using llama.cpp

Original Model Card

MiniCPM Repo | MiniCPM Paper | MiniCPM-V Repo | Join us in Discord and WeChat

Introduction

MiniCPM3-4B is the 3rd generation of MiniCPM series. The overall performance of MiniCPM3-4B surpasses Phi-3.5-mini-Instruct and GPT-3.5-Turbo-0125, being comparable with many recent 7B~9B models.

Compared to MiniCPM1.0/MiniCPM2.0, MiniCPM3-4B has a more powerful and versatile skill set to enable more general usage. MiniCPM3-4B supports function call, along with code interpreter. Please refer to Advanced Features for usage guidelines.

MiniCPM3-4B has a 32k context window. Equipped with LLMxMapReduce, MiniCPM3-4B can handle infinite context theoretically, without requiring huge amount of memory.

Usage

Inference with Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

path = "openbmb/MiniCPM3-4B"
device = "cuda"

tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(path, torch_dtype=torch.bfloat16, device_map=device, trust_remote_code=True)

messages = [
    {"role": "user", "content": "推荐5个北京的景点。"},
]
model_inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True).to(device)

model_outputs = model.generate(
    model_inputs,
    max_new_tokens=1024,
    top_p=0.7,
    temperature=0.7
)

output_token_ids = [
    model_outputs[i][len(model_inputs[i]):] for i in range(len(model_inputs))
]

responses = tokenizer.batch_decode(output_token_ids, skip_special_tokens=True)[0]
print(responses)

Inference with vLLM

For now, you need to install our forked version of vLLM.

pip install git+https://github.com/OpenBMB/vllm.git@minicpm3
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_name = "openbmb/MiniCPM3-4B"
prompt = [{"role": "user", "content": "推荐5个北京的景点。"}]

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
input_text = tokenizer.apply_chat_template(prompt, tokenize=False, add_generation_prompt=True)

llm = LLM(
    model=model_name,
    trust_remote_code=True,
    tensor_parallel_size=1
)
sampling_params = SamplingParams(top_p=0.7, temperature=0.7, max_tokens=1024, repetition_penalty=1.02)

outputs = llm.generate(prompts=input_text, sampling_params=sampling_params)

print(outputs[0].outputs[0].text)

Evaluation Results

Benchmark Qwen2-7B-Instruct GLM-4-9B-Chat Gemma2-9B-it Llama3.1-8B-Instruct GPT-3.5-Turbo-0125 Phi-3.5-mini-Instruct(3.8B) MiniCPM3-4B
English
MMLU 70.5 72.4 72.6 69.4 69.2 68.4 67.2
BBH 64.9 76.3 65.2 67.8 70.3 68.6 70.2
MT-Bench 8.41 8.35 7.88 8.28 8.17 8.60 8.41
IFEVAL (Prompt Strict-Acc.) 51.0 64.5 71.9 71.5 58.8 49.4 68.4
Chinese
CMMLU 80.9 71.5 59.5 55.8 54.5 46.9 73.3
CEVAL 77.2 75.6 56.7 55.2 52.8 46.1 73.6
AlignBench v1.1 7.10 6.61 7.10 5.68 5.82 5.73 6.74
FollowBench-zh (SSR) 63.0 56.4 57.0 50.6 64.6 58.1 66.8
Math
MATH 49.6 50.6 46.0 51.9 41.8 46.4 46.6
GSM8K 82.3 79.6 79.7 84.5 76.4 82.7 81.1
MathBench 63.4 59.4 45.8 54.3 48.9 54.9 65.6
Code
HumanEval+ 70.1 67.1 61.6 62.8 66.5 68.9 68.3
MBPP+ 57.1 62.2 64.3 55.3 71.4 55.8 63.2
LiveCodeBench v3 22.2 20.2 19.2 20.4 24.0 19.6 22.6
Function Call
BFCL v2 71.6 70.1 19.2 73.3 75.4 48.4 76.0
Overall
Average 65.3 65.0 57.9 60.8 61.0 57.2 66.3

Statement

  • As a language model, MiniCPM3-4B generates content by learning from a vast amount of text.
  • However, it does not possess the ability to comprehend or express personal opinions or value judgments.
  • Any content generated by MiniCPM3-4B does not represent the viewpoints or positions of the model developers.
  • Therefore, when using content generated by MiniCPM3-4B, users should take full responsibility for evaluating and verifying it on their own.

LICENSE

  • This repository is released under the Apache-2.0 License.
  • The usage of MiniCPM3-4B model weights must strictly follow MiniCPM Model License.md.
  • The models and weights of MiniCPM3-4B are completely free for academic research. after filling out a "questionnaire" for registration, are also available for free commercial use.

Citation

@article{hu2024minicpm,
  title={MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies},
  author={Hu, Shengding and Tu, Yuge and Han, Xu and He, Chaoqun and Cui, Ganqu and Long, Xiang and Zheng, Zhi and Fang, Yewei and Huang, Yuxiang and Zhao, Weilin and others},
  journal={arXiv preprint arXiv:2404.06395},
  year={2024}
}
Downloads last month
236
GGUF
Model size
4.07B params
Architecture
minicpm3

2-bit

3-bit

4-bit

5-bit

6-bit

8-bit

Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.