metadata

license: gpl-3.0
language:
  - en
datasets:
  - Mxode/Magpie-Pro-10K-GPT4o-mini
pipeline_tag: text2text-generation
tags:
  - chemistry
  - biology
  - finance
  - legal
  - music
  - code
  - climate
  - medical
  - text-generation-inference

NanoLM-0.3B-Instruct-v2

Introduction

To explore the potential of small models, I have built a series of them, available in the NanoLM Collections.

This is NanoLM-0.3B-Instruct-v2. The model currently supports English only.

Model Details

Nano LMs	Non-emb Params	Arch	Layers	Dim	Heads	Seq Len
25M	15M	MistralForCausalLM	12	312	12	2K
70M	42M	LlamaForCausalLM	12	576	9	2K
0.3B	180M	Qwen2ForCausalLM	12	896	14	4K
1B	840M	Qwen2ForCausalLM	18	1536	12	4K

NanoLM-0.3B-Instruct-v2 uses the same tokenizer and model architecture as Qwen/Qwen2-0.5B, but with the number of layers reduced from 24 to 12.

As a result, it has only about 0.3 billion parameters, of which approximately 180 million are non-embedding parameters.

Despite its small size, the model still demonstrates strong instruction-following capabilities.
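
The parameter counts quoted above can be roughly reproduced with a back-of-the-envelope calculation. The sketch below takes the layer count, hidden size and head count from the table; the key/value head count, MLP width, vocabulary size and tied embeddings are not stated in this card and are assumed to match the published Qwen2-0.5B configuration. `Qwen2Shape` and both helper functions are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Qwen2Shape:
    layers: int = 12            # from the table above
    hidden: int = 896           # "Dim" in the table above
    heads: int = 14             # from the table above
    kv_heads: int = 2           # assumption: Qwen2-0.5B config value
    intermediate: int = 4864    # assumption: Qwen2-0.5B config value
    vocab: int = 151936         # assumption: Qwen2-0.5B config value


def non_embedding_params(s: Qwen2Shape) -> int:
    """Count the weights in the transformer blocks plus the final norm."""
    head_dim = s.hidden // s.heads
    kv_dim = s.kv_heads * head_dim
    attn = (
        s.hidden * s.hidden + s.hidden      # q_proj weight + bias
        + 2 * (s.hidden * kv_dim + kv_dim)  # k_proj and v_proj, weight + bias
        + s.hidden * s.hidden               # o_proj, no bias
    )
    mlp = 3 * s.hidden * s.intermediate     # gate, up and down projections
    norms = 2 * s.hidden                    # two RMSNorm weights per layer
    return s.layers * (attn + mlp + norms) + s.hidden  # plus the final norm


def embedding_params(s: Qwen2Shape) -> int:
    """Count the token embedding matrix (assumed shared with the output head)."""
    return s.vocab * s.hidden


shape = Qwen2Shape()
non_emb = non_embedding_params(shape)
total = non_emb + embedding_params(shape)
print(f"non-embedding: {non_emb / 1e6:.0f}M, total: {total / 1e9:.2f}B")
```

Under these assumptions the estimate lands at roughly 179M non-embedding parameters and about 0.3B in total, in line with the figures given for the 0.3B model.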

How to use

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = 'Mxode/NanoLM-0.3B-Instruct-v2'

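# Load the model in bfloat16, on the first GPU if one is available, otherwise on the CPU.
device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
model = AutoModelForCausalLM.from_pretrained(model_path).to(device, torch.bfloat16)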
tokenizer = AutoTokenizer.from_pretrained(model_path)


def get_response(prompt: str, **kwargs):
    """Generate a single chat reply; keyword arguments override the sampling defaults below."""
    generation_args = dict(
        max_new_tokens = kwargs.pop("max_new_tokens", 512),
        do_sample = kwargs.pop("do_sample", True),
        temperature = kwargs.pop("temperature", 0.7),
        top_p = kwargs.pop("top_p", 0.8),
        top_k = kwargs.pop("top_k", 40),
        **kwargs
    )

    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": prompt}
    ]
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )
    model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

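    # Pass the attention mask along with the input ids, then strip the prompt
    # tokens so that only the newly generated tokens are decoded.
    generated_ids = model.generate(**model_inputs, **generation_args)
    generated_ids = [
        output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
    ]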

    response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
    return response


prompt1 = "Calculate (4 - 1) * 7"
print(get_response(prompt1, do_sample=False))

"""
To calculate the expression (4 - 1) * 7, we need to follow the order of operations (PEMDAS):

1. Evaluate the expression inside the parentheses: 4 - 1 = 3
2. Multiply 3 by 7: 3 * 7 = 21

So, (4 - 1) * 7 = 21.
"""