# Llama-PLLuM-8B-instruct-ArtexIT-reasoning
Built with Llama
This repository contains a GRPO fine‑tune of [CYFRAGOVPL/Llama-PLLuM-8B-instruct](https://huggingface.co/CYFRAGOVPL/Llama-PLLuM-8B-instruct) trained on GSM8K (MIT). We publish both Hugging Face (safetensors) and GGUF artifacts (Q8_0, Q5_K_M) for use with llama.cpp.
## What is this?
- Base: Meta Llama 3.1 → PLLuM 8B Instruct (Polish) → GRPO fine‑tune (math / word problems).
- Context: ~131k tokens (based on the GGUF header).
- Message format: Llama `[INST] ... [/INST]` plus explicit reasoning / answer tags (see below).
- Default chat template: the tokenizer includes a default system instruction enforcing the two‑block format.
## Prompt format
The model expects Llama chat formatting and supports explicit tags:
- Reasoning: `<think> ... </think>`
- Final answer: `<answer> ... </answer>`
### Example

```text
[INST] Rozwiąż: 12 * 13 = ? [/INST]
<think>12*13 = 156.</think>
<answer>156</answer>
```
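The two tags make post‑processing straightforward. Below is a minimal, illustrative Python helper (written for this card, not part of the model or any library) that pulls the reasoning and final answer out of a decoded completion; the regular expressions assume the model closes both tags.

```python
import re

def split_reasoning(text: str):
    """Extract the <think> and <answer> blocks from a completion.

    Illustrative helper only; returns None for a block that is missing
    or left unclosed by the model.
    """
    think = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", text, flags=re.DOTALL)
    return (
        think.group(1).strip() if think else None,
        answer.group(1).strip() if answer else None,
    )

reasoning, final = split_reasoning("<think>12*13 = 156.</think>\n<answer>156</answer>")
print(final)  # 156
```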
## Quickstart
### Transformers (PyTorch)
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "ARTEXIT/Llama-PLLuM-8B-instruct-ArtexIT-reasoning"
tok = AutoTokenizer.from_pretrained(repo, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype="auto", device_map="auto")

# Build the prompt with the tokenizer's default chat template
prompt = tok.apply_chat_template(
    [{"role": "user", "content": "Podaj 3 miasta w Polsce."}],  # "Name 3 cities in Poland."
    add_generation_prompt=True,
    tokenize=False,
)

inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=False))
```
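### llama.cpp (GGUF)

For the GGUF artifacts, here is a minimal sketch using the llama-cpp-python bindings. The filename below is an assumption; adjust it to whichever quantization you download (Q8_0 or Q5_K_M). The same file can also be run directly with the llama.cpp CLI.

```python
from llama_cpp import Llama

# Filename is an assumption -- point this at the GGUF file you actually downloaded
llm = Llama(
    model_path="./Llama-PLLuM-8B-instruct-ArtexIT-reasoning.Q5_K_M.gguf",
    n_ctx=8192,  # raise if you need a longer context window
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Rozwiąż: 12 * 13 = ?"}],  # "Solve: 12 * 13 = ?"
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```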
## Training (brief)
- Method: GRPO (policy‑gradient reinforcement learning with multiple reward functions).
- Data: [openai/gsm8k](https://huggingface.co/datasets/openai/gsm8k) (MIT license).
- Goal: consistent two‑block outputs (reasoning + final answer) using the training tags.
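As a concrete illustration of the "multiple reward functions" point above, the sketch below shows one plausible format reward that scores a completion on whether it consists of exactly one well‑formed `<think>` block followed by one `<answer>` block. This is a hypothetical example written for this card, not the actual reward code used in training.

```python
import re

def format_reward(completion: str) -> float:
    """Hypothetical GRPO format reward (not the one used in training):
    1.0 if the completion is one <think>...</think> block followed by
    one <answer>...</answer> block, otherwise 0.0."""
    pattern = r"\s*<think>.*?</think>\s*<answer>.*?</answer>\s*"
    return 1.0 if re.fullmatch(pattern, completion, flags=re.DOTALL) else 0.0

print(format_reward("<think>12*13 = 156.</think>\n<answer>156</answer>"))  # 1.0
print(format_reward("The answer is 156."))                                 # 0.0
```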
## License & Attribution
This repository contains derivatives of Llama 3.1 and PLLuM:
- Llama 3.1 Community License applies. When redistributing, you must:
  - include a copy of the license and prominently display “Built with Llama”,
  - include “Llama” at the beginning of any distributed model’s name if it was created, trained or fine‑tuned using Llama materials,
  - keep a NOTICE file with the following line:
    “Llama 3.1 is licensed under the Llama 3.1 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.”
  - comply with the Acceptable Use Policy (AUP).
- PLLuM: please cite the PLLuM work (see Citation below).
- Data: GSM8K is MIT‑licensed; include dataset attribution.
This repo includes:

- `LICENSE` — full text of the Llama 3.1 Community License
- `USE_POLICY.md` — pointer to the official Acceptable Use Policy
- `NOTICE` — required Llama attribution line
If your (or your affiliates’) products exceeded 700M monthly active users on the Llama 3.1 release date, you must obtain a separate license from Meta before exercising the rights in the Llama 3.1 license.
## Citation
If you use PLLuM in research or deployments, please cite:
```bibtex
@unpublished{pllum2025,
  title  = {PLLuM: A Family of Polish Large Language Models},
  author = {PLLuM Consortium},
  year   = {2025}
}
```