llama-3.2-3b-it-grpo-250404-GGUF

ReZero-v0.1-llama-3.2-3b-it-grpo-250404 comes from a research project on improving the search abilities of small language models by training them to develop robust search strategies rather than memorize static data. The model is built on a Llama-3.2-3B backbone and trained with reinforcement learning against multiple synthetic search engines, each with its own retrieval mechanism, so that it learns to refine queries iteratively and persist until it finds an exact answer. The repository provides setup instructions (environment configuration and dependency installation) and scripts to train the model or regenerate the synthetic training data. Demonstrations can be run through a Gradio interface, and the release includes experiment logs covering reward strategies and search quality. The model and associated resources are open source; further details on experiments and references are in the documentation.

Model Files

File name	Size	Quant Type
llama-3.2-3b-it-grpo-250404.F32.gguf	12.9 GB	F32
llama-3.2-3b-it-grpo-250404.BF16.gguf	6.43 GB	BF16
llama-3.2-3b-it-grpo-250404.F16.gguf	6.43 GB	F16
llama-3.2-3b-it-grpo-250404.Q8_0.gguf	3.42 GB	Q8_0
llama-3.2-3b-it-grpo-250404.Q6_K.gguf	2.64 GB	Q6_K
llama-3.2-3b-it-grpo-250404.Q5_K_M.gguf	2.32 GB	Q5_K_M
llama-3.2-3b-it-grpo-250404.Q5_K_S.gguf	2.27 GB	Q5_K_S
llama-3.2-3b-it-grpo-250404.Q4_K_M.gguf	2.02 GB	Q4_K_M
llama-3.2-3b-it-grpo-250404.Q4_K_S.gguf	1.93 GB	Q4_K_S
llama-3.2-3b-it-grpo-250404.Q3_K_L.gguf	1.82 GB	Q3_K_L
llama-3.2-3b-it-grpo-250404.Q3_K_M.gguf	1.69 GB	Q3_K_M
llama-3.2-3b-it-grpo-250404.Q3_K_S.gguf	1.54 GB	Q3_K_S
llama-3.2-3b-it-grpo-250404.Q2_K.gguf	1.36 GB	Q2_K
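
One way to use the table above is to pick the largest file that fits a given memory budget. The sketch below is illustrative only: the sizes are copied from the table, while `pick_quant`, `budget_gb`, and the 1 GB `headroom_gb` default (a rough allowance for context cache and runtime overhead) are hypothetical names and values, not part of this release.

```python
# File sizes in GB, copied from the Model Files table.
QUANT_SIZES_GB = {
    "F32": 12.9,
    "BF16": 6.43,
    "F16": 6.43,
    "Q8_0": 3.42,
    "Q6_K": 2.64,
    "Q5_K_M": 2.32,
    "Q5_K_S": 2.27,
    "Q4_K_M": 2.02,
    "Q4_K_S": 1.93,
    "Q3_K_L": 1.82,
    "Q3_K_M": 1.69,
    "Q3_K_S": 1.54,
    "Q2_K": 1.36,
}

BASE_NAME = "llama-3.2-3b-it-grpo-250404"


def pick_quant(budget_gb, headroom_gb=1.0):
    """Return the filename of the largest quant that fits, or None.

    A file "fits" when its size is at most budget_gb - headroom_gb.
    The headroom default is an assumption; tune it for your setup.
    """
    limit = budget_gb - headroom_gb
    fitting = [(size, name) for name, size in QUANT_SIZES_GB.items() if size <= limit]
    if not fitting:
        return None
    # Largest file wins; equal sizes are broken by quant name.
    _, name = max(fitting)
    return f"{BASE_NAME}.{name}.gguf"


if __name__ == "__main__":
    print(pick_quant(4.0))
```

File size is only a proxy for memory use at run time, so treat the result as a starting point rather than a guarantee.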

Quants Usage

(Sorted by size, not necessarily by quality. IQ-quants are often preferable to similarly sized non-IQ quants.)
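
Since the files are ordered by size, a rough way to compare them is effective bits per weight, estimated from the F32 file (32 bits per weight) in the Model Files table. This is a back-of-the-envelope sketch: `approx_bits_per_weight` is a hypothetical helper, and the estimate ignores file metadata and assumes size scales linearly with average precision, which may not hold exactly if some tensors are stored at a different precision.

```python
F32_SIZE_GB = 12.9  # F32 file size from the Model Files table (32 bits per weight)


def approx_bits_per_weight(size_gb, f32_size_gb=F32_SIZE_GB):
    """Estimate average bits per weight by scaling against the F32 file size."""
    return 32.0 * size_gb / f32_size_gb


if __name__ == "__main__":
    # Sizes in GB, copied from the table.
    for name, size_gb in [("Q8_0", 3.42), ("Q4_K_M", 2.02), ("Q2_K", 1.36)]:
        print(f"{name}: ~{approx_bits_per_weight(size_gb):.2f} bits per weight")
```

With the table's figures this gives roughly 8.5 bits per weight for Q8_0 and roughly 5.0 for Q4_K_M.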

Here is a handy graph by ikawrakow comparing some lower-quality quant types (lower is better):
