llama-3.2-3b-it-grpo-250404-GGUF

ReZero-v0.1-llama-3.2-3b-it-grpo-250404 is a research project focused on enhancing the search abilities of small language models by training them to develop robust search strategies rather than memorize static data. Built on a Llama-3.2-3B backbone, the model interacts with multiple synthetic search engines, each with its own retrieval mechanism, and is trained with reinforcement learning to refine its queries iteratively and persist until it finds exact answers.

The repository provides setup instructions, including environment configuration and dependency installation, as well as scripts to train the model or regenerate the synthetic training data. Demonstrations can be run through a Gradio interface, and the release includes experiment logs covering reward strategies and search quality. The model and associated resources are open source and available to the research community, with further details on the experiments and references provided in the documentation.

Model Files

| File name | Size | Quant type |
| --- | --- | --- |
| llama-3.2-3b-it-grpo-250404.F32.gguf | 12.9 GB | F32 |
| llama-3.2-3b-it-grpo-250404.BF16.gguf | 6.43 GB | BF16 |
| llama-3.2-3b-it-grpo-250404.F16.gguf | 6.43 GB | F16 |
| llama-3.2-3b-it-grpo-250404.Q8_0.gguf | 3.42 GB | Q8_0 |
| llama-3.2-3b-it-grpo-250404.Q6_K.gguf | 2.64 GB | Q6_K |
| llama-3.2-3b-it-grpo-250404.Q5_K_M.gguf | 2.32 GB | Q5_K_M |
| llama-3.2-3b-it-grpo-250404.Q5_K_S.gguf | 2.27 GB | Q5_K_S |
| llama-3.2-3b-it-grpo-250404.Q4_K_M.gguf | 2.02 GB | Q4_K_M |
| llama-3.2-3b-it-grpo-250404.Q4_K_S.gguf | 1.93 GB | Q4_K_S |
| llama-3.2-3b-it-grpo-250404.Q3_K_L.gguf | 1.82 GB | Q3_K_L |
| llama-3.2-3b-it-grpo-250404.Q3_K_M.gguf | 1.69 GB | Q3_K_M |
| llama-3.2-3b-it-grpo-250404.Q3_K_S.gguf | 1.54 GB | Q3_K_S |
| llama-3.2-3b-it-grpo-250404.Q2_K.gguf | 1.36 GB | Q2_K |
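As a rough sanity check, a GGUF file's size is approximately the parameter count times the effective bits per weight, divided by 8. The sketch below illustrates this for a few of the quants above; the bits-per-weight figures are approximate assumptions (llama.cpp quant formats carry varying per-block overhead), not exact values.

```python
# Rough GGUF size estimate: params * bits_per_weight / 8.
# The bits-per-weight values below are approximations.
PARAMS = 3.21e9  # parameter count from the model card

BITS_PER_WEIGHT = {
    "F16": 16.0,   # full half-precision
    "Q8_0": 8.5,   # 8-bit quant plus per-block scale overhead
    "Q4_K_M": 5.0, # mixed 4/6-bit K-quant, effective bpw
}

def approx_size_gb(quant: str, params: float = PARAMS) -> float:
    """Estimated file size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * BITS_PER_WEIGHT[quant] / 8 / 1e9

for q in BITS_PER_WEIGHT:
    print(f"{q}: ~{approx_size_gb(q):.2f} GB")
```

For example, F16 comes out to about 6.42 GB, closely matching the 6.43 GB listed in the table.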

Quants Usage

(Sorted by size, not necessarily by quality. IQ-quants are often preferable to non-IQ quants of similar size.)

Here is a handy graph by ikawrakow comparing some lower-quality quant types (lower is better):

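These files work with any GGUF-compatible runtime. Below is a minimal sketch using `huggingface-cli` to fetch one quant and llama.cpp's `llama-cli` to run it; exact flags may differ across llama.cpp versions, and the Q4_K_M file is chosen here only as a reasonable size/quality balance.

```shell
# Download a single quant from the repo (repo id as listed in the model tree)
huggingface-cli download prithivMLmods/llama-3.2-3b-it-grpo-250404-GGUF \
  llama-3.2-3b-it-grpo-250404.Q4_K_M.gguf --local-dir .

# Run interactively with llama.cpp
llama-cli -m llama-3.2-3b-it-grpo-250404.Q4_K_M.gguf -p "Hello" -n 128
```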

Format: GGUF
Model size: 3.21B params
Architecture: llama
Precisions: 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, 8-bit, 16-bit, 32-bit
Downloads last month: 108

Model tree for prithivMLmods/llama-3.2-3b-it-grpo-250404-GGUF