llama-3.2-3b-it-grpo-250404-GGUF
ReZero-v0.1-llama-3.2-3b-it-grpo-250404 is a research model aimed at enhancing the search abilities of small language models: it is trained to develop robust search strategies rather than to memorize static data. Built on a Llama-3.2-3B backbone, the model is trained with reinforcement learning against multiple synthetic search engines, each with its own retrieval mechanism, so that it learns to refine queries iteratively and to persist until it finds the exact answer. The repository provides setup instructions (environment configuration and dependency installation) and scripts to train the model or regenerate the synthetic training data. Demonstrations can be run through a Gradio interface, and the release includes experiment logs on reward strategies and search quality. The model and associated resources are open source and available to the research community; the documentation gives further details on the experiments and references.
Model Files
File name | Size | Quant Type |
---|---|---|
llama-3.2-3b-it-grpo-250404.F32.gguf | 12.9 GB | F32 |
llama-3.2-3b-it-grpo-250404.BF16.gguf | 6.43 GB | BF16 |
llama-3.2-3b-it-grpo-250404.F16.gguf | 6.43 GB | F16 |
llama-3.2-3b-it-grpo-250404.Q8_0.gguf | 3.42 GB | Q8_0 |
llama-3.2-3b-it-grpo-250404.Q6_K.gguf | 2.64 GB | Q6_K |
llama-3.2-3b-it-grpo-250404.Q5_K_M.gguf | 2.32 GB | Q5_K_M |
llama-3.2-3b-it-grpo-250404.Q5_K_S.gguf | 2.27 GB | Q5_K_S |
llama-3.2-3b-it-grpo-250404.Q4_K_M.gguf | 2.02 GB | Q4_K_M |
llama-3.2-3b-it-grpo-250404.Q4_K_S.gguf | 1.93 GB | Q4_K_S |
llama-3.2-3b-it-grpo-250404.Q3_K_L.gguf | 1.82 GB | Q3_K_L |
llama-3.2-3b-it-grpo-250404.Q3_K_M.gguf | 1.69 GB | Q3_K_M |
llama-3.2-3b-it-grpo-250404.Q3_K_S.gguf | 1.54 GB | Q3_K_S |
llama-3.2-3b-it-grpo-250404.Q2_K.gguf | 1.36 GB | Q2_K |
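As a rough aid for choosing among the files above, the sketch below picks the largest quant whose file fits a given memory budget. The sizes and file-name pattern are copied from the table, and the repository ID is the one this card is published under. The helper names and the 1 GB headroom for context and runtime overhead are assumptions made for illustration, not figures from the model authors. The optional `download` helper assumes the `huggingface_hub` package is installed and uses its `hf_hub_download` function.

```python
# File sizes in GB, copied from the Model Files table.
QUANT_SIZES_GB = {
    "F32": 12.9,
    "BF16": 6.43,
    "F16": 6.43,
    "Q8_0": 3.42,
    "Q6_K": 2.64,
    "Q5_K_M": 2.32,
    "Q5_K_S": 2.27,
    "Q4_K_M": 2.02,
    "Q4_K_S": 1.93,
    "Q3_K_L": 1.82,
    "Q3_K_M": 1.69,
    "Q3_K_S": 1.54,
    "Q2_K": 1.36,
}

REPO_ID = "prithivMLmods/llama-3.2-3b-it-grpo-250404-GGUF"


def gguf_filename(quant):
    """Build the file name for a quant type, following the table's naming pattern."""
    return f"llama-3.2-3b-it-grpo-250404.{quant}.gguf"


def largest_quant_that_fits(budget_gb, headroom_gb=1.0):
    """Return the largest quant whose file size plus headroom fits the budget.

    headroom_gb is an assumed allowance for the KV cache and runtime overhead;
    tune it for your context length. Returns None if nothing fits.
    """
    candidates = [
        (size, name)
        for name, size in QUANT_SIZES_GB.items()
        if size + headroom_gb <= budget_gb
    ]
    if not candidates:
        return None
    # Compare by size only; on a tie, the first entry in table order wins.
    return max(candidates, key=lambda c: c[0])[1]


def download(quant):
    """Fetch one GGUF file from the Hub and return its local path (needs network)."""
    from huggingface_hub import hf_hub_download  # pip install huggingface_hub

    return hf_hub_download(repo_id=REPO_ID, filename=gguf_filename(quant))
```

For example, `largest_quant_that_fits(4.0)` selects `Q6_K` under these assumptions, and `download("Q6_K")` would then fetch the matching file. Size is only a proxy here; it says nothing about output quality at a given quant level.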
Quants Usage
(Sorted by size, not necessarily by quality. IQ-quants are often preferable to similar-sized non-IQ quants.)
Here is a handy graph by ikawrakow comparing some lower-quality quant types (lower is better):
Model tree for prithivMLmods/llama-3.2-3b-it-grpo-250404-GGUF
Base model
meta-llama/Llama-3.2-3B-Instruct