PleIAs
/

Pleias-RAG-1B

Model card Files Files and versions Community

Pclanglais commited on 3 days ago

Commit

dcb06a7

·

verified ·

1 Parent(s): 2afe4c2

Update README.md

Files changed (1) hide show

README.md +1 -1

README.md CHANGED Viewed

@@ -87,6 +87,6 @@ The easiest way to deploy Pleias-RAG-1B is through [our official library](https:
 With 1.2B parameters, Pleias-RAG-1B can be readily deployed in many constrained infrastructures, including desktop systems on CPU RAM.
-We also release an unquantized GGUF version for deployment on CPU. Our internal performance benchmarks suggest that waiting times are currently acceptable for most either even under constrained RAM: about 20 seconds for a complex generation including reasoning traces on 8g RAM and below. Since the model is unquantized, quality of text generation should be identical to the original model.
 Once integrated into a RAG system, Pleias-RAG-1B can also be used in a broader range of non-conversational use cases including user support or educational assistance. Through this release, we aims to make SLMs workable in production by relying systematically on an externalized memory.

 With 1.2B parameters, Pleias-RAG-1B can be readily deployed in many constrained infrastructures, including desktop systems on CPU RAM.
+We also release an [unquantized GGUF version](https://huggingface.co/PleIAs/Pleias-RAG-1B-gguf) for deployment on CPU. Our internal performance benchmarks suggest that waiting times are currently acceptable for most either even under constrained RAM: about 20 seconds for a complex generation including reasoning traces on 8g RAM and below. Since the model is unquantized, quality of text generation should be identical to the original model.
 Once integrated into a RAG system, Pleias-RAG-1B can also be used in a broader range of non-conversational use cases including user support or educational assistance. Through this release, we aims to make SLMs workable in production by relying systematically on an externalized memory.