Update README.md
README.md (changed)
```diff
@@ -87,6 +87,6 @@ The easiest way to deploy Pleias-RAG-1B is through [our official library](https:
 
 With 1.2B parameters, Pleias-RAG-1B can be readily deployed in many constrained infrastructures, including desktop systems on CPU RAM.
 
-We also release an unquantized GGUF version for deployment on CPU. Our internal performance benchmarks suggest that waiting times are currently acceptable for most use cases, even under constrained RAM: about 20 seconds for a complex generation, including reasoning traces, on 8 GB of RAM or less. Since the model is unquantized, text generation quality should be identical to that of the original model.
+We also release an [unquantized GGUF version](https://huggingface.co/PleIAs/Pleias-RAG-1B-gguf) for deployment on CPU. Our internal performance benchmarks suggest that waiting times are currently acceptable for most use cases, even under constrained RAM: about 20 seconds for a complex generation, including reasoning traces, on 8 GB of RAM or less. Since the model is unquantized, text generation quality should be identical to that of the original model.
 
 Once integrated into a RAG system, Pleias-RAG-1B can also be used in a broader range of non-conversational use cases, including user support or educational assistance. Through this release, we aim to make SLMs workable in production by relying systematically on an externalized memory.
```
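The claim that a 1.2B-parameter model runs unquantized within 8 GB of RAM can be sanity-checked with quick arithmetic. The sketch below is a rough estimate, not a benchmark: it assumes "unquantized" means 16-bit (F16/BF16) or 32-bit (F32) weights, since the README does not state the GGUF file's precision, and it counts only the raw weights, ignoring the KV cache, activations, and runtime overhead. The helper name `weight_memory_gb` is hypothetical.

```python
# Rough estimate of the RAM needed just to hold the model weights.
# Assumption: "unquantized" means 16-bit (F16/BF16) or 32-bit (F32) weights;
# the README does not say which precision the GGUF file actually uses.
# KV cache, activations, and runtime overhead are NOT included.

BYTES_PER_PARAM = {"f32": 4, "f16": 2, "bf16": 2}


def weight_memory_gb(n_params: float, dtype: str) -> float:
    """Hypothetical helper: size of the raw weights in GB (10**9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


N_PARAMS = 1.2e9  # parameter count stated in the README

for dtype in ("f16", "f32"):
    print(f"{dtype}: ~{weight_memory_gb(N_PARAMS, dtype):.1f} GB of weights")
```

Under these assumptions the weights take roughly 2.4 GB at 16-bit precision and 4.8 GB at 32-bit precision, both of which leave headroom within 8 GB for the KV cache and the rest of the runtime.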