Pclanglais commited on
Commit
dcb06a7
·
verified ·
1 Parent(s): 2afe4c2

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +1 -1
README.md CHANGED
@@ -87,6 +87,6 @@ The easiest way to deploy Pleias-RAG-1B is through [our official library](https:
87
 
88
  With 1.2B parameters, Pleias-RAG-1B can be readily deployed in many constrained infrastructures, including desktop systems on CPU RAM.
89
 
90
- We also release an unquantized GGUF version for deployment on CPU. Our internal performance benchmarks suggest that waiting times are currently acceptable for most either even under constrained RAM: about 20 seconds for a complex generation including reasoning traces on 8g RAM and below. Since the model is unquantized, quality of text generation should be identical to the original model.
91
 
92
  Once integrated into a RAG system, Pleias-RAG-1B can also be used in a broader range of non-conversational use cases including user support or educational assistance. Through this release, we aims to make SLMs workable in production by relying systematically on an externalized memory.
 
87
 
88
  With 1.2B parameters, Pleias-RAG-1B can be readily deployed in many constrained infrastructures, including desktop systems on CPU RAM.
89
 
90
+ We also release an [unquantized GGUF version](https://huggingface.co/PleIAs/Pleias-RAG-1B-gguf) for deployment on CPU. Our internal performance benchmarks suggest that waiting times are currently acceptable for most either even under constrained RAM: about 20 seconds for a complex generation including reasoning traces on 8g RAM and below. Since the model is unquantized, quality of text generation should be identical to the original model.
91
 
92
  Once integrated into a RAG system, Pleias-RAG-1B can also be used in a broader range of non-conversational use cases including user support or educational assistance. Through this release, we aims to make SLMs workable in production by relying systematically on an externalized memory.