# Update README.md
All the benchmarks assess only the "trivial" mode, on questions requiring some form of multi-hop reasoning over sources (the answer is spread across several sources) as well as the discrimination of distractor sources.

## Deployment

The easiest way to deploy Pleias-RAG-1B is through [our official library](https://github.com/Pleias/Pleias-RAG-Library). It features an API-like workflow with standardized export of the structured reasoning/answer output in JSON format. A [Colab Notebook](https://colab.research.google.com/drive/1oG0qq0I1fSEV35ezSah-a335bZqmo4_7?usp=sharing) is available for quick tests and experimentation.
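
As a rough illustration of that workflow, here is a minimal sketch. The package and class names (`rag_library`, `RAGWithCitations`), the `generate` call and the response fields are assumptions for illustration; check the library's own README for the authoritative API.

```python
# Hypothetical sketch of the library workflow. The names below
# (RAGWithCitations, generate, response["processed"]["clean_answer"]) are
# assumptions; see https://github.com/Pleias/Pleias-RAG-Library for the
# authoritative API.
from rag_library import RAGWithCitations

# Load the model (Hugging Face repo id assumed for illustration).
rag = RAGWithCitations("PleIAs/Pleias-RAG-1B")

# A query plus pre-retrieved sources, including a distractor source
# the model is expected to discard.
query = "What is the capital of France?"
sources = [
    {"text": "Paris is the capital and most populous city of France.",
     "metadata": {"source": "encyclopedia entry 1"}},
    {"text": "Lyon is a major city in the Auvergne-Rhone-Alpes region.",
     "metadata": {"source": "encyclopedia entry 2"}},  # distractor
]

# The structured reasoning/answer output comes back as JSON-like data.
response = rag.generate(query, sources)
print(response["processed"]["clean_answer"])
```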
With 1.2B parameters, Pleias-RAG-1B can be readily deployed in many constrained infrastructures, including desktop systems on CPU RAM.

We also release an unquantized GGUF version for deployment on CPU. Our internal performance benchmarks suggest that waiting times are currently acceptable for most use cases, even under constrained RAM: about 20 seconds for a complex generation including reasoning traces on 8 GB of RAM or less. Since the model is unquantized, the quality of text generation should be identical to that of the original model.
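
Any llama.cpp-compatible runtime can serve that GGUF file. Below is a minimal CPU-only sketch using the `llama-cpp-python` bindings; the GGUF filename is a placeholder for the released file, and the prompt placeholder must be replaced with the model's actual query/sources format (the official library above handles that formatting for you).

```python
# Minimal CPU-only sketch with llama-cpp-python (pip install llama-cpp-python).
# The model_path is a placeholder for the GGUF file shipped with this release.
from llama_cpp import Llama

llm = Llama(
    model_path="pleias-rag-1b.gguf",  # path to the released GGUF file
    n_ctx=4096,       # context window large enough for query + sources
    n_threads=4,      # tune to the number of physical CPU cores
)

output = llm(
    "<query and sources, formatted as the model card specifies>",
    max_tokens=1024,   # leave room for the reasoning trace plus the answer
    temperature=0.0,   # deterministic decoding for reproducible answers
)
print(output["choices"][0]["text"])
```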