Pclanglais committed
Commit e67c025 · verified · 1 Parent(s): dcb06a7

Update README.md

Files changed (1): README.md +33 -0

README.md CHANGED
@@ -85,6 +85,39 @@ All the benchmarks only assess the "trivial" mode on questions requiring some fo
  ## Deployment
  The easiest way to deploy Pleias-RAG-1B is through [our official library](https://github.com/Pleias/Pleias-RAG-Library). It features an API-like workflow with standardized export of the structured reasoning/answer output in JSON format. A [Colab Notebook](https://colab.research.google.com/drive/1oG0qq0I1fSEV35ezSah-a335bZqmo4_7?usp=sharing) is available for easy testing and experimentation.

+ A typical minimal example:
+
+ ```python
+ # Import path assumed from the Pleias-RAG-Library repository; adjust if the
+ # installed package exposes a different module name.
+ from rag_library import RAGWithCitations
+
+ rag = RAGWithCitations("PleIAs/Pleias-RAG-1B")
+
+ # Define query and sources
+ query = "What is the capital of France?"
+ sources = [
+     {
+         "text": "Paris is the capital and most populous city of France. With an estimated population of 2,140,526 residents as of January 2019, Paris is the center of the Île-de-France metropolitan area and the hub of French economic, political, and cultural life. The city's landmarks, including the Eiffel Tower, Arc de Triomphe, and Cathedral of Notre-Dame, make it one of the world's most visited tourist destinations.",
+         "metadata": {"source": "Geographic Encyclopedia", "reliability": "high"}
+     },
+     {
+         "text": "The Eiffel Tower is located in Paris, France. It was constructed from 1887 to 1889 as the entrance to the 1889 World's Fair and was initially criticized by some of France's leading artists and intellectuals for its design. Standing at 324 meters (1,063 ft) tall, it was the tallest man-made structure in the world until the completion of the Chrysler Building in New York City in 1930. The tower receives about 7 million visitors annually and has become an iconic symbol of Paris and France.",
+         "metadata": {"source": "Travel Guide", "year": 2020}
+     }
+ ]
+
+ # Generate a response
+ response = rag.generate(query, sources)
+
+ # Print the final answer with citations
+ print(response["processed"]["clean_answer"])
+ ```
+
+ With expected output:
+ ```
+ The capital of France is Paris. This is confirmed by multiple sources, with <|source_id|>1 explicitly stating that "Paris is the capital and most populous city of France"[1].
+
+ **Citations**
+ [1] "Paris is the capital and most populous city of France" [Source 1]
+ ```
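Since the example above reads `response` as a plain dictionary, the structured reasoning/answer output can also be written straight to a JSON file. A minimal sketch, assuming only that `response` is the object returned by `rag.generate()` (the file name is arbitrary and the exact schema is defined by the library):

```python
import json

# Serialize the full structured response exactly as returned by rag.generate()
with open("rag_response.json", "w", encoding="utf-8") as f:
    json.dump(response, f, ensure_ascii=False, indent=2)
```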
+
  With 1.2B parameters, Pleias-RAG-1B can be readily deployed in many constrained infrastructures, including desktop systems running on CPU with the model held in RAM.

  We also release an [unquantized GGUF version](https://huggingface.co/PleIAs/Pleias-RAG-1B-gguf) for deployment on CPU. Our internal performance benchmarks suggest that waiting times are currently acceptable for most use cases, even under constrained RAM: about 20 seconds for a complex generation including reasoning traces on 8 GB of RAM or less. Since the model is unquantized, the quality of text generation should be identical to the original model.
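For a quick CPU test of the GGUF weights outside the official library, a runtime such as llama-cpp-python can load them; the snippet below is a minimal sketch under that assumption (the model file name is a placeholder for whichever `.gguf` file is downloaded from the repository, and prompt formatting with sources and citations is still best handled through the Pleias-RAG-Library):

```python
from llama_cpp import Llama

# Load the GGUF weights on CPU; the path is a placeholder for the file
# downloaded from PleIAs/Pleias-RAG-1B-gguf.
llm = Llama(
    model_path="pleias-rag-1b.gguf",
    n_ctx=4096,      # context window; lower it to fit tighter RAM budgets
    n_threads=8,     # CPU threads to use
    verbose=False,
)

# Plain completion call to check that generation runs on CPU; RAG prompts with
# sources and citation post-processing should go through the official library.
out = llm("What is the capital of France?", max_tokens=256)
print(out["choices"][0]["text"])
```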