Link model card to granite-io (#1)
- Link model card to granite-io (abf16b0fc57ae618a20fa4f0b90e9e3ee4eb65ca)
Co-authored-by: Yannis Katsis <[email protected]>
README.md CHANGED
@@ -29,6 +29,9 @@ This is a RAG-specific LoRA adapter for [ibm-granite/granite-3.2-8b-instruct](ht
 ## Intended use
 This is a LoRA adapter that gives the ability to generate citations for the last assistant response in a multi-turn RAG conversation, based on a set of provided documents/passages. It can be used to generate post-hoc citations for assistant responses generated by any LLM in a RAG setting.
 
+> [!TIP]
+> Note: While you can invoke the adapter directly, as outlined below, we highly recommend calling it through [granite-io](https://github.com/ibm-granite/granite-io), which wraps the model with a tailored I/O processor, enabling a friendlier development interface. The I/O processor takes care of several data transformation/validation tasks that would otherwise be required (incl. splitting the input documents and assistant response into sentences before calling the adapter, as well as validating the adapter's output and transforming the returned sentence IDs into spans over the documents and the response).
+
 **Model input**: The input to the model is conceptually a list of conversational turns ending with an assistant response, plus a list of documents, all converted to a string using the `apply_chat_template` function. For the adapter to work, the last assistant response as well as the documents should be pre-split into sentences. In more detail, the primary inputs are the following three items, each represented in JSON:
 
 - **conversation**: A list of conversational turns between the user and the assistant, where each item in the list is a dictionary with the fields `role` and `content`. The `role` is either `user` or `assistant`, denoting user and assistant turns, respectively, while the `content` field contains the corresponding user/assistant utterance. The conversation should end with an assistant turn, and the `text` field of that turn should contain the assistant utterance with each sentence prefixed with a response sentence ID of the form `<rI>`, where `I` is an integer. The numbering should start from 0 (for the first sentence) and be incremented by one for each subsequent sentence in the last assistant turn. Note that only the last assistant turn should be split into sentences as described above; earlier assistant turns (as well as all user turns) should be kept in their original form.
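To make the `conversation` format above concrete, here is a minimal illustrative sketch (the turn contents are invented for demonstration; also note that the bullet names the last turn's field `text` while its schema lists `role` and `content` — the sketch uses `content`):

```python
# Illustrative "conversation" input, following the format described in
# the diff above. Contents are invented for demonstration purposes.
conversation = [
    {"role": "user", "content": "What is the default visibility of a new project?"},
    {"role": "assistant", "content": "New projects are private by default."},
    {"role": "user", "content": "How can I change that?"},
    {
        # Only the last assistant turn is pre-split into sentences,
        # each prefixed with a response sentence ID <rI>, numbered from 0.
        "role": "assistant",
        "content": "<r0> You can change a project's visibility in its settings. "
                   "<r1> Select 'Public' and save to apply the change.",
    },
]
```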
@@ -42,7 +45,9 @@ To prompt the LoRA adapter, we combine the above components as follows: We first
 
 ## Quickstart Example
 
-
+As explained above, it is highly recommended to use the LoRA adapter through [granite-io](https://github.com/ibm-granite/granite-io). To get started, refer to the [example notebook](https://github.com/ibm-granite/granite-io/blob/main/notebooks/citations.ipynb) in the granite-io repository.
+
+However, if you prefer to invoke the LoRA adapter directly, you can use the following code. Note that the code assumes that the documents and the last assistant response have already been split into sentences.
 
 ```
 import torch
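The quickstart code in the diff is truncated here, but the pre-splitting it assumes can be sketched as follows. This is a minimal illustration assuming NLTK as the sentence splitter; the model card does not prescribe a specific tool:

```python
# Minimal sketch of the sentence pre-splitting the quickstart assumes.
# NLTK is an arbitrary choice for illustration; any sentence splitter works.
import nltk

nltk.download("punkt_tab", quiet=True)  # "punkt" on older NLTK versions

def add_response_sentence_ids(response: str) -> str:
    """Prefix each sentence of the last assistant response with <rI>,
    numbering from 0, as the conversation format above requires."""
    sentences = nltk.sent_tokenize(response)
    return " ".join(f"<r{i}> {s}" for i, s in enumerate(sentences))

print(add_response_sentence_ids(
    "New projects are private by default. You can change this in settings."
))
# -> <r0> New projects are private by default. <r1> You can change this in settings.
```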
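For the recommended granite-io path, the linked citations notebook is the authoritative reference. As a rough orientation only, the library's general calling pattern looks like the sketch below; the names follow granite-io's top-level quickstart and should be treated as assumptions, since the citations-specific I/O processor may differ:

```python
# Rough sketch of granite-io's general calling pattern, for orientation
# only. The citations-specific processor and its exact API are shown in
# the linked notebook; the names below are assumptions from granite-io's
# general quickstart, not taken from this model card.
from granite_io import make_backend, make_io_processor
from granite_io.types import ChatCompletionInputs, UserMessage

model_name = "granite3.2:8b"
io_processor = make_io_processor(
    model_name,
    backend=make_backend("openai", {"model_name": model_name}),
)
outputs = io_processor.create_chat_completion(
    ChatCompletionInputs(
        messages=[UserMessage(content="What does the report say about privacy?")]
    )
)
print(outputs.results[0].next_message.content)
```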