# Model Summary

This is a RAG-specific LoRA adapter for [ibm-granite/granite-3.2-8b-instruct](https://huggingface.co/ibm-granite/granite-3.2-8b-instruct) that is fine-tuned for hallucination detection in model outputs. Given a multi-turn conversation between a user and an AI assistant that ends with an assistant response, together with a set of documents/passages on which that response is supposed to be based, the adapter outputs a faithfulness score range (hallucination risk range) for each sentence in the response.

<br/>

This is a LoRA adapter that adds the ability to identify hallucination risks for the sentences of the last assistant response in a multi-turn RAG conversation, based on a set of provided documents/passages.

> [!TIP]
> Note: While you can invoke the LoRA adapter directly, as outlined below, we highly recommend calling it through granite-io, which wraps it with a tailored I/O processor. The I/O processor provides a friendlier interface, since it takes care of various data transformation and validation tasks. Among other things, it splits the input documents and the assistant response into sentences before calling the adapter, validates the adapter's output, and transforms the sentence IDs returned by the adapter into appropriate spans over the documents and the response.
However, if you prefer to invoke the LoRA adapter directly, the expected input/output is described below.

**Model input**: The input to the model is a list of conversational turns ending with an assistant response, plus a list of documents, converted to a string using the `apply_chat_template` function. For the adapter to work, the last assistant response must be pre-split into sentences, with a sentence index prepended to each sentence. In more detail, the primary inputs are the following three items, each represented in JSON:

- **conversation**: A list of conversational turns between the user and the assistant, where each item in the list is a dictionary with fields `role` and `content`. The `role` is either `user` or `assistant`, denoting user and assistant turns, respectively, while the `content` field contains the corresponding user/assistant utterance. The conversation should end with an assistant turn, and the `content` field of that turn should contain the assistant utterance with each sentence prefixed with a response ID of the form `<rI>`, where `I` is an integer. The numbering should start from 0 (for the first sentence) and be incremented by one for each subsequent sentence in the last assistant turn.
- **documents**: A list of documents, where each item in the list is a dictionary with fields `doc_id` and `text`. The `text` field contains the text of the corresponding document.
Additionally, this LoRA adapter is trained with a task instruction, which is encoded as a dictionary with fields `role` and `content`, where `role` is `system` and `content` is the following string describing the hallucination detection task: `Split the last assistant response into individual sentences. For each sentence in the last assistant response, identify the faithfulness score range. Ensure that your output includes all response sentence IDs, and for each response sentence ID, provide the corresponding faithfulness score range. The output must be a json structure.`
To prompt the LoRA adapter, we combine the above components as follows: we first append the **instruction** to the end of the **conversation** to generate an **augmented_conversation** list. Then we invoke the `apply_chat_template` function with parameters conversation = **augmented_conversation** and documents = **documents**.
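
As an illustration, the input assembly described above can be sketched as follows. The conversation and document contents here are invented for the example, and `prefix_sentences` is a hypothetical helper, not part of the released code:

```python
# Sketch of assembling the adapter input described above.
# The conversation/document contents are invented for illustration;
# prefix_sentences is a hypothetical helper.

def prefix_sentences(sentences):
    """Prefix each sentence of the last assistant response with <rI> IDs."""
    return " ".join(f"<r{i}> {s}" for i, s in enumerate(sentences))

conversation = [
    {"role": "user", "content": "What caused the extinction of the dinosaurs?"},
    {
        "role": "assistant",
        "content": prefix_sentences([
            "The dinosaurs went extinct about 66 million years ago.",
            "An asteroid impact is the leading explanation.",
        ]),
    },
]

documents = [
    {"doc_id": "0", "text": "The Cretaceous-Paleogene extinction event occurred ..."},
]

instruction = {
    "role": "system",
    "content": (
        "Split the last assistant response into individual sentences. "
        "For each sentence in the last assistant response, identify the "
        "faithfulness score range. Ensure that your output includes all "
        "response sentence IDs, and for each response sentence ID, provide "
        "the corresponding faithfulness score range. The output must be a "
        "json structure."
    ),
}

# Append the instruction to the conversation; the result is then passed,
# together with the documents, to apply_chat_template.
augmented_conversation = conversation + [instruction]
```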
**Model output**: When prompted with the above input, the model generates a faithfulness score range (hallucination risk) for each sentence of the last assistant response, in the form of a JSON dictionary. The dictionary is of the form `{"<r0>": "value_0", "<r1>": "value_1", ...}`, where each key `<rI>`, with `I` an integer, corresponds to the ID of a sentence in the last assistant response, and its value is the faithfulness score range of that sentence. The values are numeric ranges between `0-1` in increments of `0.1`, where higher values correspond to high faithfulness (low hallucination risk) and lower values correspond to low faithfulness (high hallucination risk). Additionally, the model is trained to output `unanswerable` when the response sentence indicates that the question is not answerable, and to output `NA` when faithfulness cannot be determined (e.g., for very short sentences).
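
To make the output format concrete, here is a small parsing sketch. The raw output string, the exact range formatting (`"0.9-1.0"`), and the `risk_label` helper are assumptions for illustration, not part of the released code:

```python
import json

# Hypothetical raw adapter output for a three-sentence response
# (the exact range formatting is an assumption for illustration).
raw_output = '{"<r0>": "0.9-1.0", "<r1>": "0.2-0.3", "<r2>": "NA"}'

scores = json.loads(raw_output)

def risk_label(value):
    """Map a faithfulness range (or special value) to a coarse label."""
    if value == "unanswerable":
        return "unanswerable"
    if value == "NA":
        return "undetermined"
    low = float(value.split("-")[0])  # lower end of the faithfulness range
    return "low hallucination risk" if low >= 0.5 else "high hallucination risk"

labels = {rid: risk_label(v) for rid, v in scores.items()}
```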

The quickstart example begins by defining the base and adapter model names and loading the base model's tokenizer:

```python
import json

import torch
from nltk import tokenize
from transformers import AutoTokenizer

BASE_NAME = "ibm-granite/granite-3.2-8b-instruct"
LORA_NAME = "ibm-granite/granite-3.2-8b-lora-rag-hallucination-detection"
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

tokenizer = AutoTokenizer.from_pretrained(BASE_NAME, padding_side='left', trust_remote_code=True)
```