---
license: apache-2.0
base_model: ibm-granite/granite-3.2-8b-instruct
library_name: peft
---

# LoRA Adapter for Hallucination Detection in RAG outputs

Welcome to Granite Experiments!

Think of Experiments as a preview of what's to come. These projects are still under development, but we wanted to let the open-source community take them for a spin! Use them, break them, and help us build what's next for Granite – we'll keep an eye out for feedback and questions. Happy exploring!

Just a heads-up: Experiments are forever evolving, so we can't commit to ongoing support or guarantee performance.

# Model Summary

This is a RAG-specific LoRA adapter for [ibm-granite/granite-3.2-8b-instruct](https://huggingface.co/ibm-granite/granite-3.2-8b-instruct), fine-tuned for detecting hallucinations in model outputs. Given a multi-turn conversation between a user and an AI assistant that ends with an assistant response, together with a set of documents/passages on which that response is supposed to be based, the adapter outputs a faithfulness score range (hallucination risk range) for each sentence in the assistant response.
- **Developer:** IBM Research
- **Model type:** LoRA adapter for [ibm-granite/granite-3.2-8b-instruct](https://huggingface.co/ibm-granite/granite-3.2-8b-instruct)
- **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)

## Intended use

This is a LoRA adapter that gives the ability to identify hallucination risks for the sentences in the last assistant response in a multi-turn RAG conversation, based on a set of provided documents/passages.

> [!TIP]
> Note: While you can invoke the LoRA adapter directly, as outlined below, we highly recommend calling it through [granite-io](https://github.com/ibm-granite/granite-io), which wraps it with a tailored I/O processor. The I/O processor provides a friendlier interface, as it takes care of various data transformations and validation tasks. This includes, among others, splitting the assistant response into sentences before calling the adapter, as well as validating the adapter's output and transforming the sentence IDs returned by the adapter into appropriate spans over the response.

However, if you prefer to invoke the LoRA adapter directly, the expected input/output is described below.

**Model input**: The input to the model is a list of conversational turns ending with an assistant response and a list of documents, converted to a string using the `apply_chat_template` function. For the adapter to work, the last assistant response should be pre-split into sentences, with sentence indices prepended. In more detail, the primary inputs are the following three items, each represented in JSON:

- **conversation**: A list of conversational turns between the user and the assistant, where each item in the list is a dictionary with fields `role` and `content`. The `role` field equals either `user` or `assistant`, denoting user and assistant turns, respectively, while the `content` field contains the corresponding user/assistant utterance. The conversation should end with an assistant turn, and the `content` field of that turn should contain the assistant utterance with each sentence prefixed with a response ID of the form `<rI>`, where `I` is an integer. The numbering should start from 0 (for the first sentence) and be incremented by one for each subsequent sentence in the last assistant turn.
- **documents**: A list of documents, where each item in the list is a dictionary with fields `doc_id` and `text`. The `text` field contains the text of the corresponding document.
- **instruction**: A task instruction, which this LoRA adapter is trained with. It is encoded as a dictionary with fields `role` and `content`, where `role` equals `system` and `content` equals the following string describing the hallucination detection task: `Split the last assistant response into individual sentences. For each sentence in the last assistant response, identify the faithfulness score range. Ensure that your output includes all response sentence IDs, and for each response sentence ID, provide the corresponding faithfulness score range. The output must be a json structure.`

To prompt the LoRA adapter, we combine the above components as follows: We first append the **instruction** to the end of the **conversation** to generate an **input_conversation** list. Then we invoke the `apply_chat_template` function with parameters: conversation = **input_conversation** and documents = **documents**.
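For illustration, here is a minimal sketch of what the combined input could look like before applying the chat template. The question, response sentences, and document text are invented for illustration; only the tagging format and the instruction string follow the description above.

```python
# Illustrative input_conversation: the last assistant response has been
# pre-split into sentences, each prefixed with a <rI> response ID, and the
# task instruction has been appended as a final system turn.
input_conversation = [
    {"role": "user", "content": "When was the Eiffel Tower built?"},
    {
        "role": "assistant",
        "content": "<r0> The Eiffel Tower was built between 1887 and 1889. "
                   "<r1> It was designed by Gustave Eiffel's engineering company.",
    },
    {
        "role": "system",
        "content": (
            "Split the last assistant response into individual sentences. "
            "For each sentence in the last assistant response, identify the "
            "faithfulness score range. Ensure that your output includes all "
            "response sentence IDs, and for each response sentence ID, provide "
            "the corresponding faithfulness score range. The output must be a "
            "json structure."
        ),
    },
]

# Illustrative grounding documents.
documents = [
    {
        "doc_id": 1,
        "text": "The Eiffel Tower is a wrought-iron lattice tower in Paris, "
                "France, constructed from 1887 to 1889.",
    },
]
```

Both pieces are then passed to `apply_chat_template`, as shown in the Quickstart example below.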
**Model output**: When prompted with the above input, the model generates a faithfulness score range (hallucination risk) for each sentence of the last assistant response, in the form of a JSON dictionary. The dictionary is of the form `{"<r0>": "value_0", "<r1>": "value_1", ...}`, where each field `<rI>`, with `I` an integer, corresponds to the ID of a sentence in the last assistant response, and its value is the faithfulness score range (hallucination risk) of that sentence. The output values are numeric ranges between `0` and `1` with increments of `0.1`, where higher values correspond to high faithfulness (low hallucination risk) and lower values correspond to low faithfulness (high hallucination risk). Additionally, the model is trained to output `unanswerable` when the response sentence indicates that the question is not answerable, and to output `NA` when the faithfulness cannot be determined (e.g., for very short sentences).

## Quickstart Example

As explained above, it is highly recommended to use the LoRA adapter through [granite-io](https://github.com/ibm-granite/granite-io). However, if you prefer to invoke the LoRA adapter directly, you can use the following code.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
from nltk import tokenize
import json

BASE_NAME = "ibm-granite/granite-3.2-8b-instruct"
LORA_NAME = "ibm-granite/granite-3.2-8b-lora-rag-hallucination-detection"
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

tokenizer = AutoTokenizer.from_pretrained(BASE_NAME, padding_side='left', trust_remote_code=True)
model_base = AutoModelForCausalLM.from_pretrained(BASE_NAME, device_map="auto")
model_hallucination = PeftModel.from_pretrained(model_base, LORA_NAME)

hallucination_sys_prompt = "Split the last assistant response into individual sentences. For each sentence in the last assistant response, identify the faithfulness score range. Ensure that your output includes all response sentence IDs, and for each response sentence ID, provide the corresponding faithfulness score range. The output must be a json structure."

def format_conversation(conversation):
    # Split the last assistant response into sentences (requires the NLTK
    # 'punkt' tokenizer data) and prefix each sentence with a <rI> response ID.
    response_sents = tokenize.sent_tokenize(conversation[-1]["content"])
    response_sents_with_ids = []
    for ind, sent in enumerate(response_sents):
        response_sents_with_ids.append(f"<r{ind}> {sent}")
    conversation[-1]["content"] = ' '.join(response_sents_with_ids)
    # Append the task instruction as the final (system) turn.
    conversation.append({
        "role": "system",
        "content": hallucination_sys_prompt
    })
    return conversation

conversation = [
    {
        "role": "user",
        "content": "What happened to Dennis Wilson of the Beach Boys in 1983?"
    },
    {
        "role": "assistant",
        "content": "Dennis Wilson of the Beach Boys drowned in Marina del Rey on December 28, 1983, while diving from a friend's boat trying to recover items that he had previously thrown overboard in fits of rage. Forensic pathologists believed that Dennis experienced shallow-water blackout just before his death"
    }
]
input_conversation = format_conversation(conversation=conversation)

documents = [
    {
        "doc_id": 1,
        "text": "The Beach Boys are an American rock band formed in Hawthorne, California, in 1961. The group's original lineup consisted of brothers Brian, Dennis, and Carl Wilson; their cousin Mike Love; and their friend Al Jardine. Distinguished by their vocal harmonies and early surf songs, they are one of the most influential acts of the rock era. The band drew on the music of jazz-based vocal groups, 1950s rock and roll, and black R&B to create their unique sound, and with Brian as composer, arranger, producer, and de facto leader, often incorporated classical or jazz elements and unconventional recording techniques in innovative ways. In 1983, tensions between Dennis and Love escalated so high that each obtained a restraining order against each other. With the rest of the band fearing that he would end up like Brian, Dennis was given an ultimatum after his last performance in November 1983 to check into rehab for his alcohol problems or be banned from performing live with them. Dennis checked into rehab for his chance to get sober, but on December 28, 1983, he fatally drowned in Marina del Rey while diving from a friend's boat trying to recover items that he had previously thrown overboard in fits of rage."
    },
    {
        "doc_id": 2,
        "text": "A cigarette smoker since the age of 13, Carl was diagnosed with lung cancer after becoming ill at his vacation home in Hawaii, in early 1997. Despite his illness, Carl continued to perform while undergoing chemotherapy. He played and sang throughout the Beach Boys' entire summer tour which ended in the fall of 1997. During the performances, he sat on a stool, but he stood while singing \"God Only Knows\". Carl died of lung cancer in Los Angeles, surrounded by his family, on February 6, 1998, just two months after the death of his mother, Audree Wilson. He was interred at Westwood Village Memorial Park Cemetery in Los Angeles."
    },
    {
        "doc_id": 3,
        "text": "Carl Dean Wilson (December 21, 1946 - February 6, 1998) was an American musician, singer, and songwriter who co-founded the Beach Boys. He is best remembered as their lead guitarist, as the youngest brother of bandmates Brian and Dennis Wilson, and as the group's de facto leader in the early 1970s. He was also the band's musical director on stage from 1965 until his death. Influenced by the guitar playing of Chuck Berry and the Ventures, Carl's initial role in the group was that of lead guitarist and backing vocals, but he performed lead vocals on several of their later hits, including \"God Only Knows\" (1966), \"Good Vibrations\" (1966), and \"Kokomo\" (1988). By the early 1980s the Beach Boys were in disarray; the band had split into several camps. Frustrated with the band's sluggishness to record new material and reluctance to rehearse, Wilson took a leave of absence in 1981. He quickly recorded and released a solo album, Carl Wilson, composed largely of rock n' roll songs co-written with Myrna Smith-Schilling, a former backing vocalist for Elvis Presley and Aretha Franklin, and wife of Wilson's then-manager Jerry Schilling. The album briefly charted, and its second single, \"Heaven\", reached the top 20 on Billboard's Adult Contemporary chart."
    }
]

# Generate answer
input_text = tokenizer.apply_chat_template(conversation=input_conversation, documents=documents, tokenize=False)
inputs = tokenizer(input_text, return_tensors="pt")
output = model_hallucination.generate(inputs["input_ids"].to(device), attention_mask=inputs["attention_mask"].to(device), max_new_tokens=500)
output_text = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
# json.loads returns a dict, so print it as a separate argument rather than
# concatenating it to a string.
print("Output:", json.loads(output_text))
```
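The raw model output is a JSON string keyed by sentence IDs. As a minimal sketch of how one might consume it, the snippet below pairs each score range with its sentence and applies the response-level decision rule used in the Evaluation section (a response counts as hallucinated if any sentence scores below `0.1`). The output values shown and the `flag_hallucinations` helper are illustrative assumptions, not part of the model or the granite-io API.

```python
import json

# Hypothetical raw output for a two-sentence response; the exact range
# strings (e.g., "0.9-1.0") are illustrative, not guaranteed model behavior.
output_text = '{"<r0>": "0.9-1.0", "<r1>": "0.2-0.3"}'
response_sents = [
    "Dennis Wilson of the Beach Boys drowned in Marina del Rey on December 28, 1983.",
    "Forensic pathologists believed that Dennis experienced shallow-water blackout.",
]

def flag_hallucinations(output_text, response_sents, threshold=0.1):
    """Pair each response sentence with its faithfulness score range and apply
    the response-level rule from the Evaluation section: a response is flagged
    as hallucinated if any sentence scores below the threshold. A simplified
    sketch of bookkeeping that granite-io automates, not the library's API."""
    scores = json.loads(output_text)
    pairs, hallucinated = [], False
    for ind, sent in enumerate(response_sents):
        value = scores.get(f"<r{ind}>", "NA")
        pairs.append((sent, value))
        if value in ("NA", "unanswerable"):  # special labels the adapter may emit
            continue
        low = float(value.split("-")[0])  # lower bound of a range like "0.2-0.3"
        hallucinated = hallucinated or low < threshold
    return pairs, hallucinated

pairs, hallucinated = flag_hallucinations(output_text, response_sents)
for sent, value in pairs:
    print(f"[{value}] {sent}")
print("Hallucinated response:", hallucinated)
```

In practice, granite-io performs this mapping for you and additionally converts the sentence IDs into character spans over the response.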
## Training Details

The process of generating the training data consisted of two main steps:

- **Multi-turn RAG conversation generation:** Starting from publicly available document corpora, we generated a set of multi-turn RAG data, consisting of multi-turn conversations grounded on passages retrieved from the corpus. For details on the RAG conversation generation process, please refer to the [Granite Technical Report](https://github.com/ibm-granite/granite-3.0-language-models/blob/main/paper.pdf) and [Lee, Young-Suk, et al.](https://arxiv.org/pdf/2409.11500).
- **Faithfulness label generation:** To create the faithfulness labels for responses, we used the NLI-based technique available at [Achintalwar, et al.](https://arxiv.org/pdf/2403.06009).

This process resulted in ~130K data instances, which were used to train the LoRA adapter.

### Training Data

The following public datasets were used as seed datasets for the multi-turn RAG conversation generation process:

- [CoQA](https://stanfordnlp.github.io/coqa/) - Wikipedia passages
- [MultiDoc2Dial](https://huggingface.co/datasets/IBM/multidoc2dial)
- [QuAC](https://huggingface.co/datasets/allenai/quac)

#### Training Hyperparameters

The LoRA adapter was fine-tuned using PEFT under the following regime: rank = 8, learning rate = 1e-5, and a 90/10 split between training and validation data.

## Evaluation

We evaluate the LoRA adapter on the QA portion of the [RAGTruth](https://aclanthology.org/2024.acl-long.585/) benchmark, comparing its response-level hallucination detection performance with the methods reported in the RAGTruth paper. Responses that obtain a faithfulness score of less than `0.1` for at least one sentence are considered hallucinated responses. The results are shown in the table below; the results for the baselines are extracted from the [RAGTruth](https://aclanthology.org/2024.acl-long.585/) paper.

Model | Precision | Recall | F1
-- | -- | -- | --
gpt-3.5-turbo (prompted) | 18.8 | 84.4 | 30.8
gpt-4-turbo (prompted) | 33.2 | 90.6 | 45.6
[SelfCheckGPT](https://aclanthology.org/2023.emnlp-main.557.pdf) | 35 | 58 | 43.7
[LMvLM](https://aclanthology.org/2023.emnlp-main.778.pdf) | 18.7 | 76.9 | 30.1
Finetuned Llama-2-13B | 61.6 | 76.3 | 68.2
hallucination-detection LoRA | 67.6 | 77.4 | 72.2

## Model Card Authors

[Chulaka Gunasekara](mailto:chulaka.gunasekara@ibm.com)