---
language:
- en
library_name: transformers
license: apache-2.0
pipeline_tag: text-generation
---
# Granite 3.0 8B Instruct - Intrinsics LoRA v0.1
Welcome to Granite Experiments!
Think of Experiments as a preview of what's to come. These projects are still under development, but we wanted to let the open-source community take them for a spin! Use them, break them, and help us build what's next for Granite – we'll
keep an eye out for feedback and questions in the [Community section](https://huggingface.co/ibm-granite/granite-intrinsics-3.0-8b-lora-v0.1/discussions). Happy exploring!
## Model Summary
**Granite 3.0 8B Instruct - Intrinsics LoRA v0.1** is a merged LoRA finetune for [ibm-granite/granite-3.0-8b-instruct](https://huggingface.co/ibm-granite/granite-3.0-8b-instruct),
providing access to the Uncertainty, Hallucination Detection, and Safety Exception intrinsics in addition to retaining the full abilities of the [ibm-granite/granite-3.0-8b-instruct](https://huggingface.co/ibm-granite/granite-3.0-8b-instruct) model.
- **Developer:** IBM Research
- **Model type:** LoRA adapter for [ibm-granite/granite-3.0-8b-instruct](https://huggingface.co/ibm-granite/granite-3.0-8b-instruct)
- **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6602ffd971410cf02bf42c06/ornGz5BdtfIXLYxDzUgi9.png)
### Uncertainty Intrinsic
The Uncertainty intrinsic is designed to provide a Certainty score for model responses to user questions.
**Certainty score definition** The model will respond with an integer from 0 to 9, corresponding to 5%, 15%, 25%, ..., 95% confidence, respectively.
This percentage is *calibrated* in the following sense: given a set of answers assigned a certainty score of X%, approximately X% of these answers should be correct. See the eval experiment below for out-of-distribution verification of this behavior.
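Concretely, the score-to-confidence mapping is linear and can be computed with a trivial helper (ours, not part of the model API):

```python
def certainty_to_percent(score: int) -> int:
    """Map the model's 0-9 certainty score to its calibrated confidence
    percentage: 0 -> 5%, 1 -> 15%, ..., 9 -> 95%."""
    if not 0 <= score <= 9:
        raise ValueError("certainty score must be an integer from 0 to 9")
    return 5 + 10 * score
```

For example, a score of 7 corresponds to 75% confidence.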
### Hallucination Detection (RAG) Intrinsic
The Hallucination Detection intrinsic is designed to detect when an assistant response to a user question with supporting documents is not supported by those documents. A response of `Y` indicates hallucination, and `N` indicates no hallucination.
### Safety Exception Intrinsic
The Safety Exception Intrinsic is designed to raise an exception when the user query is unsafe. This exception is raised by responding with `Y` (unsafe), and `N` otherwise.
The Safety Exception intrinsic was designed as a binary classifier that analyzes the user's prompt to detect a variety of harms, including violence, threats, sexual and explicit content, and requests to obtain personally identifiable information.
## Usage
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
### Intended use
This is an experimental LoRA testing new functionality being developed for IBM's Granite LLM family. We welcome the community to test it out and give us feedback, but we do NOT recommend using this model for real deployments at this time. Stay tuned for more updates on the Granite roadmap.
**Granite 3.0 8B Instruct - Intrinsics LoRA v0.1** is lightly tuned so that its behavior closely mimics that of [ibm-granite/granite-3.0-8b-instruct](https://huggingface.co/ibm-granite/granite-3.0-8b-instruct),
with the added ability to generate the three specified intrinsics.
### Invoking intrinsics
Each intrinsic is associated with its own generation role and has its own usage steps. Note that each intrinsic responds with only one token, and any additional text after this token should be ignored. You can curb additional generation by setting the maximum number of new tokens to 1 (e.g., `max_tokens=1`) when invoking any intrinsic.
**Uncertainty Intrinsic Usage Steps** Answering a question and obtaining a certainty score proceeds as follows.
1. Prompt the model with a system prompt (required) followed by the user prompt.
2. Use the model to generate a response as normal (via the `assistant` role).
3. Invoke the Uncertainty intrinsic by generating in the `certainty` role (use "certainty" as the role in the chat template, or simply append `<|start_of_role|>certainty<|end_of_role|>` and continue generating), see examples below.
4. The model will respond with an integer certainty score from 0 to 9.
The model was calibrated with the following system prompt: `You are an AI language model developed by IBM Research. You are a cautious assistant. You carefully follow instructions. You are helpful and harmless and you follow ethical guidelines and promote positive behavior.`
You can further augment this system prompt for a given use case or task, but it is recommended that your system prompt always start with this string.
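The steps above can be sketched by assembling the raw prompt string directly. This is a minimal sketch: the special role tokens follow the Granite chat template described above, the helper name is ours, and actual generation requires a hosted or local instance of the model.

```python
# System prompt the model was calibrated with (from the model card above).
SYSTEM_PROMPT = (
    "You are an AI language model developed by IBM Research. You are a "
    "cautious assistant. You carefully follow instructions. You are helpful "
    "and harmless and you follow ethical guidelines and promote positive behavior."
)

def build_certainty_prompt(user_query: str, assistant_answer: str) -> str:
    """Assemble the raw prompt whose one-token continuation is the certainty score."""
    return "\n".join([
        f"<|start_of_role|>system<|end_of_role|>{SYSTEM_PROMPT}<|end_of_text|>",
        f"<|start_of_role|>user<|end_of_role|>{user_query}<|end_of_text|>",
        f"<|start_of_role|>assistant<|end_of_role|>{assistant_answer}<|end_of_text|>",
        # The model generates the 0-9 certainty score after this marker:
        "<|start_of_role|>certainty<|end_of_role|>",
    ])
```

The resulting string would be sent to the model as a plain text-completion prompt with `max_tokens=1`.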
**Hallucination Detection Intrinsic Usage Steps** Answering a question and detecting hallucination proceeds as follows.
1. Prompt the model with the system prompt (required) followed by the user prompt.
2. Use the model to generate a response as normal (via the `assistant` role).
3. Invoke the Hallucination Detection intrinsic by generating in the `hallucination` role (use "hallucination" as the role in the chat template, or simply append `<|start_of_role|>hallucination<|end_of_role|>` and continue generating), see examples below.
4. The model will respond with `Y` or `N`.
**Safety Exception Intrinsic Usage Steps** Determining if a user query is safe proceeds as follows.
1. Prompt the model with the system prompt (required) followed by the user prompt.
2. Invoke the Safety Exception intrinsic by generating in the `safety` role (use "safety" as the role in the chat template, or simply append `<|start_of_role|>safety<|end_of_role|>` and continue generating), see examples below.
3. The model will respond with `Y` (unsafe) or `N` (safe).
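As an illustration of these steps, the request body for invoking the safety check against an OpenAI-compatible text-completion endpoint might look as follows. This is a sketch under assumptions: the model name is a placeholder for however your hosted instance exposes the model, and `max_tokens=1` enforces the single-token `Y`/`N` answer.

```python
def build_safety_request(user_query: str) -> dict:
    """Build a text-completion request body that invokes the `safety` role
    on a single user turn."""
    prompt = (
        f"<|start_of_role|>user<|end_of_role|>{user_query}<|end_of_text|>\n"
        "<|start_of_role|>safety<|end_of_role|>"
    )
    return {
        "model": "granite-intrinsics-3.0-8b-lora-v0.1",  # placeholder model name
        "prompt": prompt,
        "temperature": 0,
        "max_tokens": 1,  # the intrinsic answers with a single Y/N token
    }
```

The returned dict would be POSTed to the endpoint's `/completions` route; a response of `Y` signals that the query is unsafe.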
## Combining Intrinsics
In many pipelines, it may be desirable to invoke multiple intrinsics at different points. In a multi-turn conversation possibly involving other intrinsics, it is important to use
attention masking to provide only the relevant information to the intrinsic of interest. We explore two frameworks for accomplishing this: [Prompt Declaration Language](https://github.com/IBM/prompt-declaration-language) (PDL) and [SGLang](https://github.com/sgl-project/sglang).
In the examples below, we explore the following RAG flow. First, a user query arrives together with
relevant documents retrieved by a RAG system. We invoke the Safety Exception intrinsic to determine whether the query is safe. If it is, we proceed to generate an answer to the question as normal. Finally,
we evaluate the certainty and hallucination status of this reply by invoking the Uncertainty and Hallucination Detection intrinsics.
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6602ffd971410cf02bf42c06/HpitI-3zeutXqduC2eUES.png)
### Intrinsics Example with PDL
Given a hosted instance of **Granite 3.0 8B Instruct - Intrinsics LoRA v0.1** at `API_BASE` (insert the host address here), the following example uses the [PDL language](https://github.com/IBM/prompt-declaration-language) to implement the RAG intrinsic invocation scenario described above.
Note that the hosted instance must be served by a provider supported by LiteLLM ([https://docs.litellm.ai/docs/providers](https://docs.litellm.ai/docs/providers)).
First, create a file `intrinsics.pdl` with the following content.
```yaml
defs:
  system_prompt: "You are an AI language model developed by IBM Research. You are a cautious assistant. You carefully follow instructions. You are helpful and harmless and you follow ethical guidelines and promote positive behavior."
  rag_prompt: "Provide a short response to the user's question based on the information present in the documents. If the documents lack the necessary details, inform the user that the question cannot be answered."
  document: |
    Disability housing grants for Veterans
    We offer housing grants for Veterans and service members with certain service - connected disabilities so they can buy or change a home to meet their needs and live more independently. Changing a home might involve installing ramps or widening doorways. Find out if you re eligible for a disability housing grant and how to apply.
    Can I get a Specially Adapted Housing (SAH) grant and how much funding does this grant offer?
    You may be able to get an SAH grant if you re using the grant money to buy, build, or change your permanent home a home you plan to live in for a long time and you meet both of the requirements listed below. Both of these must be true. You: Own or will own the home , and Have a qualifying service - connected disability Qualifying service - connected disabilities include : The loss or loss of use of more than one limb The loss or loss of use of a lower leg along with the residuals lasting effects of an organic natural disease or injury Blindness in both eyes having only light perception along with the loss or loss of use of a leg Certain severe burns The loss or loss of use of one or both lower extremities feet or legs after September 11 , 2001, that makes it so you can t balance or walk without the help of braces, crutches, canes, or a wheelchair Note : Only 30 Veterans and service members each fiscal year FY can qualify for a grant based on the loss of extremities after September 11 , 2001. If you qualify for but don t receive a grant in 2019 because the cap was reached , you may be able to use this benefit in FY 2020 or future years if the law continues to give us the authority to offer these grants and we don t go beyond the new FY cap. For FY 2019 , you may be able to get up to 3 grants for a total of up to $85,645 through the SAH grant program. Learn more about how to apply for a housing grant
    Can I get a Special Housing Adaptation (SHA) grant and how much funding does this grant offer?
    You may be able to get an SHA grant if you re using the grant money to buy, build, or change your permanent home a home you plan to live in for a long time and you meet both of the requirements listed below. Both of these must be true : You or a family member own or will own the home , and You have a qualifying service - connected disability Qualifying service - connected disabilities include : Blindness in both eyes with 20/200 visual acuity or less The loss or loss of use of both hands Certain severe burns Certain respiratory or breathing injuries For FY 2019 , you may be able to get up to 3 grants for a total of up to $17,130 through the SHA grant program. Learn more about how to apply for a housing grant \n\nWhat if I need money to make changes to a family member s home that I m living in for a short period of time? \nYou may be able to get a Temporary Residence Adaptation TRA grant if you meet both of the requirements listed below. Both of these must be true. You: Qualify for an SAH or SHA grant see above , and Are living temporarily in a family member s home that needs changes to meet your needs If you qualify for an SAH grant , you can get up to $37,597 through the TRA grant program for FY 2019. If you qualify for an SHA grant , you can get up to $6,713 through the TRA grant program for FY 2019.
    Apply for an SAH, SHA, or TRA grant
    You can apply online right now by going to our eBenefits website. You ll need to sign in to eBenefits with your DS Logon basic or premium account. If you don t have a DS Logon account , you can register for one on the site. Go to eBenefits to apply.
  query: How much funding does the SAH grant offer?
text:
  - include: intrinsics-defs.pdl
  - defs:
      unsafe:
        call: get_safety
        args:
          query: ${ query }
  - role: system
    text: ${ system_prompt }
    contribute: [context]
  - if: ${ unsafe == "N" }
    then:
      text:
        - "\n\nDocuments: ${ document }\n\n ${ query }\n"
        - model: openai/granite-8b-intrinsics-v2-20241203
          def: answer
          parameters: {api_key: EMPTY, api_base: API_BASE, temperature: 0, stop: "\n"}
        - defs: ## Implicit fork of context
            certainty:
              call: get_certainty
            hallucination:
              call: get_hallucination
        - "\nCertainty: ${ certainty }"
        - "\nHallucination: ${ hallucination }"
```
Next, create a file `intrinsics-defs.pdl` with the following content.
```yaml
defs:
  apply_template:
    function:
      context: [{role: str, content: str}]
    return:
      text:
        for:
          c: ${ context }
        repeat:
          text:
            - <|start_of_role|>${ c.role }<|end_of_role|>
            - ${ c.content }
            - <|end_of_text|>
        join:
          with: "\n"
  get_intrinsic:
    function:
      intrinsic: str
    return:
      lastOf:
        - call: apply_template
          def: mycontext
          args:
            context: ${ pdl_context }
        - model: granite-intrinsics-3.0-8b-instruct-v0.1
          parameters:
            api_key: EMPTY
            api_base: API_BASE
            temperature: 0
            max_tokens: 1
            custom_llm_provider: text-completion-openai
          prompt: |
            ${ mycontext }
            <|start_of_role|>${ intrinsic }<|end_of_role|>
  get_safety:
    function:
      query: str
    return:
      lastOf:
        - ${ query }
        - call: apply_template
          def: mycontext
          args:
            context: ${ pdl_context }
        - call: get_intrinsic
          args:
            intrinsic: safety
  get_hallucination:
    function:
    return:
      call: get_intrinsic
      args:
        intrinsic: hallucination
  get_certainty:
    function:
    return:
      call: get_intrinsic
      args:
        intrinsic: certainty
```
To run the example, install the PDL CLI (`pip install prompt-declaration-language`) and then run `pdl intrinsics.pdl` from the command line.
### Intrinsics Example with SGLang
The SGLang implementation below uses the SGLang fork at [https://github.com/frreiss/sglang/tree/granite](https://github.com/frreiss/sglang/tree/granite), which supports Granite models.
```python
import sglang as sgl
from sglang.lang.chat_template import get_chat_template


@sgl.function
def safety_check(s, question):
    s += sgl.user(question)
    s += "<|start_of_role|>safety<|end_of_role|>" + sgl.gen("safety", temperature=0, max_tokens=1)
    # print("\n====== Safety check state =======\n")
    # print(s)
    # print("\n")


# Input data
system_prompt = "You are an AI language model developed by IBM Research. You are a cautious assistant. You carefully follow instructions. You are helpful and harmless and you follow ethical guidelines and promote positive behavior."
rag_prompt = "Provide a short response to the user's question based on the information present in the documents. If the documents lack the necessary details, inform the user that the question cannot be answered."
document = """Disability housing grants for Veterans
We offer housing grants for Veterans and service members with certain service - connected disabilities so they can buy or change a home to meet their needs and live more independently. Changing a home might involve installing ramps or widening doorways. Find out if you re eligible for a disability housing grant and how to apply.
Can I get a Specially Adapted Housing (SAH) grant and how much funding does this grant offer?
You may be able to get an SAH grant if you re using the grant money to buy, build, or change your permanent home a home you plan to live in for a long time and you meet both of the requirements listed below. Both of these must be true. You: Own or will own the home , and Have a qualifying service - connected disability Qualifying service - connected disabilities include : The loss or loss of use of more than one limb The loss or loss of use of a lower leg along with the residuals lasting effects of an organic natural disease or injury Blindness in both eyes having only light perception along with the loss or loss of use of a leg Certain severe burns The loss or loss of use of one or both lower extremities feet or legs after September 11 , 2001, that makes it so you can t balance or walk without the help of braces, crutches, canes, or a wheelchair Note : Only 30 Veterans and service members each fiscal year FY can qualify for a grant based on the loss of extremities after September 11 , 2001. If you qualify for but don t receive a grant in 2019 because the cap was reached , you may be able to use this benefit in FY 2020 or future years if the law continues to give us the authority to offer these grants and we don t go beyond the new FY cap. For FY 2019 , you may be able to get up to 3 grants for a total of up to $85,645 through the SAH grant program. Learn more about how to apply for a housing grant
Can I get a Special Housing Adaptation (SHA) grant and how much funding does this grant offer?
You may be able to get an SHA grant if you re using the grant money to buy, build, or change your permanent home a home you plan to live in for a long time and you meet both of the requirements listed below. Both of these must be true : You or a family member own or will own the home , and You have a qualifying service - connected disability Qualifying service - connected disabilities include : Blindness in both eyes with 20/200 visual acuity or less The loss or loss of use of both hands Certain severe burns Certain respiratory or breathing injuries For FY 2019 , you may be able to get up to 3 grants for a total of up to $17,130 through the SHA grant program. Learn more about how to apply for a housing grant \n\nWhat if I need money to make changes to a family member s home that I m living in for a short period of time? \nYou may be able to get a Temporary Residence Adaptation TRA grant if you meet both of the requirements listed below. Both of these must be true. You: Qualify for an SAH or SHA grant see above , and Are living temporarily in a family member s home that needs changes to meet your needs If you qualify for an SAH grant , you can get up to $37,597 through the TRA grant program for FY 2019. If you qualify for an SHA grant , you can get up to $6,713 through the TRA grant program for FY 2019.
Apply for an SAH, SHA, or TRA grant
You can apply online right now by going to our eBenefits website. You ll need to sign in to eBenefits with your DS Logon basic or premium account. If you don t have a DS Logon account , you can register for one on the site. Go to eBenefits to apply.
"""
query = "How much funding does the SAH grant offer?"


# The following function processes a chat between a user and an assistant.
# For simplicity, this assumes a fixed document, but in a true RAG setting, the
# documents will be retrieved dynamically based on the user turns.
@sgl.function
def main_chat_flow(s, doc, query):
    s += sgl.system(system_prompt)

    # Safety check on the query
    state = safety_check(question=query)
    print(f"Safety Output: {state['safety']} for question: {query}\n")

    # RAG answer generation
    s += sgl.user(rag_prompt + "\n\nDocuments: " + doc + "\n\n" + query)
    s += sgl.assistant(sgl.gen("answer", stop="\n", temperature=0, max_tokens=200))
    answer = s["answer"]
    print(f"Assistant: {answer}\n")

    # Hallucination check in parallel with uncertainty quantification
    # for the generated answer
    forks = s.fork(2)
    for i, f in enumerate(forks):
        if i == 0:
            f += "<|start_of_role|>hallucination<|end_of_role|>"
            f += sgl.gen("hallucination", temperature=0, max_tokens=1)
            # print("\n====== Fork 0 state =======\n")
            # print(f)
            # print("\n")
        else:
            f += "<|start_of_role|>certainty<|end_of_role|>"
            f += sgl.gen("certainty", temperature=0, max_tokens=1)
            # print("\n====== Fork 1 state =======\n")
            # print(f)
            # print("\n")
    print(f"Hallucination Output: {forks[0]['hallucination']} for answer: {answer}\n")
    print(f"Certainty Output: {forks[1]['certainty']} for answer: {answer}\n")


if __name__ == "__main__":
    model_path = "ibm-granite/granite-3.0-8b-lora-intrinsics-v0.1"

    # Set model_path to the Granite model and the chat template to the Granite template.
    # This assumes the "granite3-instruct" chat template has been registered in
    # "sglang/lang/chat_template.py".
    runtime = sgl.Runtime(model_path=model_path)
    runtime.endpoint.chat_template = get_chat_template("granite3-instruct")
    sgl.set_default_backend(runtime)

    state = main_chat_flow(doc=document, query=query)
```
#### Notes
**Certainty score interpretation** Certainty scores calibrated as defined above may at times seem biased towards moderate values, for several reasons. First, humans tend to be overconfident in
evaluating what we know and don't know; in contrast, a calibrated model is less likely to output very high or very low confidence scores, as these imply certainty of correctness or incorrectness.
You might see very low confidence scores on answers where the model's response was something to the effect of "I don't know", which is easy to evaluate as not
being the correct answer to the question (though it is the appropriate one). Second, remember that the model
is evaluating itself; correctness or incorrectness that may be obvious to us or to larger models may be less obvious to an 8B model. Finally, teaching a model every fact it knows
and doesn't know is not possible, so it must generalize to questions of wildly varying difficulty (some of which may be trick questions!) and to settings where it has not had its outputs judged.
Intuitively, it does this by extrapolating from related questions
it was evaluated on during training; this is an inherently inexact process and leads to some hedging.
Certainty is inherently an intrinsic property of a model and its abilities. The Uncertainty intrinsic is not intended to predict the certainty of responses generated by any other models besides itself or [ibm-granite/granite-3.0-8b-instruct](https://huggingface.co/ibm-granite/granite-3.0-8b-instruct).
Additionally, certainty scores are *distributional* quantities, and so will do well on realistic questions in aggregate, but in principle may have surprising scores on individual
red-teamed examples.
## Evaluation
We evaluate the performance of the intrinsics themselves and the RAG performance of the model.
We first find that the performance of the intrinsics in our shared model **Granite 3.0 8B Instruct - Intrinsics LoRA v0.1** is not degraded
versus the baseline procedure of maintaining 3 separate intrinsic models. Here, percent error is shown for the Hallucination Detection and Safety Exception intrinsics, as they have
binary output, and Mean Absolute Error (MAE) is shown for the Uncertainty intrinsic, as it outputs integers from 0 to 9. For all metrics, lower is better. Performance is calculated on a randomly drawn 400-sample validation set from each intrinsic's dataset.
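For reference, the two error metrics can be computed as follows. This is a generic sketch of the standard definitions, not the exact evaluation harness used here:

```python
def percent_error(preds, labels):
    """Percentage of binary (Y/N) predictions that disagree with the labels."""
    return 100 * sum(p != t for p, t in zip(preds, labels)) / len(labels)

def mean_absolute_error(preds, labels):
    """MAE between predicted and reference certainty scores (integers 0-9)."""
    return sum(abs(p - t) for p, t in zip(preds, labels)) / len(labels)
```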
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6602ffd971410cf02bf42c06/NsvMpweFjmjIhWFaKtI-K.png)
We then find that the RAG performance of **Granite 3.0 8B Instruct - Intrinsics LoRA v0.1** does not suffer with respect to the base model [ibm-granite/granite-3.0-8b-instruct](https://huggingface.co/ibm-granite/granite-3.0-8b-instruct). Here we evaluate on the RAGBench benchmark using the RAGAS faithfulness and correctness metrics.
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6602ffd971410cf02bf42c06/hyOlQmXPirlCYeILLBXhc.png)
## Training Details
The **Granite 3.0 8B Instruct - Intrinsics LoRA v0.1** model is a LoRA adapter finetuned to provide 3 desired intrinsic outputs - Uncertainty Quantification, Hallucination Detection, and Safety.
### UQ Training Data
The following datasets were used for calibration and/or finetuning. Certainty scores were obtained via the method in [[Shen et al. ICML 2024] Thermometer: Towards Universal Calibration for Large Language Models](https://arxiv.org/abs/2403.08819).
* [BigBench](https://huggingface.co/datasets/tasksource/bigbench)
* [MRQA](https://huggingface.co/datasets/mrqa-workshop/mrqa)
* [newsqa](https://huggingface.co/datasets/lucadiliello/newsqa)
* [trivia_qa](https://huggingface.co/datasets/mandarjoshi/trivia_qa)
* [search_qa](https://huggingface.co/datasets/lucadiliello/searchqa)
* [openbookqa](https://huggingface.co/datasets/allenai/openbookqa)
* [web_questions](https://huggingface.co/datasets/Stanford/web_questions)
* [smiles-qa](https://huggingface.co/datasets/alxfgh/ChEMBL_Drug_Instruction_Tuning)
* [orca-math](https://huggingface.co/datasets/microsoft/orca-math-word-problems-200k)
* [ARC-Easy](https://huggingface.co/datasets/allenai/ai2_arc)
* [commonsense_qa](https://huggingface.co/datasets/tau/commonsense_qa)
* [social_i_qa](https://huggingface.co/datasets/allenai/social_i_qa)
* [super_glue](https://huggingface.co/datasets/aps/super_glue)
* [figqa](https://huggingface.co/datasets/nightingal3/fig-qa)
* [riddle_sense](https://huggingface.co/datasets/INK-USC/riddle_sense)
* [ag_news](https://huggingface.co/datasets/fancyzhx/ag_news)
* [medmcqa](https://huggingface.co/datasets/openlifescienceai/medmcqa)
* [dream](https://huggingface.co/datasets/dataset-org/dream)
* [codah](https://huggingface.co/datasets/jaredfern/codah)
* [piqa](https://huggingface.co/datasets/ybisk/piqa)
### RAG Hallucination Training Data
The following public datasets were used for finetuning. The details of data creation for RAG response generation are available in the [Granite Technical Report](https://github.com/ibm-granite/granite-3.0-language-models/blob/main/paper.pdf).
For creating the hallucination labels for responses, the technique described in [Achintalwar, et al.](https://arxiv.org/pdf/2403.06009) was used.
* [MultiDoc2Dial](https://huggingface.co/datasets/IBM/multidoc2dial)
* [QuAC](https://huggingface.co/datasets/allenai/quac)
### Safety Exception Training Data
The following public datasets were used for finetuning.
* [yahma/alpaca-cleaned](https://huggingface.co/datasets/yahma/alpaca-cleaned/discussions)
* [nvidia/Aegis-AI-Content-Safety-Dataset-1.0](https://huggingface.co/datasets/nvidia/Aegis-AI-Content-Safety-Dataset-1.0/viewer/default/train)
* A subset of [https://huggingface.co/datasets/Anthropic/hh-rlhf](https://huggingface.co/datasets/Anthropic/hh-rlhf)
* [ibm/AttaQ](https://huggingface.co/datasets/ibm/AttaQ)
* [google/civil_comments](https://huggingface.co/datasets/google/civil_comments/blob/5cb696158f7a49c75722fd0c16abded746da3ea3/civil_comments.py)
* [allenai/social_bias_frames](https://huggingface.co/datasets/allenai/social_bias_frames)
## Model Card Authors
Kristjan Greenewald,
Nathalie Baracaldo,
Chulaka Gunasekara,
Lucian Popa,
Mandana Vaziri