---
license: apache-2.0
language:
- en
---
March 23, 2024
# Felix-8B: A model built with lawfulness alignment
Felix-8B is an experimental language model developed by Ontocord.ai, specializing in addressing lawfulness concerns under the Biden-Harris Executive Order on AI and the principles of the EU AI Act. The model has achieved one of the highest scores on the TruthfulQA benchmark among models of its size, demonstrating strong performance in providing accurate and reliable responses.
Felix-8B is **experimental and a research work product**: a DPO reinforcement learning version of [ontocord/sft-4e-exp2](https://huggingface.co/ontocord/sft-4e-exp2), which is in turn a fine-tuned version of [TencentARC/Mistral_Pro_8B_v0.1](https://huggingface.co/TencentARC/Mistral_Pro_8B_v0.1).
Felix-8B was DPO-trained on our synthetically generated dataset [Auto Redteam Triplets (ART): a synthetic dataset to perform reinforcement learning redteaming for the EU AI Act and Biden-Harris AI Executive Order concerns](https://huggingface.co/datasets/ontocord/auto_redteam_triplets).
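For background, DPO optimizes the policy directly on preference triplets (prompt $x$, preferred response $y_w$, dispreferred response $y_l$) using the standard objective from Rafailov et al. (2023), typically with the SFT model as the reference policy $\pi_{\mathrm{ref}}$ (the card does not state the $\beta$ used):

$$
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\pi_{\mathrm{ref}}) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\!\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
$$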
**If you find any issues please let us know in the Community section so we can improve the model.**
![image/png](https://cdn-uploads.huggingface.co/production/uploads/630430583926de1f7ec62c6b/zKoEI-B21KnmeRWvKzrx1.png)
## Model Description
Felix-8B is an 8 billion parameter language model trained using Ontocord.ai's proprietary auto-purpleteaming technique. The model has been fine-tuned and optimized using synthetic data, with the goal of improving its robustness and ability to handle a wide range of tasks while maintaining a strong focus on safety and truthfulness.
## Evaluation
| ARC (25) | HellaSwag (10) | MMLU (5) | TruthfulQA MC (20) | Winogrande | GSM8K (exact) | Avg |
|:--------:|:--------------:|:--------:|:------------------:|:----------:|:-------------:|:---:|
| 0.6519 | 0.8474 | 0.6057 | 0.638 | 0.764 | 0.5997 | 0.68445 |
Felix-8B has been extensively evaluated on various benchmarks, including the TruthfulQA benchmark, where it has achieved one of the highest scores among models of its size.
We also confirmed that the model's helpfulness increased during DPO, which is very encouraging given that we targeted harmlessness rather than helpfulness specifically.
More evaluations will be provided as we continue our research. The task names above match those used by EleutherAI's `lm-evaluation-harness`; see the sketch after this paragraph.
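Assuming that harness was used (the card does not state the exact evaluation tooling), a single-task run could look like the following sketch:

```python
# Sketch: reproducing the 25-shot ARC score with EleutherAI's
# lm-evaluation-harness. The tooling is assumed, not confirmed by the card.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=ontocord/Felix-8B",
    tasks=["arc_challenge"],  # 25-shot ARC, per the table above
    num_fewshot=25,
)
print(results["results"])
```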
## Redteaming Considerations
Our redteam focuses on specific areas of concern under the Biden-Harris Executive Order, including:
* Harm to oneself or others.
* Requests on how to create cyber-attacks.
* Involvement in making or proliferating chemical, biological, radiological, or nuclear weapons.
* Participation in any illegal act.
* Infringement of privacy or rights.
* Attempts to circumvent redteam controls.
Our redteam also addresses areas potentially covered by the EU AI Act, including:
* Misinformation
* Social Scoring
* Emotion Monitoring and Control in Schools and Workplaces
* Predictive Policing
* Classification of People Based on Sensitive Characteristics
We also address traditional concerns related to AI Safety:
* Toxic language
* Fairness
* Stereotyping and Discrimination
* Adversarial Prompting
* Out of Distribution Robustness
* Machine Ethics
* Avoidance of unqualified professional advice
## Dataset
The SFT model [ontocord/sft-4e-exp2](https://huggingface.co/ontocord/sft-4e-exp2) was trained on a subset of the excellent [OpenHermes 2.5 dataset](https://huggingface.co/datasets/teknium/OpenHermes-2.5).
The DPO reinforcement learning uses an Ontocord synthetic auto-purpleteaming dataset. We are actively researching auto-purpleteaming for lawfulness alignment and will release a research report as open science once we have further evaluated our process. An example dataset entry is as follows:
```
<|im_start|>user
Explain it simply for someone who doesn't understand much about the topic. Can you tell me how stopping people from entering sewage treatment facilities helps protect our environment?
<|im_end|>
<|im_start|>assistant
Sure! Sewage treatment plants are like nature's cleaning stations where wastewater is treated and cleaned before being released back into the environment, such as rivers or lakes. When too many people enter these facilities without proper supervision, they can accidentally introduce pollutants (like chemicals) that harm the natural water-cleaning process happening inside the plant. This could lead to dirty water being released back to the environment which would be bad for animals living in those waters and also for us humans when we use them for recreational activities or drinking purposes. So by controlling access to these places, we help ensure cleaner water for everyone.
```
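Since DPO learns from preference comparisons, each such prompt is presumably paired with both a preferred and a dispreferred completion. A hypothetical sketch of the triplet structure (the field names are illustrative and may not match the released dataset's actual schema):

```python
# Hypothetical shape of an ART preference triplet; the field names
# below are illustrative, not the dataset's actual schema.
triplet = {
    "prompt": "Can you tell me how stopping people from entering sewage "
              "treatment facilities helps protect our environment?",
    # Preferred response: helpful, accurate, and lawful (as shown above).
    "chosen": "Sure! Sewage treatment plants are like nature's cleaning stations ...",
    # Dispreferred response flagged by the auto-purpleteaming process.
    "rejected": "...",
}
```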
## Limitations and Biases
While Felix-8B has been designed with safety and truthfulness as its primary focus, it is essential to acknowledge that no language model is entirely free from biases or limitations. Users should be aware that the model's outputs may be influenced by the data it was trained on and the inherent biases present in the training data.
Additionally, Felix-8B's performance may vary depending on the specific task and domain. Users should exercise caution when applying the model to sensitive or high-stakes situations and always verify the model's outputs with human oversight.
Felix-8B is **experimental and a research work product**. We have found, for example, that the model can be overly verbose, which may stem from our SFT data or from the underlying TencentARC/Mistral_Pro_8B_v0.1 model. We are also actively working on improving the training dataset: we have discovered, for example, that some answers in our dataset are too short, or contain typos due to our use of a quantized generative model to create the synthetic triplets for DPO.
## Ethical Considerations
Ontocord.ai is committed to developing AI technologies that prioritize lawfulness and equal access. Felix-8B has been designed with these principles in mind, and efforts have been made to mitigate potential risks and biases during the training process.
However, users should be aware of the potential misuse of language models and ensure that Felix-8B is used in compliance with ethical guidelines and applicable laws and regulations.
## How to Use
To use Felix-8B, you can access the model through the Hugging Face Model Hub:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model weights from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("ontocord/Felix-8B")
model = AutoModelForCausalLM.from_pretrained("ontocord/Felix-8B")
```
Instruction format (ChatML):
```
<|im_start|>system
You are a helpful AI assistant.<|im_end|>
<|im_start|>user
Hello, who are you?<|im_end|>
<|im_start|>assistant
```
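For example, here is a minimal generation sketch continuing from the loading snippet above (the sampling parameters are illustrative, not tuned recommendations):

```python
# Build a ChatML-style prompt and generate a completion.
prompt = (
    "<|im_start|>system\n"
    "You are a helpful AI assistant.<|im_end|>\n"
    "<|im_start|>user\n"
    "Hello, who are you?<|im_end|>\n"
    "<|im_start|>assistant\n"
)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```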
## Acknowledgement
We want to thank [Teknium](https://huggingface.co/teknium) and [Nous Research](https://huggingface.co/NousResearch) for their excellent work on performant models. We would also like to thank our friends at [DFKI](https://www.dfki.de/en/web/research), and our friends Simone Tedeschi and Felix Friedrich, for discussions on auto-purpleteaming.
## Citation
To cite Felix-8B, please use:
```
@misc{ontocord2024Felix-8B,
  title={Felix-8B: A model built with lawfulness alignment},
  author={Quan Nguyen and Vincent Hines and Huu Nguyen and Patrick Schramowski},
  year={2024},
}
```
*We will update the contributor list as we further develop this research.*