Model Card for Apertus-8B-Instruct-OFAC-FAQ
A model fine-tuned on sanctions- and AML-related OFAC FAQ questions, starting from the Swiss AI Apertus 8B Instruct model, which was then used as a teacher and distilled into TinyLlama 1.1B. The resulting model is roughly 6-7× smaller than the original. Quantization to INT8 should allow even low-memory CPU inference deployments where latency is not a primary concern. A PEFT LoRA adapter is included for use with the base model.
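For background, this is what a generic teacher-student knowledge-distillation objective of the kind mentioned above looks like (illustrative only; the exact distillation recipe used for this model is not documented in this card):

```python
# Generic knowledge-distillation loss (illustrative; not the actual recipe).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft-target term: KL divergence between temperature-scaled distributions.
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-target term: standard cross-entropy against the reference tokens.
    ce = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)), labels.view(-1)
    )
    return alpha * kd + (1 - alpha) * ce
```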
Model Details
Model Description
The model includes INT8 quantized weights for CPU inference and a LoRA adapter for GPU inference with a matching base.
- Developed by: Soteria Initiative
- Funded by: Soteria Initiative
- Shared by: Soteria Initiative
- Model type: Text generation, LlamaForCausalLM, context length 2048
- Language(s) (NLP): English, Others
- License: Apache-2.0
- Finetuned from model: Apertus 8B Instruct
Model Sources
- Repository: https://huggingface.co/SoteriaInitiative/Apertus-8B-Instruct-OFAC-FAQ
- Demo: WIP
Uses
Use in chat or assistant applications where compliance or financial-crime analysts need answers on FATF or OFAC FAQ matters.
Direct Use
This model can be used directly with the FCCAssistant (https://github.com/SoteriaInitiative/fccassistant) once a model endpoint has been deployed.
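A minimal sketch of such a model endpoint, assuming FastAPI; the route name and request/response schema are assumptions, so check the FCCAssistant repository for the interface it actually expects:

```python
# Minimal illustrative HTTP endpoint (FastAPI assumed; schema is a placeholder).
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()
# For the fine-tuned behavior, load the base + LoRA adapter as in the
# quick start below and pass the resulting model/tokenizer to pipeline().
generator = pipeline("text-generation", model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

class Query(BaseModel):
    prompt: str
    max_new_tokens: int = 256

@app.post("/generate")
def generate(q: Query):
    out = generator(q.prompt, max_new_tokens=q.max_new_tokens, do_sample=True)
    return {"text": out[0]["generated_text"]}
```

Run it with, e.g., `uvicorn server:app` and point the assistant at the resulting URL.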
Out-of-Scope Use
This model is not intended for production deployment.
Bias, Risks, and Limitations
The model is fine-tuned on FATF and OFAC FAQ material and should therefore be restricted to use cases within that scope.
Recommendations
Perform a model quality evaluation before use, for example by spot-checking answers against held-out FAQ pairs as sketched below.
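A minimal spot-check sketch; the file name, field names, and containment metric are assumptions, not an established benchmark:

```python
# Illustrative spot-check against a hypothetical held-out set of FAQ pairs.
import json
from transformers import pipeline

# Attach the LoRA adapter as in the quick start below for the fine-tuned behavior.
qa = pipeline("text-generation", model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

def normalized(s: str) -> str:
    return " ".join(s.lower().split())

with open("ofac_faq_holdout.jsonl") as f:  # hypothetical held-out file
    pairs = [json.loads(line) for line in f]

hits = 0
for p in pairs:
    pred = qa(p["question"], max_new_tokens=128)[0]["generated_text"]
    hits += normalized(p["answer"]) in normalized(pred)

print(f"answer containment: {hits}/{len(pairs)}")
```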
How to Get Started with the Model
Use the Jupyter Notebook linked in the Demo references for a comprehensive overview.
For a quick start, try:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

BASE = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
ADAPTER = "./peft"  # or "org/repo-name" if pushed to HF

# Tokenizer (includes the chat template)
tokenizer = AutoTokenizer.from_pretrained(BASE)

# Base model (GPU, 8-bit). For CPU-only loading, see the notes below.
model = AutoModelForCausalLM.from_pretrained(
    BASE,
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
)
model = PeftModel.from_pretrained(model, ADAPTER)
model.eval()

# Build the chat prompt via the tokenizer's chat_template
messages = [
    {"role": "system", "content": "You are a helpful assistant for sanctions/AML."},
    {"role": "user", "content": "Summarize the key OFAC FAQ topics."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)

with torch.inference_mode():
    out = model.generate(
        **inputs,  # passes input_ids and attention_mask
        max_new_tokens=256,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )
print(tokenizer.decode(out[0], skip_special_tokens=True))
```
Notes:
- GPU 8-bit loading is shown above. For CPU-only inference, drop the quantization_config and device_map="auto" arguments and call model.to("cpu"); a full CPU variant is sketched below.
- If you plan to export a merged model, load the base in full precision and then call model = model.merge_and_unload() (optional; not needed for standard PEFT inference). A sketch follows the CPU variant.
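A minimal CPU-only variant of the quick start (no bitsandbytes required; paths as above):

```python
# CPU-only loading: full-precision weights, no 8-bit quantization.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
ADAPTER = "./peft"

tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.float32)
model = PeftModel.from_pretrained(model, ADAPTER)
model.to("cpu").eval()
# Generation then works exactly as in the quick start above.
```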
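And a sketch of the optional merged export (the output directory is an example):

```python
# Merge the LoRA weights into the full-precision base and save a
# standalone checkpoint; not needed for standard PEFT inference.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
ADAPTER = "./peft"

base = AutoModelForCausalLM.from_pretrained(BASE)  # full precision, no 8-bit
merged = PeftModel.from_pretrained(base, ADAPTER).merge_and_unload()
merged.save_pretrained("./merged-model")
AutoTokenizer.from_pretrained(BASE).save_pretrained("./merged-model")
```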
Training Details
Training Data
The following sources were used for fine-tuning:
- OFAC FAQ: https://ofac.treasury.gov/faqs
- FATF Recommendations: https://www.fatf-gafi.org/content/dam/fatf-gafi/recommendations/FATF%20Recommendations%202012.pdf.coredownload.inline.pdf
Training Procedure
Supervised fine-tuning was applied to the Apertus 8B Instruct model on a training dataset of OFAC FAQ question/answer pairs as well as pairs of FATF recommendation titles and recommendation texts.
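A minimal sketch of this kind of supervised fine-tuning run, assuming TRL's SFTTrainer and a JSONL file of question/answer pairs; the file name, prompt formatting, hyperparameters, and base-model id are illustrative, not the actual training recipe:

```python
# Illustrative SFT setup; names and hyperparameters are placeholders.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Hypothetical JSONL with {"question": ..., "answer": ...} records.
dataset = load_dataset("json", data_files="ofac_fatf_pairs.jsonl", split="train")

def to_text(example):
    # Flatten each pair into a single training string.
    return {"text": f"Question: {example['question']}\nAnswer: {example['answer']}"}

dataset = dataset.map(to_text)

trainer = SFTTrainer(
    model="swiss-ai/Apertus-8B-Instruct-2509",  # assumed HF id for Apertus 8B Instruct
    train_dataset=dataset,
    args=SFTConfig(output_dir="./sft-out"),
)
trainer.train()
```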
Evaluation
Model evaluation has NOT been performed yet!
Framework Versions
- PEFT 0.13.2