Model Card for Apertus-8B-Instruct-OFAC-FAQ

A model fine-tuned for sanctions- and AML-related OFAC FAQ questions, built from the Swiss AI Apertus 8B Instruct model, which was then used as a teacher and distilled into TinyLlama 1.1B. The distilled model is roughly 6-7x smaller than the original. Quantization to INT8 should allow even low-memory CPU inference deployments if model latency is not a primary concern. A PEFT LoRA adapter is included for use with the base model.

Model Details

Model Description

The model includes INT8 quantized weights for CPU inference and a LoRA adapter for GPU inference with a matching base.

  • Developed by: Soteria Initiative
  • Funded by: Soteria Initiative
  • Shared by: Soteria Initiative
  • Model type: Text generation, LlamaForCausalLM, context length 2048
  • Language(s) (NLP): English, Others
  • License: Apache-2.0
  • Finetuned from model: Apertus 8B Instruct

Model Sources

Uses

Use for chat or assistant applications where compliance or financial crime analysts need answers regarding FATF or OFAC FAQ matters.

Direct Use

This model can be used directly with the FCCAssistant (https://github.com/SoteriaInitiative/fccassistant) once a model endpoint has been deployed.
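
If you need to stand up such an endpoint yourself, the sketch below is one minimal option. It is only an illustration: the FastAPI/uvicorn serving stack, the /generate route, the payload shape, and the local ./peft adapter path are assumptions made here; the endpoint contract that FCCAssistant actually expects is defined in its own repository.

  import torch
  from fastapi import FastAPI
  from pydantic import BaseModel
  from transformers import AutoModelForCausalLM, AutoTokenizer
  from peft import PeftModel

  BASE = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
  ADAPTER = "./peft"  # local adapter path, adjust as needed

  tokenizer = AutoTokenizer.from_pretrained(BASE)
  model = PeftModel.from_pretrained(
      AutoModelForCausalLM.from_pretrained(BASE), ADAPTER
  ).eval()

  app = FastAPI()

  class Query(BaseModel):
      question: str
      max_new_tokens: int = 256

  @app.post("/generate")
  def generate(q: Query):
      # Build a chat prompt, generate, and return only the newly generated answer text.
      messages = [{"role": "user", "content": q.question}]
      inputs = tokenizer.apply_chat_template(
          messages, add_generation_prompt=True, return_tensors="pt"
      )
      with torch.inference_mode():
          out = model.generate(inputs, max_new_tokens=q.max_new_tokens,
                               pad_token_id=tokenizer.eos_token_id)
      answer = tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True)
      return {"answer": answer}

  # Run with: uvicorn endpoint:app --host 0.0.0.0 --port 8000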

Out-of-Scope Use

This model is not intended for production deployment.

Bias, Risks, and Limitations

The model is fine-tuned for FATF and OFAC FAQ matters and should therefore be restricted to use cases where those topics are the subject of interest.

Recommendations

Perform a model quality evaluation before use (see the evaluation sketch under Evaluation below).

How to Get Started with the Model

Use the Jupyter Notebook linked in the Demo references for a comprehensive overview.

For a quick start, try:

  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer
  from peft import PeftModel

  BASE = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
  ADAPTER = "./peft"  # or "org/repo-name" if pushed to HF

  # Tokenizer (includes the chat template)
  tokenizer = AutoTokenizer.from_pretrained(BASE)

  # Base model (GPU, 8-bit). For CPU, remove load_in_8bit and device_map.
  model = AutoModelForCausalLM.from_pretrained(
      BASE,
      device_map="auto",
      load_in_8bit=True,
  )
  model = PeftModel.from_pretrained(model, ADAPTER)
  model.eval()

  # Chat prompt via tokenizer's chat_template
  messages = [
      {"role": "system", "content": "You are a helpful assistant for sanctions/AML."},
      {"role": "user", "content": "Summarize the key OFAC FAQ topics."},
  ]
  inputs = tokenizer.apply_chat_template(
      messages, add_generation_prompt=True, return_tensors="pt"
  ).to(model.device)

  with torch.inference_mode():
      out = model.generate(
          inputs,
          max_new_tokens=256,
          temperature=0.7,
          top_p=0.9,
          do_sample=True,
          pad_token_id=tokenizer.eos_token_id,
      )

  print(tokenizer.decode(out[0], skip_special_tokens=True))

Notes:

  • GPU 8-bit loading is shown above. For CPU-only inference, drop load_in_8bit=True and device_map="auto", then call model.to("cpu"); a CPU-only sketch is shown below.
  • If you plan to export a merged model, load the base in full precision and then call model = model.merge_and_unload() (optional, not needed for standard PEFT inference); see the merge sketch below.
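
The notes above as hedged sketches. First, a CPU-only variant of the quick start (no GPU or bitsandbytes required); the local ./peft adapter path is the same as above:

  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer
  from peft import PeftModel

  BASE = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
  ADAPTER = "./peft"

  tokenizer = AutoTokenizer.from_pretrained(BASE)
  # Full-precision weights on CPU: slower, but no quantization libraries needed.
  model = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.float32)
  model = PeftModel.from_pretrained(model, ADAPTER)
  model.to("cpu").eval()

Second, an optional merge-and-export sketch. The ./merged-model output directory is an illustrative path, and the final dynamic INT8 quantization step is only a rough low-memory CPU option, not necessarily how the shipped INT8 weights were produced:

  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer
  from peft import PeftModel

  BASE = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
  ADAPTER = "./peft"

  base = AutoModelForCausalLM.from_pretrained(BASE)  # full precision, not 8-bit
  merged = PeftModel.from_pretrained(base, ADAPTER).merge_and_unload()

  # Save a standalone model that no longer needs PEFT at inference time.
  merged.save_pretrained("./merged-model")
  AutoTokenizer.from_pretrained(BASE).save_pretrained("./merged-model")

  # Optional: dynamic INT8 quantization of the Linear layers for low-memory CPU inference.
  q_model = torch.ao.quantization.quantize_dynamic(
      merged.to("cpu"), {torch.nn.Linear}, dtype=torch.qint8
  )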

Training Details

Training Data

The following sources were used for fine-tuning:

Training Procedure

Supervised fine-tuning was applied to the Apertus 8B Instruct model with a training dataset of FAQ question/answer pairs as well as FATF title and recommendation pairs.
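
The exact scripts and hyperparameters are not published here; the sketch below is only an illustrative LoRA-based SFT setup. The base checkpoint id, the ofac_faq_pairs.jsonl file name, the LoRA target modules, and all hyperparameters are placeholder assumptions, not the values actually used.

  import json
  import torch
  from datasets import Dataset
  from transformers import (AutoModelForCausalLM, AutoTokenizer,
                            DataCollatorForLanguageModeling, Trainer, TrainingArguments)
  from peft import LoraConfig, get_peft_model

  BASE = "swiss-ai/Apertus-8B-Instruct"  # placeholder: use the actual Apertus 8B Instruct checkpoint id

  tokenizer = AutoTokenizer.from_pretrained(BASE)
  if tokenizer.pad_token is None:
      tokenizer.pad_token = tokenizer.eos_token

  model = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16, device_map="auto")

  # Train only a small LoRA adapter instead of the full 8B weights.
  model = get_peft_model(model, LoraConfig(
      r=16, lora_alpha=32, lora_dropout=0.05,
      target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM",
  ))

  # Render question/answer pairs through the chat template (file name is illustrative).
  rows = [json.loads(line) for line in open("ofac_faq_pairs.jsonl")]
  texts = [tokenizer.apply_chat_template(
               [{"role": "user", "content": r["question"]},
                {"role": "assistant", "content": r["answer"]}],
               tokenize=False)
           for r in rows]

  dataset = Dataset.from_dict({"text": texts}).map(
      lambda batch: tokenizer(batch["text"], truncation=True, max_length=2048),
      batched=True, remove_columns=["text"],
  )

  trainer = Trainer(
      model=model,
      args=TrainingArguments(output_dir="sft-out", num_train_epochs=3,
                             per_device_train_batch_size=1, gradient_accumulation_steps=8,
                             learning_rate=2e-4, logging_steps=10),
      train_dataset=dataset,
      data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
  )
  trainer.train()
  model.save_pretrained("sft-out/peft")  # the LoRA adapter, analogous to the ./peft shipped here

The same loop can in principle be reused for the distillation step described in the summary: generate answers with the fine-tuned Apertus teacher, then fine-tune TinyLlama/TinyLlama-1.1B-Chat-v1.0 on those teacher question/answer pairs (sequence-level distillation).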

Evaluation

Model evaluation has NOT been performed yet!
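
Until a formal evaluation is published, a minimal pre-deployment quality check could look like the sketch below. It assumes a hypothetical held-out file eval_faq.jsonl with question/answer fields and reuses the model and tokenizer objects from the quick start above; ROUGE is only a rough proxy for answer quality.

  import json
  import evaluate  # pip install evaluate rouge_score

  rouge = evaluate.load("rouge")

  def answer(question: str) -> str:
      # Reuses the `model` and `tokenizer` from the quick-start snippet above.
      messages = [{"role": "user", "content": question}]
      inputs = tokenizer.apply_chat_template(
          messages, add_generation_prompt=True, return_tensors="pt"
      ).to(model.device)
      out = model.generate(inputs, max_new_tokens=256, do_sample=False,
                           pad_token_id=tokenizer.eos_token_id)
      return tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True)

  predictions, references = [], []
  with open("eval_faq.jsonl") as f:  # hypothetical held-out split
      for line in f:
          row = json.loads(line)
          predictions.append(answer(row["question"]))
          references.append(row["answer"])

  print(rouge.compute(predictions=predictions, references=references))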

Framework versions

  • PEFT 0.13.2