Model Card for peleke-llama-3.1-8b-instruct

This model is a fine-tuned version of meta-llama/Llama-3.1-8B-Instruct for antibody sequence generation. Given an antigen sequence with annotated epitope residues, it generates novel Fv regions for an antibody heavy and light chain.

Quick start

1. Load the model

```python
import torch
from peft import PeftConfig, PeftModel
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = 'silicobio/peleke-llama-3.1-8b-instruct'
config = PeftConfig.from_pretrained(model_name)

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

# Load the base model in bfloat16, resize the embeddings to match the
# fine-tuned tokenizer, then attach the PEFT adapter.
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path, torch_dtype=torch.bfloat16, trust_remote_code=True).cuda()
model.resize_token_embeddings(len(tokenizer))
model = PeftModel.from_pretrained(model, model_name).cuda()
```
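
As an optional sanity check (this assumes the `<epi>`/`</epi>` markers were added to the tokenizer as dedicated tokens during fine-tuning), you can confirm that the markers are not split into sub-word pieces:

```python
# Optional sanity check (assumption: the epitope markers exist as single
# tokens in the fine-tuned tokenizer). Each marker should appear intact.
print(tokenizer.tokenize('<epi>S</epi>'))
```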
2. Format your input

This model uses `<epi>` and `</epi>` tags to annotate epitope residues of interest in the antigen sequence.

When annotating sequences by hand, it may be easier to mark epitope residues with square brackets, for example `...CSFS[S][F][V]L[N]WY...`. Then use the following function to convert them into the format the model expects:

```python
import re

def format_prompt(antigen_sequence):
    # Convert bracket-annotated residues, e.g. [S], into <epi>S</epi> tags.
    epitope_seq = re.sub(r'\[([A-Z])\]', r'<epi>\1</epi>', antigen_sequence)
    formatted_str = f"Antigen: {epitope_seq}<|im_end|>\nAntibody:"
    return formatted_str
```
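
For example, using the bracket-annotated fragment from above (the `...` stand in for the rest of the antigen sequence, which is omitted here):

```python
# Illustrative only: shows the prompt produced for a short annotated fragment.
print(format_prompt('CSFS[S][F][V]L[N]WY'))
# Antigen: CSFS<epi>S</epi><epi>F</epi><epi>V</epi>L<epi>N</epi>WY<|im_end|>
# Antibody:
```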
3. Generate an antibody sequence

```python
# `antigen` is your bracket-annotated antigen sequence from step 2.
prompt = format_prompt(antigen)
inputs = tokenizer(prompt, return_tensors="pt")
inputs = {k: v.cuda() for k, v in inputs.items()}

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=1000,
        do_sample=True,
        temperature=0.7,
        pad_token_id=tokenizer.eos_token_id,
        use_cache=False,
    )

# The prompt ends with "<|im_end|>\nAntibody:", so the generated antibody
# sequence follows the first <|im_end|> marker in the decoded text.
full_text = tokenizer.decode(outputs[0], skip_special_tokens=False)
antibody_sequence = full_text.split('<|im_end|>')[1].replace('Antibody:', '').strip()
print(f"Antigen: {antigen}\nAntibody: {antibody_sequence}\n")
```

This generates a |-delimited output containing the Fv region of a heavy chain followed by the Fv region of a light chain (a small parsing sketch follows the example below).

```
Antigen: NPPTFSPALL...
Antibody: QVQLVQSGGG...|DIQMTQSPSS...
```
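
If you need the two chains separately, here is a minimal parsing sketch (assuming the generated output contains exactly one `|`, separating the heavy-chain Fv from the light-chain Fv):

```python
# Minimal sketch: split the |-delimited output into its two chains.
# Assumes exactly one '|' separator; add validation for production use.
heavy_fv, light_fv = antibody_sequence.split('|')
print(f"Heavy chain Fv: {heavy_fv}")
print(f"Light chain Fv: {light_fv}")
```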

Training procedure

This model was trained with supervised fine-tuning (SFT).
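
The exact training recipe is not published here. As an illustration only, a supervised fine-tuning run that produces a PEFT adapter with the frameworks listed below (TRL, PEFT, Transformers) typically looks something like the following sketch; the dataset contents, prompt format, and LoRA hyperparameters shown are assumptions, not the actual configuration:

```python
# Illustration only: a generic TRL + PEFT SFT setup, NOT the actual recipe
# used for this model. The dataset rows and hyperparameters are placeholders.
from datasets import Dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Placeholder data: full prompt/completion strings in the same
# "Antigen: ...<|im_end|>\nAntibody: ..." format used at inference time.
train_dataset = Dataset.from_list([
    {"text": "Antigen: NPPTFSPALL...<|im_end|>\nAntibody: QVQLVQSGGG...|DIQMTQSPSS..."},
])

peft_config = LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32)  # assumed values

trainer = SFTTrainer(
    model="meta-llama/Llama-3.1-8B-Instruct",
    train_dataset=train_dataset,
    args=SFTConfig(output_dir="peleke-sft-adapter"),
    peft_config=peft_config,
)
trainer.train()
```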

Framework versions

  • PEFT: 0.17.0
  • TRL: 0.19.1
  • Transformers: 4.54.0
  • PyTorch: 2.7.1
  • Datasets: 4.0.0
  • Tokenizers: 0.21.2