---
base_model: microsoft/phi-4
library_name: peft
model_name: peleke-phi-4
tags:
- base_model:adapter:microsoft/phi-4
- lora
- sft
- transformers
- trl
- chemistry
- biology
- antibody
- antigen
- protein
- amino-acid
- drug-design
pipeline_tag: text-generation
license: gpl-3.0
datasets:
- silicobio/peleke_antibody-antigen_sabdab
---

# Model Card for peleke-phi-4

This model is a fine-tuned version of [microsoft/phi-4](https://huggingface.co/microsoft/phi-4) for antibody sequence generation.
It takes an antigen sequence as input and returns novel Fv portions of heavy- and light-chain antibody sequences.

## Quick start

1. Load the Model

```python
import torch
from peft import PeftConfig, PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = 'silicobio/peleke-phi-4'
config = PeftConfig.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
# Load the base model first, then attach the LoRA adapter on top of it.
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path, torch_dtype=torch.bfloat16, trust_remote_code=True).cuda()
model.resize_token_embeddings(len(tokenizer))
model = PeftModel.from_pretrained(model, model_name).cuda()
```

2. Format your Input

This model uses `` and `` to annotate epitope residues of interest.

It may be easier to mark those residues with other characters, such as square brackets (`[ ]`). For example: `...CSFS[S][F][V]L[N]WY...`.
Then, use the following function to format the input.

```python
import re

def format_prompt(antigen_sequence):
    # Rewrite each bracketed residue, such as [S], using the replacement pattern.
    epitope_seq = re.sub(r'\[([A-Z])\]', r'\1', antigen_sequence)
    formatted_str = f"Antigen: {epitope_seq}<|im_end|>\nAntibody:"
    return formatted_str
```

3. Generate an Antibody Sequence

```python
# `antigen` is your antigen sequence string, with epitope residues marked as in step 2.
prompt = format_prompt(antigen)
inputs = tokenizer(prompt, return_tensors="pt")
inputs = {k: v.cuda() for k, v in inputs.items()}

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=1000,
        do_sample=True,
        temperature=0.7,
        pad_token_id=tokenizer.eos_token_id,
        use_cache=False,
    )

# Keep special tokens so the output can be split on <|im_end|>.
full_text = tokenizer.decode(outputs[0], skip_special_tokens=False)
antibody_sequence = full_text.split('<|im_end|>')[1].replace('Antibody: ', '').strip()
print(f"Antigen: {antigen}\nAntibody: {antibody_sequence}\n")
```

This produces a `|`-delimited output: the Fv portions of a heavy chain and a light chain.

```sh
Antigen: NPPTFSPALL...
Antibody: QVQLVQSGGG...|DIQMTQSPSS...
```

## Training procedure

This model was trained with SFT.

### Framework versions

- PEFT: 0.17.0
- TRL: 0.19.1
- Transformers: 4.54.0
- PyTorch: 2.7.1
- Datasets: 4.0.0
- Tokenizers: 0.21.2
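
## Splitting the output

The Quick start produces a single `|`-delimited string, so downstream tools will usually need the two chains separately. The following is a minimal sketch of one way to do that. `split_chains` and `VALID_RESIDUES` are hypothetical names, not part of this model or its libraries. The sketch assumes the heavy chain comes first, as in the example output above, and that only the 20 standard amino-acid letters are acceptable. The input strings are the truncated fragments shown in the Quick start, not full sequences.

```python
# Hypothetical helper (not part of the original model card): split the
# "|"-delimited generation into heavy- and light-chain Fv sequences.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids


def split_chains(antibody_sequence):
    """Return (heavy, light) from a 'HEAVY|LIGHT' string, or raise ValueError."""
    # Remove the end-of-turn token first: it contains "|" and would
    # otherwise be mistaken for the chain delimiter.
    cleaned = antibody_sequence.replace("<|im_end|>", "").strip()
    parts = [part.strip() for part in cleaned.split("|")]
    if len(parts) != 2 or not all(parts):
        raise ValueError("Expected two non-empty chains separated by '|'")
    for chain in parts:
        unexpected = set(chain) - VALID_RESIDUES
        if unexpected:
            raise ValueError(f"Non-standard residues found: {sorted(unexpected)}")
    heavy, light = parts
    return heavy, light


# Truncated fragments taken from the example output, for illustration only.
heavy, light = split_chains("QVQLVQSGGG|DIQMTQSPSS")
print(heavy)  # QVQLVQSGGG
print(light)  # DIQMTQSPSS
```

Raising `ValueError` on malformed output makes it straightforward to discard a failed sample and generate again, which is useful because sampling with `do_sample=True` does not guarantee a well-formed pair of chains.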