# Heavy2Light
Heavy2Light is a seq2seq model designed to generate light chain antibody sequences from corresponding heavy chain inputs. It uses HeavyBERTa as the encoder and LightGPT as the decoder, and is fine-tuned on paired antibody chain data from the OAS and PLAbDab databases using adapters for parameter-efficient fine-tuning. You can either download the full model weights and adapter from this repository, or use the Heavy2Light adapter directly from its dedicated directory on Hugging Face.
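To illustrate the architecture, an encoder-decoder model of this kind can be composed from two pretrained checkpoints with `transformers`. The sketch below is an illustration only, not the released training code, and the `leaBroe/LightGPT` repository id is an assumption based on the description above; for inference, load the released checkpoint as shown in the usage example further down.

```python
from transformers import EncoderDecoderModel

# Compose an encoder-decoder model from the two base checkpoints:
# HeavyBERTa encodes the heavy chain, LightGPT decodes the light chain.
sketch = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "leaBroe/HeavyBERTa",  # heavy chain encoder
    "leaBroe/LightGPT",    # light chain decoder (repo id assumed)
)
```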
For more information, please visit our GitHub repository.
## How to use the model
```python
from transformers import EncoderDecoderModel, AutoTokenizer, GenerationConfig
from adapters import init

model_path = "leaBroe/Heavy2Light"
subfolder_path = "heavy2light_final_checkpoint"

# Load the full encoder-decoder model and its tokenizer
model = EncoderDecoderModel.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, subfolder=subfolder_path)

# Initialize adapter support and activate the Heavy2Light adapter
init(model)
adapter_name = model.load_adapter("leaBroe/Heavy2Light_adapter", set_active=True)
model.set_active_adapters(adapter_name)

generation_config = GenerationConfig.from_pretrained(model_path)

# Example input heavy chain sequence
heavy_seq = "QLQVQESGPGLVKPSETLSLTCTVSGASSSIKKYYWGWIRQSPGKGLEWIGSIYSSGSTQYNPALGSRVTLSVDTSQTQFSLRLTSVTAADTATYFCARQGADCTDGSCYLNDAFDVWGRGTVVTVSS"

inputs = tokenizer(
    heavy_seq,
    padding="max_length",
    truncation=True,
    max_length=250,
    return_tensors="pt",
)

# Generate a light chain sequence with sampling
generated_seq = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    num_return_sequences=1,
    output_scores=True,
    return_dict_in_generate=True,
    generation_config=generation_config,
    bad_words_ids=[[4]],
    do_sample=True,
    temperature=1.0,
)

# Decode the generated token ids back into an amino acid sequence
generated_text = tokenizer.decode(
    generated_seq.sequences[0],
    skip_special_tokens=True,
)
print("Generated light sequence:", generated_text)
```