MTL and in silico perturbation
Hi,
I came across warnings when running in silico perturbation with the MTL classifier:
Some weights of the model checkpoint at /model_saved/GeneformerMultiTask/ were not used when initializing BertForMaskedLM: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight', 'classification_heads.0.bias', 'classification_heads.0.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForMaskedLM were not initialized from the model checkpoint at /home/jiaming/data2/Geneformer-v2/03_Results/MTL_Classfier_DDP/Heart-test/test1/model_saved/GeneformerMultiTask/ and are newly initialized: ['cls.predictions.bias', 'cls.predictions.decoder.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
It seems that the model is not loaded correctly and drops the weights obtained from fine-tuning?
The code is as follows:
from geneformer import EmbExtractor, InSilicoPerturber

# Step 1: extract the goal-state embeddings (state_embs_dict)
emb = EmbExtractor(
    model_type="CellClassifier",
    num_classes=2,
    emb_mode="cls",
    max_ncells=None,
    emb_layer=0,
    forward_batch_size=64,
    nproc=16,
    summary_stat="exact_mean",
    model_version="V2",
    token_dictionary_file=token_dictionary_file,
)
state_embs_dict = emb.get_state_embs(
    cell_states_to_model=cell_states_to_model,
    model_directory=model_directory,
    input_data_file=input_data_file,
    output_directory=output_dir_emb,
    output_prefix="state_embs_dic",
    output_torch_embs=False,
)

# Step 2: run the in silico perturbation with the fine-tuned MTL classifier
isp = InSilicoPerturber(
    perturb_type=perturb_type,
    perturb_rank_shift=None,
    genes_to_perturb="all",
    combos=0,
    anchor_gene=None,
    model_type="MTLCellClassifier",
    num_classes=2,
    emb_mode="cls",
    cell_states_to_model=cell_states_to_model,
    state_embs_dict=state_embs_dict,
    cell_inds_to_perturb={"start": start, "end": end},
    max_ncells=None,
    emb_layer=0,
    forward_batch_size=64,
    nproc=16,
    model_version="V2",
    token_dictionary_file=token_dictionary_file,
    clear_mem_ncells=64,
)
isp.perturb_data(
    model_directory=model_directory,
    input_data_file=input_data_file,
    output_directory=output_dir_insilico,
    output_prefix=output_prefix,
)
The model I loaded was fine-tuned with a one-task V2 MTL classifier (classifying 2 cell states).
Thanks for your question. Since you are loading an MTL model as MaskedLM, this is expected, so you can ignore this warning.
Regarding your code, you should use Pretrained as the model type when generating the state embeddings dictionary (state_embs_dict), to match what you use below for the in silico perturbation.
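For example, only the embedding step changes. A minimal sketch, reusing the variables from your snippet (num_classes is set to 0 here, since no classification head is used with the pretrained model type); the InSilicoPerturber call stays as you have it:

# Sketch: extract the state embeddings with the pretrained model type
emb = EmbExtractor(
    model_type="Pretrained",  # instead of "CellClassifier"
    num_classes=0,  # no classification head is used with Pretrained
    emb_mode="cls",
    max_ncells=None,
    emb_layer=0,
    forward_batch_size=64,
    nproc=16,
    summary_stat="exact_mean",
    model_version="V2",
    token_dictionary_file=token_dictionary_file,
)
state_embs_dict = emb.get_state_embs(
    cell_states_to_model=cell_states_to_model,
    model_directory=model_directory,
    input_data_file=input_data_file,
    output_directory=output_dir_emb,
    output_prefix="state_embs_dic",
    output_torch_embs=False,
)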
Thank you for your response. When running the in silico pipeline, I loaded a fine-tuned MTL classifier, so I’m unclear why we should specify “Pretrained” as the model type when generating the state embeddings dictionary.
I’ve also tested the V2 model in DDP-MTL mode and noticed a slowdown. Although DDP usually speeds things up, the V2 model is much larger (~400 MB) than the previous 95M-parameter, 12-layer model (145 MB). Could I use the original 95M pretrained model within the V2 MTL framework, and if so, would you recommend doing that?
Wish you all well
Besides, you mentioned, “Since you are loading an MTL model as MaskedLM, this is expected, so you can ignore this warning.” However, I’m actually trying to use my fine-tuned MTL model for perturbations, so it looks like the perturbation step isn’t using my updated weights. Has the pipeline dropped my fine-tuned parameters when generating embeddings?
The MTL heads are not used for the embeddings, so you can use Pretrained, as indicated in the documentation and examples. The fine-tuned encoder weights are still used; only the classification heads are unneeded. The larger 316M-parameter model is more computationally intensive. You may consider using the 104M-parameter model, also provided in this repository.
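If you want to confirm that the fine-tuned encoder weights are kept, a minimal sanity check (just a sketch, reusing the model_directory variable from your snippet) is to load the checkpoint both as MaskedLM, as the perturber does, and as a bare encoder, and compare the weights; only the pooler and classification heads should differ:

import torch
from transformers import BertForMaskedLM, BertModel

# Load the fine-tuned MTL checkpoint the way the perturber does (as MaskedLM)
# and as a bare encoder, then verify the fine-tuned encoder weights match.
mlm = BertForMaskedLM.from_pretrained(model_directory)
encoder = BertModel.from_pretrained(model_directory, add_pooling_layer=False)

mlm_state = mlm.state_dict()
for name, param in encoder.state_dict().items():
    assert torch.equal(param, mlm_state["bert." + name]), f"mismatch in {name}"
print("Fine-tuned encoder weights are retained; only the task heads are dropped.")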