GSFM

Trained on millions of gene sets automatically extracted from the literature and from raw RNA-seq data, GSFM learns to recover genes that have been held out of a gene set. The resulting model achieves state-of-the-art performance on gene function prediction.
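The held-out-gene objective can be sketched in plain Python. This is a hypothetical illustration, not GSFM's code. The source does not say how genes are held out or how recovery is scored, so the helper names `make_holdout_example` and `recall_at_k`, the 50% hold-out fraction, the multi-hot target, and the assumption that token ids run from 0 to vocabulary size minus 1 are all illustrative choices.

```python
import random


def make_holdout_example(gene_set, vocab_index, holdout_frac=0.5, seed=None):
    """Hypothetical helper: split one gene set into visible and held-out genes.

    gene_set: iterable of gene symbols.
    vocab_index: dict mapping gene symbol -> token id (assumed ids 0..len-1).
    Returns (input_ids, held_out_ids, target); target is a multi-hot list
    over the vocabulary marking the held-out genes.
    """
    rng = random.Random(seed)
    # keep only genes the vocabulary knows, deduplicated, in a stable order
    genes = sorted({g for g in gene_set if g in vocab_index})
    n_holdout = max(1, int(len(genes) * holdout_frac))
    if n_holdout >= len(genes):
        raise ValueError("need at least one visible and one held-out gene")
    held_out = set(rng.sample(genes, n_holdout))
    input_ids = [vocab_index[g] for g in genes if g not in held_out]
    held_out_ids = sorted(vocab_index[g] for g in held_out)
    # the model would be trained to score these positions highly
    target = [0.0] * len(vocab_index)
    for i in held_out_ids:
        target[i] = 1.0
    return input_ids, held_out_ids, target


def recall_at_k(scores, held_out_ids, input_ids, k=10):
    """Fraction of held-out genes ranked in the top k, skipping the input genes."""
    visible = set(input_ids)
    candidates = [i for i in range(len(scores)) if i not in visible]
    ranked = sorted(candidates, key=lambda i: scores[i], reverse=True)
    top_k = set(ranked[:k])
    return sum(i in top_k for i in held_out_ids) / len(held_out_ids)
```

Skipping the visible genes in `recall_at_k` reflects the idea that the task is to recover genes the model was not shown; whether GSFM's own evaluation works this way is not stated here.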

Website

https://gsfm.maayanlab.cloud/

Usage

```shell
# install the gsfm python library from its source on Hugging Face
# (GIT_LFS_SKIP_SMUDGE=1 skips downloading Git LFS files during install)
GIT_LFS_SKIP_SMUDGE=1 pip install git+https://huggingface.co/maayanlab/gsfm
```

```python
import torch
from gsfm import Vocab, GSFM

# load the gsfm vocabulary and model weights
vocab = Vocab.from_pretrained('maayanlab/gsfm')
gsfm = GSFM.from_pretrained('maayanlab/gsfm')

# convert gene symbols into token ids (a batch containing one gene set)
token_ids = torch.tensor(vocab(['ACE1', 'ACE2']))[None, :]

# inference only, so gradient tracking is not needed
with torch.no_grad():
    # use the model to predict missing genes from the set
    logits = torch.squeeze(gsfm(token_ids))
    # the 10 highest-scoring genes, in ascending order of score
    top_10 = sorted(zip(logits.tolist(), vocab.vocab))[-10:]
    print(top_10)

    # get the gene embeddings
    gene_embeddings = gsfm.embedding(token_ids)
    print(gene_embeddings)

    # get the model's middle-layer encoding of the gene set
    gene_set_encoding = gsfm.encode(token_ids)
    print(gene_set_encoding)
```
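The ranking in the usage example scores every gene in the vocabulary, so the input genes themselves may appear among the top hits. A minimal sketch of a filter that drops them, assuming (as the `zip` in the example implies) that `vocab.vocab` lists gene symbols in the same order as the logits; `top_k_new_genes` is a hypothetical helper, not part of the gsfm library.

```python
def top_k_new_genes(scores, genes, query, k=10):
    """Hypothetical helper: the k highest-scoring genes not already in the query.

    scores: list of floats aligned with genes (e.g. logits.tolist()).
    genes: list of gene symbols in vocabulary order (assumed: vocab.vocab).
    query: gene symbols that were fed to the model.
    Returns (gene, score) pairs sorted from highest to lowest score.
    """
    if len(scores) != len(genes):
        raise ValueError("scores and genes must have the same length")
    seen = set(query)
    candidates = [(g, s) for g, s in zip(genes, scores) if g not in seen]
    # sort by descending score; ties broken by gene symbol for determinism
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return candidates[:k]
```

With the objects from the usage example, this would be called as `top_k_new_genes(logits.tolist(), list(vocab.vocab), ['ACE1', 'ACE2'])`.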
Model size: 15.1M params (Safetensors, tensor type F32)