Some questions about gene type prediction task
Hi, congratulations on your great work! I took a look at the supplementary table for the gene type prediction task, and I found the second dataset a little ambiguous. I cannot find that dataset (15K embryonic stem cells (ESCs) [29]) in PanglaoDB. Could you please provide more information? Thanks a lot.
Thank you for your interest in Geneformer. The dataset used for fine-tuning the model to distinguish bivalent promoters was from PanglaoDB, SRA553822-SRS2119548. In the example_input_files directory, we added the labels for the genes in the 56 highly conserved regions reported in Bernstein et al. 2006.
15K refers to the number of cells. [Update: we have stored the embryonic stem cell .dataset in the dataset repository: https://huggingface.co/datasets/ctheodoris/Genecorpus-30M/tree/main/example_input_files]
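For anyone who wants to fetch those files programmatically, here is a minimal sketch using `snapshot_download` from `huggingface_hub`. The repository ID and the `example_input_files` directory come from the link above. The helper names, the `genecorpus_examples` target folder, and the idea that the stored dataset appears as a directory whose name ends in `.dataset` are assumptions for illustration, not something confirmed in this thread.

```python
from pathlib import Path

REPO_ID = "ctheodoris/Genecorpus-30M"  # dataset repository linked in the reply
SUBDIR = "example_input_files"         # directory named in the reply


def download_example_inputs(local_dir: str = "genecorpus_examples") -> Path:
    """Download only the example_input_files directory of the dataset repo.

    `local_dir` is a hypothetical target folder chosen for this sketch.
    """
    # Imported here so the helpers below can be used without the package installed.
    from huggingface_hub import snapshot_download

    root = snapshot_download(
        repo_id=REPO_ID,
        repo_type="dataset",             # a dataset repo, not a model repo
        allow_patterns=[f"{SUBDIR}/*"],  # skip everything outside the example directory
        local_dir=local_dir,
    )
    return Path(root) / SUBDIR


def find_saved_datasets(folder: Path) -> list[Path]:
    """Return directories under `folder` whose name ends in '.dataset'.

    Assumption: the stored dataset is a directory with that suffix.
    """
    return sorted(p for p in folder.rglob("*.dataset") if p.is_dir())


# Example usage (requires network access and the huggingface_hub package):
#   folder = download_example_inputs()
#   for path in find_saved_datasets(folder):
#       print(path)
```

If a listed directory was written with `datasets.Dataset.save_to_disk`, it should be loadable with `datasets.load_from_disk(path)`, but check the repository contents first, since the exact file names are not given here.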
