OODyssey

Team Members: Anatoly Buchin, Antoine Argante, Meriem Bensouda, Sreenath Srikrishnan

For the Tahoe-100M hackathon, we built a set of increasingly difficult benchmarks and then tested how well existing and new models perform on them.

We decided to focus on this area because, while there is no shortage of model publications, it is currently unclear how well the published models generalize to unseen data and clinically relevant scenarios.

A lot of effort went into splitting the data in a thoughtful way: we created several categories of held-out data, each further out-of-distribution (OOD) than the last. From easiest to hardest:

  1. Plate 14, which replicates data from other plates. It should be identical to some of the data the model has been trained on, except for technical variation.
  2. Drugs (cell lines) where the model has seen other drugs (cell lines) with the same mechanism of action (from the same organ).
  3. Drugs (cell lines) where the mechanism of action (organ) is completely novel to the model.
  4. External datasets (like Sciplex3, TCGA) that share some drugs with Tahoe but contain a lot of technical variation relative to it.
  5. Drug combination datasets (extremely hard): We found one dataset (GSE206741) that combines two of the drugs found in Tahoe. To do well on this test set, the model must understand not only the effect of each drug on its own but also their interactions.
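As a rough illustration of tiers 2 and 3 above, the sketch below assigns drugs to a training set, a "seen mechanism" test set, or a "novel mechanism" test set. This is not the code we used for the hackathon: the function name `assign_ood_tiers`, the tier labels, and the toy drug and mechanism names are all hypothetical, and the same logic would apply to cell lines grouped by organ.

```python
from collections import defaultdict


def assign_ood_tiers(drug_to_moa, held_out_moas, n_held_out_per_moa=1):
    """Map each drug to 'train', 'seen_moa' (tier 2) or 'novel_moa' (tier 3).

    Hypothetical helper for illustration only.
    """
    # Group drugs by mechanism of action, in a deterministic order.
    by_moa = defaultdict(list)
    for drug, moa in sorted(drug_to_moa.items()):
        by_moa[moa].append(drug)

    tiers = {}
    for moa, drugs in by_moa.items():
        if moa in held_out_moas:
            # Tier 3: the whole mechanism is hidden from training.
            for drug in drugs:
                tiers[drug] = "novel_moa"
            continue
        # Tier 2: hold out a few drugs, but always keep at least one
        # drug of this mechanism in the training set.
        n_hold = min(n_held_out_per_moa, len(drugs) - 1)
        for i, drug in enumerate(drugs):
            tiers[drug] = "seen_moa" if i < n_hold else "train"
    return tiers


# Toy example with made-up drug and mechanism names.
toy_drugs = {
    "drug_a": "moa_1", "drug_b": "moa_1", "drug_c": "moa_1",
    "drug_d": "moa_2", "drug_e": "moa_2",
    "drug_f": "moa_3",
}
tiers = assign_ood_tiers(toy_drugs, held_out_moas={"moa_3"})
```

Splitting at the level of drugs (or cell lines) rather than individual cells is the important design choice here: a random per-cell split would leak every drug and mechanism into training.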

We then benchmarked different models against these test sets:

For a simple baseline, we used PCA to embed the data and then fit a logistic regression to predict the organ/drug label. The other models we trained and compared were Transcriptformer, contrastiveVI and scVI.
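A minimal sketch of such a PCA plus logistic regression baseline, using scikit-learn on synthetic data, might look as follows. The helper names `fit_pca_logreg` and `make_toy_data`, the number of components, and the toy organ and cell line labels are assumptions made for illustration; our actual preprocessing and hyperparameters are not shown here.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def fit_pca_logreg(X_train, y_train, n_components=50, seed=0):
    """Embed with PCA, then classify the embedding with logistic regression."""
    # PCA cannot use more components than samples or features.
    n_components = min(n_components, X_train.shape[0], X_train.shape[1])
    model = make_pipeline(
        PCA(n_components=n_components, random_state=seed),
        LogisticRegression(max_iter=1000),
    )
    model.fit(X_train, y_train)
    return model


def make_toy_data(seed=0, n_genes=20, cells_per_line=40):
    """Synthetic stand-in for expression data: 2 organs x 3 cell lines."""
    rng = np.random.default_rng(seed)
    X, organs, lines = [], [], []
    for organ_idx, organ in enumerate(["organ_a", "organ_b"]):
        for line_idx in range(3):
            # Each cell line gets a small shift on top of the organ signal.
            line_shift = rng.normal(0.0, 0.3, size=n_genes)
            mean = 3.0 * organ_idx + line_shift
            X.append(rng.normal(mean, 1.0, size=(cells_per_line, n_genes)))
            organs += [organ] * cells_per_line
            lines += [f"{organ}_line{line_idx}"] * cells_per_line
    return np.vstack(X), np.array(organs), np.array(lines)


X, organs, lines = make_toy_data()
# Hold out one cell line per organ; the other lines of each organ stay in training.
held_out = np.isin(lines, ["organ_a_line0", "organ_b_line0"])
model = fit_pca_logreg(X[~held_out], organs[~held_out], n_components=5)
accuracy = model.score(X[held_out], organs[held_out])
print(f"Held-out cell line organ accuracy: {accuracy:.2f}")
```

The toy classes are deliberately easy to separate, so the accuracy printed here says nothing about performance on the real benchmarks.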

As an example, here are the results of predicting the organ of a held-out cell line when the other cell lines of that organ were included in the training data:

[Figure: organ prediction results for a held-out cell line]

We found that Transcriptformer (a zero-shot model) does well on held-out cell lines, including those from fully held-out organs (perhaps it has seen similar training data or has generalized well). In contrast, the other models did not show significant improvements over the simple baseline.
