🎯 Custom Evaluation: Within-Family Generalization

Beyond the standard validation metrics, this model was put through a rigorous custom evaluation that tests whether it generalizes to unseen sequences from known families. This test checks that the model learned the underlying biological patterns of each protein family rather than simply memorizing the training examples.

Evaluation Set Construction

A custom test set was carefully constructed with the following properties:

  • Source: Sequences were drawn from the 1,000 most common families (the same families the model was trained on).
  • No Overlap: A verification step confirmed that none of the sequences in this test set were present in the original training data.
  • Balanced & Representative: The final test set contains 100 unique sequences from 75 different families, providing a balanced and challenging benchmark.
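The construction steps above can be sketched in a few lines of Python. This is a minimal illustration, not the author's actual pipeline: it assumes each sequence is a plain amino-acid string paired with a family label, that "most common" means most frequent in the training data, and that "no overlap" means no exact string match between test and training sequences. The function name `build_within_family_test_set`, its parameters, and the toy data are hypothetical.

```python
import random
from collections import Counter


def build_within_family_test_set(train, candidates, top_k=1000, n_test=100, seed=0):
    """Hypothetical sketch: pick held-out sequences that belong to the training families.

    `train` and `candidates` are lists of (sequence, family) pairs.
    """
    # Rank families by frequency in the training data and keep the top_k.
    family_counts = Counter(family for _, family in train)
    top_families = {family for family, _ in family_counts.most_common(top_k)}

    # Exact-match overlap check (an assumption): drop any candidate seen in training.
    train_sequences = {seq for seq, _ in train}
    pool = {}
    for seq, family in candidates:
        if family in top_families and seq not in train_sequences:
            pool[seq] = family  # dict keys also deduplicate repeated candidates

    # Sample without replacement; sorting first makes the draw reproducible per seed.
    rng = random.Random(seed)
    chosen = rng.sample(sorted(pool), min(n_test, len(pool)))
    test = [(seq, pool[seq]) for seq in chosen]

    # Verification step: the test set must share no sequences with the training data.
    assert not train_sequences & {seq for seq, _ in test}
    return test


if __name__ == "__main__":
    # Toy data with made-up family labels, for illustration only.
    train = [("MKTAYIAK", "fam_A"), ("MKVLAAGI", "fam_B"), ("MKTAYIAR", "fam_A")]
    candidates = [("MKTAYIAK", "fam_A"), ("MKTAYIAQ", "fam_A"), ("MSSHHHHH", "fam_C")]
    print(build_within_family_test_set(train, candidates, top_k=2, n_test=10))
    # prints: [('MKTAYIAQ', 'fam_A')]
```

In the toy run, the first candidate is rejected because it already appears in training, and the third is rejected because its family is not among the top families. Real pipelines often also filter by sequence similarity rather than exact matches only; the card does not say which criterion was used.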

The full dataset used for this evaluation is available on the Hugging Face Hub as QuantaFold-within-family-test.

Performance

The model generalized strongly to these unseen sequences, misclassifying only one of the scored examples:

Metric                   Score
Accuracy                 98.0%
Correct Predictions      49/50
Incorrect Predictions    1/50
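Accuracy here is the fraction of scored sequences whose predicted family matches the true family, so 49 correct out of 50 gives 49 / 50 = 98.0%. The sketch below shows that arithmetic; it assumes predictions and true labels are parallel lists of family identifiers. The function name `family_accuracy` and the toy labels are hypothetical, and the card does not publish its actual scoring script.

```python
def family_accuracy(predicted, actual):
    """Return (accuracy, n_correct, n_incorrect) for parallel lists of family labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot score an empty test set")
    # A prediction counts as correct only on an exact family-label match.
    n_correct = sum(p == a for p, a in zip(predicted, actual))
    n_incorrect = len(actual) - n_correct
    return n_correct / len(actual), n_correct, n_incorrect


if __name__ == "__main__":
    # Toy labels chosen to reproduce the table's counts: 49 matches, 1 mismatch.
    actual = ["fam_A"] * 50
    predicted = ["fam_A"] * 49 + ["fam_B"]
    accuracy, correct, incorrect = family_accuracy(predicted, actual)
    total = len(actual)
    print(f"Accuracy {accuracy:.1%}, correct {correct}/{total}, incorrect {incorrect}/{total}")
    # prints: Accuracy 98.0%, correct 49/50, incorrect 1/50
```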

Conclusion

This 98% accuracy on sequences that were held out from training, yet belong to the training families, indicates that the model has learned robust, generalizable features that define a protein family's functional identity rather than memorizing training examples. This level of performance suggests QuantaFold is a reliable tool for scientific research on these families.

Model Details

  • Model ID: Tarive/esm2_t12_35M_UR50D-finetuned-pfam-1k
  • Format: Safetensors
  • Model size: 34M params
  • Tensor type: F32