## 🎯 Custom Evaluation: Within-Family Generalization
Beyond the standard validation metrics, the model was put through a custom evaluation of its ability to generalize to unseen sequences from known families. This test checks whether the model learned the underlying biological patterns of each protein family rather than simply memorizing the training examples.
### Evaluation Set Construction
The custom test set was constructed with the following properties:
- Source: Sequences were drawn from the 1,000 most common families (the same families the model was trained on).
- No Overlap: A verification step confirmed that none of the test sequences appear in the original training data.
- Balanced & Representative: The final test set contains 100 unique sequences spread across 75 different families.
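The construction steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually used for QuantaFold: the function name `build_within_family_test_set`, the `(sequence, family)` input format, and the `max_per_family` cap used to approximate "balanced" are all hypothetical.

```python
import random
from collections import defaultdict


def build_within_family_test_set(candidates, training_sequences, families,
                                 n_sequences=100, max_per_family=2, seed=0):
    """Hypothetical sketch of a within-family held-out test set.

    candidates: list of (sequence, family) pairs (assumed format)
    training_sequences: sequences the model saw during training
    families: set of family labels the model was trained on
    max_per_family: assumed cap on sequences per family, for balance
    """
    train_set = set(training_sequences)

    # Keep only sequences from known families that never appear in training,
    # deduplicating by sequence (first occurrence wins).
    unique = {}
    for seq, fam in candidates:
        if fam in families and seq not in train_set and seq not in unique:
            unique[seq] = fam

    items = list(unique.items())
    random.Random(seed).shuffle(items)

    # Greedily fill the test set while capping each family's contribution.
    selected = []
    per_family = defaultdict(int)
    for seq, fam in items:
        if len(selected) == n_sequences:
            break
        if per_family[fam] < max_per_family:
            selected.append((seq, fam))
            per_family[fam] += 1

    # Verification step: the test set must share zero sequences with training.
    assert not any(seq in train_set for seq, _ in selected)
    return selected


# Toy usage: "MKV" is excluded (seen in training), the duplicate "MKA" is
# dropped, and "TTT" is excluded (its family was not trained on).
train = ["MKV", "MKL"]
cands = [("MKV", "PF1"), ("MKA", "PF1"), ("MKA", "PF1"),
         ("GGS", "PF2"), ("TTT", "PF9")]
test = build_within_family_test_set(cands, train, {"PF1", "PF2"}, n_sequences=10)
```

Exact-string matching is the simplest form of the overlap check; a stricter evaluation might also exclude near-duplicates by sequence identity, which this sketch does not attempt.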
The full dataset used for this evaluation is available on the Hub as QuantaFold-within-family-test.

### Astonishing Performance
The model generalized well to this unseen data, achieving the following results:
| Metric | Score |
|---|---|
| Accuracy | 98.0% |
| Correct Predictions | 49/50 |
| Incorrect Predictions | 1/50 |
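The table's figures follow from simple counting: accuracy is the number of correct family predictions divided by the number of evaluated sequences. The helper below is a hypothetical sketch of that arithmetic; `summarize_predictions` and the toy family labels are illustrative only and do not come from the QuantaFold codebase.

```python
def summarize_predictions(y_true, y_pred):
    """Hypothetical helper: accuracy and correct/incorrect counts."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    total = len(y_true)
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    accuracy = correct / total if total else 0.0
    return {
        "accuracy": accuracy,
        "accuracy_pct": f"{accuracy:.1%}",
        "correct": f"{correct}/{total}",
        "incorrect": f"{total - correct}/{total}",
    }


# Toy labels reproducing the table's counts: 49 of 50 predictions correct.
y_true = ["family_A"] * 50
y_pred = ["family_A"] * 49 + ["family_B"]
summary = summarize_predictions(y_true, y_pred)
print(summary)
```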
### Conclusion
This 98% accuracy on sequences that were absent from the training data, yet drawn from the training families, indicates that the model has learned robust, generalizable features that define a protein's functional identity rather than memorizing individual examples. This level of performance makes QuantaFold a reliable tool for classifying sequences into the families it was trained on.