BioClinical ModernBERT: A State-of-the-Art Long-Context Encoder for Biomedical and Clinical NLP Paper • 2506.10896 • Published 2 days ago • 1 • 2
Domain2Vec: Vectorizing Datasets to Find the Optimal Data Mixture without Training Paper • 2506.10952 • Published 2 days ago • 20
Institutional Books Collection A growing corpus of public domain books from library collections, seeded by Harvard Library. • 3 items • Updated 3 days ago • 1
institutional/institutional-books-topic-classifier-bert Text Classification • Updated 3 days ago • 20 • 4
Institutional Books 1.0: A 242B token dataset from Harvard Library's collections, refined for accuracy and usability Paper • 2506.08300 • Published 5 days ago • 6 • 3