a-m-team/AM-DeepSeek-R1-Distilled-1.4M • Dataset
DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing • Paper • arXiv 2111.09543 • Published Nov 18, 2021
GTE models • Collection • General Text Embedding models released by Tongyi Lab of Alibaba Group • 21 items • Updated Jan 21