EuroBERT 🇪🇺 Scaling Multilingual Encoders for European Languages.
EuroBERT: Scaling Multilingual Encoders for European Languages Paper • 2503.05500 • Published Mar 7 • 81
EuroBERT/EuroBERT-210m Fill-Mask • Updated Apr 17 • 18.5k • 70
EuroBERT/EuroBERT-610m Fill-Mask • Updated Apr 17 • 6.98k • 29
EuroBERT/EuroBERT-2.1B Fill-Mask • Updated Apr 17 • 1.13k • 51
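The EuroBERT checkpoints above are standard fill-mask models. A minimal usage sketch, assuming a recent transformers release; passing trust_remote_code=True is an assumption that may or may not be required depending on your transformers version:

```python
# Fill-mask sketch for EuroBERT/EuroBERT-210m (listed above).
from transformers import pipeline

fill_mask = pipeline(
    "fill-mask",
    model="EuroBERT/EuroBERT-210m",
    trust_remote_code=True,  # assumption: may be unnecessary on newer transformers
)

# Use the model's own mask token rather than hard-coding a string.
mask = fill_mask.tokenizer.mask_token
for pred in fill_mask(f"The capital of France is {mask}."):
    print(pred["token_str"], round(pred["score"], 3))
```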
ULD Loss (Universal LLMs Distillation)
The ULD loss, based on optimal transport, enables distillation across different LLM families without requiring shared tokenizers.
Towards Cross-Tokenizer Distillation: the Universal Logit Distillation Loss for LLMs Paper • 2402.12030 • Published Feb 19, 2024
mistralai/Mistral-7B-Instruct-v0.2 Text Generation • Updated Sep 27, 2024 • 1.75M • 2.83k
meta-llama/Llama-2-7b-chat-hf Text Generation • Updated Apr 17, 2024 • 1.08M • 4.47k
EleutherAI/pythia-160m-deduped Text Generation • Updated Jul 9, 2023 • 54.7k • 3
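A minimal sketch of the core ULD idea in PyTorch: because teacher and student use different tokenizers, their vocabularies cannot be aligned token by token, so the loss compares the sorted per-position probability distributions instead, an optimal-transport-style distance that is permutation-invariant over the vocabulary. Function names and the zero-padding strategy here are illustrative assumptions, not the authors' reference implementation; in training this term is typically added to the usual cross-entropy on the gold labels.

```python
import torch
import torch.nn.functional as F

def uld_distill_loss(student_logits, teacher_logits):
    """student_logits: (batch, seq, V_student), teacher_logits: (batch, seq, V_teacher)."""
    p_s = F.softmax(student_logits, dim=-1)
    p_t = F.softmax(teacher_logits, dim=-1)

    # Sort each distribution in decreasing order so the vocabularies need not match.
    p_s, _ = torch.sort(p_s, dim=-1, descending=True)
    p_t, _ = torch.sort(p_t, dim=-1, descending=True)

    # Zero-pad the smaller vocabulary so the two tensors align (illustrative choice).
    pad = p_s.shape[-1] - p_t.shape[-1]
    if pad > 0:
        p_t = F.pad(p_t, (0, pad))
    elif pad < 0:
        p_s = F.pad(p_s, (0, -pad))

    # L1 distance between the sorted distributions, averaged over batch and positions.
    return (p_s - p_t).abs().sum(dim=-1).mean()
```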
Nicolas-BZRD/mt0-base_dialogsum_Mistral-7B-Instruct-v0.2_uld_loss Text2Text Generation • Updated Feb 19, 2024 • 12
Nicolas-BZRD/mt0-base_dialogsum_Mistral-7B-Instruct-v0.2_text_teacher Text2Text Generation • Updated Feb 19, 2024 • 15
Nicolas-BZRD/mt0-base_pubmed_qa_Llama-2-7b-chat-hf_uld_loss Text2Text Generation • Updated Feb 19, 2024 • 17
Nicolas-BZRD/mt0-base_pubmed_qa_Llama-2-7b-chat-hf_text_teacher Text2Text Generation • Updated Feb 19, 2024 • 91
Nicolas-BZRD/mt0-base_qed_Llama-2-7b-chat-hf_uld_loss Text2Text Generation • Updated Feb 19, 2024 • 14
Nicolas-BZRD/mt0-base_qed_Llama-2-7b-chat-hf_text_teacher Text2Text Generation • Updated Feb 19, 2024 • 13
Nicolas-BZRD/mt0-base_dialogsum_Llama-2-7b-chat-hf_uld_loss Text2Text Generation • Updated Feb 19, 2024 • 44
Nicolas-BZRD/mt0-base_dialogsum_Llama-2-7b-chat-hf_text_teacher Text2Text Generation • Updated Feb 19, 2024 • 13
Nicolas-BZRD/pythia-160m-deduped_FairytaleQA_Llama-2-7b-chat-hf_uld_loss Text Generation • Updated Feb 19, 2024 • 20
Nicolas-BZRD/pythia-160m-deduped_FairytaleQA_Llama-2-7b-chat-hf_text_teacher Text Generation • Updated Feb 19, 2024 • 19
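The distilled students above load as ordinary Hugging Face checkpoints. A hypothetical sketch for one of the mT0-base students (the prompt format is an assumption; check each model card for the expected input):

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

repo = "Nicolas-BZRD/mt0-base_dialogsum_Mistral-7B-Instruct-v0.2_uld_loss"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForSeq2SeqLM.from_pretrained(repo)

# Illustrative prompt; the checkpoint was distilled on dialogue summarization.
inputs = tokenizer("Summarize the dialogue: ...", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```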
Nicolas-BZRD/uld_loss_Mistral-7B-Instruct-v0.2-pubmed_qa_50k Viewer • Updated Mar 13, 2024 • 50.5k • 17
Nicolas-BZRD/uld_loss_Mistral-7B-Instruct-v0.2-squad Viewer • Updated Mar 13, 2024 • 87.6k • 31
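The teacher-generated datasets above can be pulled with the datasets library; the split name below is an assumption, so check the dataset viewer for the actual schema:

```python
from datasets import load_dataset

ds = load_dataset("Nicolas-BZRD/uld_loss_Mistral-7B-Instruct-v0.2-squad", split="train")
print(ds)      # features and number of rows
print(ds[0])   # first teacher-generated example
```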