State-of-the-art Danish Models - a danish-foundation-models Collection

updated Dec 23, 2024

These models are the state of the art for Danish within their respective domains (the domain is noted below each model).

google/gemma-2-9b-it

Text Generation • 9B • Updated Aug 27, 2024 • 165k • 731

Note: Best-performing open-weight model on ScandEval among instruction-tuned generative models of roughly 7-9B parameters. Rank on ScandEval Danish NLG (2024/12/14): 1.69

google/gemma-2-9b

Text Generation • 9B • Updated Aug 7, 2024 • 82.5k • 664

Note: Best-performing open-weight model on ScandEval among generative models of roughly 7-9B parameters that have not been instruction-tuned. Rank on ScandEval Danish NLG (2024/12/14): 2.02

AI-Sweden-Models/roberta-large-1160k

Fill-Mask • 0.4B • Updated May 22 • 186 • 11

Note: Large-sized encoder. Rank on ScandEval Danish NLU (2024/12/14): 1.38

vesteinn/DanskBERT

Fill-Mask • 0.1B • Updated May 11, 2023 • 7 • 6

Note: Medium-sized encoder. Rank on ScandEval Danish NLU (2024/12/14): 1.56

ltg/norbert3-small

Fill-Mask • Updated May 27 • 11.5k • 2

Note: Small-sized encoder. Rank on ScandEval Danish NLU (2024/12/14): 2.15

openai/whisper-large-v3

Automatic Speech Recognition • 2B • Updated Aug 12, 2024 • 4.58M • 4.79k

Note: Automatic speech recognition. Word error rate on CoRal: 28.3

jinaai/jina-embeddings-v3

Feature Extraction • 0.6B • Updated Feb 24 • 4.16M • 1.05k

Note: Large-sized embedding model with flexible embedding sizes and long-document understanding.

syvai/hviske-v2

2B • Updated Oct 18, 2024 • 919 • 13

Note: Automatic speech recognition, based on Whisper 3 and fine-tuned on CoRal. Word error rate on CoRal: 11.8

intfloat/multilingual-e5-large-instruct

Feature Extraction • 0.6B • Updated Jul 10 • 2.07M • 544

Note: Large-sized embedding model with instructions.

CoRal-project/roest-wav2vec2-315m-v1

Automatic Speech Recognition • 0.3B • Updated May 3 • 20 • 12

Note: Speech encoder (Wav2Vec2.0). Word error rate on CoRal: 17.0

intfloat/multilingual-e5-large

Feature Extraction • 0.6B • Updated Feb 17 • 3.95M • 1.01k

Note: Large-sized embedding model.

intfloat/multilingual-e5-base

Sentence Similarity • 0.3B • Updated Feb 17 • 1.53M • 292

Note: Medium-sized embedding model.

intfloat/multilingual-e5-small

Sentence Similarity • 0.1B • Updated Feb 17 • 1.85M • 230

Note: Small-sized embedding model.

facebook/seamless-m4t-v2-large

Automatic Speech Recognition • 2B • Updated Jan 4, 2024 • 50.5k • 879

Note: Machine translation (and other tasks).
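
The speech recognition entries above are compared by word error rate (WER) on CoRal. Under the usual definition, WER is the word-level edit distance between a reference transcript and the model's output, divided by the number of words in the reference. The collection does not say how transcripts were normalised before scoring, so the sketch below is only an illustration: the lowercasing, the whitespace tokenisation, the function name `word_error_rate`, and the example sentences are all assumptions, not the procedure behind the figures listed here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words.

    Assumption: text is lowercased and split on whitespace; real evaluations
    often apply further normalisation (punctuation, numerals, and so on).
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion of a reference word
                    curr[j - 1] + 1,     # insertion of a hypothesis word
                    prev[j - 1] + cost,  # substitution, or match when cost is 0
                )
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical transcripts: one word dropped and one word added.
    reference = "jeg bor i københavn"
    hypothesis = "jeg bor københavn nu"
    print(f"WER: {word_error_rate(reference, hypothesis):.2f}")
```

The function returns a fraction; multiply by 100 to express it as a percentage. Lower is better, and because insertions count as errors the value can exceed 1.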