Qwen3-Embedding-Scandi-0.6B
Fine-tuned version of Qwen/Qwen3-Embedding-0.6B for Scandinavian text embeddings (Danish, Norwegian, Swedish).
Model Summary
- Base model: Qwen/Qwen3-Embedding-0.6B
- Architecture: Transformer-based embedding model (0.6B parameters)
- Fine-tuning: LoRA adapters trained with Swift, then merged into the base weights
- Task: Sentence and document embeddings for retrieval, clustering, and semantic similarity
- Languages: 🇩🇰 Danish, 🇸🇪 Swedish, 🇳🇴 Norwegian
Intended Use
This model is intended for representation learning tasks such as:
- Semantic search
- Text clustering
- Document retrieval
- Reranking pipelines
Not recommended for text generation.
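Once texts are encoded, all of the tasks above reduce to comparing embedding vectors. The sketch below shows the ranking step of a semantic-search pipeline with NumPy; the embeddings are toy 3-dimensional stand-ins, and the commented-out loading code uses a placeholder repo id rather than a confirmed one.

```python
import numpy as np

# In practice, embeddings would come from the model, e.g.:
#   from sentence_transformers import SentenceTransformer
#   model = SentenceTransformer("path/to/Qwen3-Embedding-Scandi-0.6B")  # placeholder id
#   doc_vecs = model.encode(docs)

def rank_by_cosine(query_vec, doc_vecs):
    """Return document indices sorted by cosine similarity to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sims = d @ q
    return np.argsort(-sims), sims

# Toy vectors standing in for real embeddings.
query = np.array([1.0, 0.0, 0.0])
docs = np.array([[0.9, 0.1, 0.0],   # very similar to the query
                 [0.0, 1.0, 0.0],   # orthogonal
                 [0.7, 0.7, 0.0]])  # somewhat similar
order, sims = rank_by_cosine(query, docs)
print(order)  # most similar document first
```

The same ranking step underlies reranking and clustering: clustering simply groups vectors by this similarity instead of sorting them against a single query.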
Training Details
- Dataset: DDSC/nordic-embedding-training-data, a Scandinavian corpus of mixed Danish, Norwegian, and Swedish texts
- Training framework: Swift with LoRA adapters
- Loss function: InfoNCE
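InfoNCE treats each (query, positive) pair in a batch as a classification problem: the i-th positive is the correct "class" for the i-th query, and the other positives in the batch act as in-batch negatives. A minimal NumPy sketch of the loss follows; the temperature value is illustrative, not the actual training configuration.

```python
import numpy as np

def info_nce_loss(query_embs, pos_embs, temperature=0.05):
    """In-batch InfoNCE: the i-th positive is the target for the i-th query;
    all other positives in the batch serve as negatives."""
    q = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    p = pos_embs / np.linalg.norm(pos_embs, axis=1, keepdims=True)
    logits = (q @ p.T) / temperature             # (batch, batch) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))   # cross-entropy with diagonal targets

# Sanity check: perfectly matched pairs should give a near-zero loss.
rng = np.random.default_rng(0)
embs = rng.normal(size=(8, 16))
print(info_nce_loss(embs, embs))
```

Lowering the temperature sharpens the softmax, penalizing hard negatives more aggressively.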
Checkpoints
- LoRA weights merged into the base model, so no separate adapter is needed at inference time.
- Weights stored in the SafeTensors format for fast, safe loading.
- The base model's tokenizer is copied unchanged for compatibility.
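Merging a LoRA adapter folds the low-rank update back into the dense base weight: W ← W + (α/r)·BA, so the merged checkpoint needs no extra compute at inference. The toy NumPy illustration below shows that arithmetic; the shapes and scaling are illustrative, and the actual merge was performed with the Swift tooling, not this code.

```python
import numpy as np

rng = np.random.default_rng(42)
d_out, d_in, r, alpha = 6, 4, 2, 4    # toy dimensions; the real model is far larger

W = rng.normal(size=(d_out, d_in))    # frozen base weight
A = rng.normal(size=(r, d_in))        # LoRA down-projection (learned)
B = rng.normal(size=(d_out, r))       # LoRA up-projection (learned)
scale = alpha / r

x = rng.normal(size=(d_in,))

# Adapter applied at inference time: base path plus low-rank path.
y_adapter = W @ x + scale * (B @ (A @ x))

# Merged checkpoint: a single dense weight, identical output.
W_merged = W + scale * (B @ A)
y_merged = W_merged @ x

print(np.allclose(y_adapter, y_merged))
```

Because the merged weight has the same shape as the original, the result loads like any ordinary checkpoint.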
Limitations & Bias
- Optimized for Scandinavian languages; other languages may perform poorly.
- Embeddings are sensitive to domain shift (best results on text similar to training data).
- As with all language models, embeddings may encode societal biases present in the training data.
Acknowledgements
- Qwen Team for releasing the base model.
- Swift for training utilities.
- Weights & Biases for experiment tracking.
- DDSC for training data.