---
language:
- da
- no
- sv
tags:
- embeddings
- sentence-transformers
- scandinavian
- semantic-search
- retrieval
license: apache-2.0
---
# Qwen3-Embedding-Scandi-0.6B
[![Hugging Face](https://img.shields.io/badge/🤗-Model%20Card-blue)](https://huggingface.co/emillykkejensen/Qwen3-Embedding-Scandi-0.6B)
Fine-tuned version of [Qwen/Qwen3-Embedding-0.6B](https://huggingface.co/Qwen/Qwen3-Embedding-0.6B) for **Scandinavian text embeddings** (Danish, Norwegian, Swedish).
---
## Model Summary
* **Base model:** Qwen/Qwen3-Embedding-0.6B
* **Architecture:** Transformer-based embedding model (0.6B parameters)
* **Fine-tuning:** LoRA adapters trained with Swift, then merged into the base weights
* **Task:** Sentence and document embeddings for retrieval, clustering, and semantic similarity
* **Languages:** 🇩🇰 Danish, 🇸🇪 Swedish, 🇳🇴 Norwegian
---
## Intended Use
This model is intended for **representation learning** tasks such as:
* Semantic search
* Text clustering
* Document retrieval
* Reranking pipelines
Not recommended for **text generation**.
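
A minimal semantic-search sketch for the use cases above. It assumes the repository loads through `sentence-transformers` (as the model card tags suggest) and that the package is installed; the helper names `embed`, `cosine`, `rank`, and `demo` are hypothetical and not part of any library. The base Qwen3-Embedding model also supports instruction prompts for queries, which this sketch does not use. Calling `demo()` downloads the model and prints the example documents ordered by similarity to the query.

```python
import math

MODEL_ID = "emillykkejensen/Qwen3-Embedding-Scandi-0.6B"


def embed(texts, model_id=MODEL_ID):
    """Encode texts into L2-normalized vectors (downloads the model on first call)."""
    # Imported lazily so the ranking helpers below work without the dependency.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_id)
    return model.encode(texts, normalize_embeddings=True).tolist()


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query_vec, doc_vecs):
    """Return document indices sorted from most to least similar to the query."""
    scores = [cosine(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


def demo():
    """Embed one Danish query and three documents, then print the documents by rank."""
    docs = [
        "København er Danmarks hovedstad.",
        "Stockholm är Sveriges huvudstad.",
        "Oslo er hovedstaden i Norge.",
    ]
    query_vec, *doc_vecs = embed(["Hvad er hovedstaden i Norge?"] + docs)
    for i in rank(query_vec, doc_vecs):
        print(docs[i])
```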
---
## Training Details
* **Dataset:** [DDSC/nordic-embedding-training-data](https://huggingface.co/datasets/DDSC/nordic-embedding-training-data), Scandinavian corpora with mixed Danish, Norwegian, and Swedish texts
* **Training framework:** [Swift](https://github.com/modelscope/swift) with LoRA adapters
* **Loss function:** InfoNCE
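
For readers unfamiliar with InfoNCE, the sketch below shows the general form of the loss with in-batch negatives: each query is scored against every document in the batch, and the loss is the cross-entropy of picking its own positive. This is an illustration of the technique, not the training code used for this model; the function name `info_nce` and the `temperature` value are assumptions, since the card does not state the training hyperparameters.

```python
import math


def info_nce(sim, temperature=0.05):
    """Mean InfoNCE loss over a square similarity matrix.

    sim[i][j] is the similarity between query i and document j;
    the matching (positive) document for query i sits at column i.
    The temperature default is an assumed value for illustration.
    """
    losses = []
    for i, row in enumerate(sim):
        logits = [s / temperature for s in row]
        # Log-sum-exp with the max subtracted for numerical stability.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # -log softmax(logits)[i]
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)
```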
---
## Checkpoints
* LoRA weights are merged into the base model, so no separate adapter is needed at inference time.
* Weights are stored in SafeTensors format for efficiency.
* The tokenizer is copied from the base model for compatibility.
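
As background on what merging means, the standard LoRA formulation folds the low-rank update into each adapted weight matrix as `W + (alpha / r) * B @ A`. The sketch below shows that arithmetic on plain nested lists. It is only an illustration: the actual merge for this model was done with Swift, and the function name `merge_lora` as well as any rank and `alpha` values are assumptions, since the card does not state them.

```python
def merge_lora(W, A, B, alpha, r):
    """Return W + (alpha / r) * B @ A using plain nested lists.

    W: d_out x d_in base weight, B: d_out x r, A: r x d_in.
    """
    scale = alpha / r
    d_out, d_in = len(W), len(W[0])
    return [
        [
            # Base weight plus the scaled low-rank update for entry (i, j).
            W[i][j] + scale * sum(B[i][k] * A[k][j] for k in range(r))
            for j in range(d_in)
        ]
        for i in range(d_out)
    ]
```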
---
## Limitations & Bias
* Intended for **Scandinavian languages**; other languages may perform poorly.
* Embeddings are sensitive to domain shift (best results on text similar to training data).
* As with all language models, embeddings may encode societal biases present in the training data.
---
## Acknowledgements
* [Qwen Team](https://huggingface.co/Qwen) for releasing the base model.
* [Swift](https://github.com/modelscope/swift) for training utilities.
* [Weights & Biases](https://wandb.ai) for experiment tracking.
* [DDSC](https://huggingface.co/DDSC) for training data.