SentenceTransformer based on Qwen/Qwen3-0.6B-Base

This is a sentence-transformers model finetuned from Qwen/Qwen3-0.6B-Base. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: Qwen/Qwen3-0.6B-Base
  • Maximum Sequence Length: 128 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

  • Documentation: https://sbert.net
  • Repository: https://github.com/UKPLab/sentence-transformers
  • Hugging Face: https://huggingface.co/models?library=sentence-transformers

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: Qwen3Model 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
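
The Pooling module above performs mean pooling (pooling_mode_mean_tokens: True): the token embeddings produced by the Qwen3 backbone are averaged, with padding positions excluded via the attention mask. A minimal sketch of that operation in plain PyTorch; the function name is illustrative, not part of the library API:

import torch

def mean_pool(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    # token_embeddings: (batch, seq_len, 1024); attention_mask: (batch, seq_len)
    mask = attention_mask.unsqueeze(-1).to(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(dim=1)  # sum only the non-padding token vectors
    counts = mask.sum(dim=1).clamp(min=1e-9)       # number of real tokens per sequence
    return summed / counts                         # (batch, 1024) sentence embeddings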

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Matjac5/MNLP_M3_rag_model_old")
# Run inference
sentences = [
    "The ratio of an object's mass to its volume is its",
    'density.',
    '500 m',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 1024)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
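
Because the similarity function is cosine similarity, the scores returned by model.similarity can be reproduced by L2-normalizing the embeddings and taking dot products. A quick sanity check, reusing the embeddings and similarities variables from the snippet above:

import numpy as np

# L2-normalize each row; the Gram matrix of the normalized rows is the cosine similarity
normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
manual = normed @ normed.T
print(np.allclose(manual, similarities.numpy(), atol=1e-5))
# True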

Training Details

Training Dataset

Unnamed Dataset

  • Size: 725,795 training samples
  • Columns: sentence_0 and sentence_1
  • Approximate statistics based on the first 1000 samples:
    • sentence_0: string; min: 3 tokens, mean: 36.99 tokens, max: 128 tokens
    • sentence_1: string; min: 1 token, mean: 4.56 tokens, max: 34 tokens
  • Samples:
    • sentence_0: "A balance can measure the weight of" → sentence_1: "sugar"
    • sentence_0: "The average monthly salary of 20 employees in an organisation is Rs. 1500. If the manager's salary is added, then the average salary increases by Rs. 100. What is the manager's monthly salary?" → sentence_1: "Rs.3600"
    • sentence_0: "When a baby shakes a rattle, it makes a noise. Which form of energy was changed to sound energy?" → sentence_1: "mechanical"
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
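
MultipleNegativesRankingLoss treats each (sentence_0, sentence_1) pair as a positive and uses every other sentence_1 in the batch as an in-batch negative: the cosine-similarity matrix between the two sides is multiplied by the scale and passed to cross-entropy with the diagonal as the target. A minimal sketch of the objective under the parameters above (scale=20.0, cos_sim); this is illustrative, not the library's implementation:

import torch
import torch.nn.functional as F

def mnr_loss(anchors: torch.Tensor, positives: torch.Tensor, scale: float = 20.0) -> torch.Tensor:
    # (batch, batch) cosine similarities between every anchor and every positive
    scores = F.cosine_similarity(anchors.unsqueeze(1), positives.unsqueeze(0), dim=-1) * scale
    # The true positive for anchor i is positives[i], i.e. the diagonal entry
    labels = torch.arange(scores.size(0), device=scores.device)
    return F.cross_entropy(scores, labels)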
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 1
  • fp16: True
  • multi_dataset_batch_sampler: round_robin
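
A training run with the hyperparameters above would look roughly like the sketch below, assuming the standard SentenceTransformerTrainer API; the actual training script is not included in this card, and the dataset rows are placeholders:

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss

model = SentenceTransformer("Qwen/Qwen3-0.6B-Base")
model.max_seq_length = 128

# Placeholder pair dataset with the sentence_0/sentence_1 columns described above
train_dataset = Dataset.from_dict({
    "sentence_0": ["The ratio of an object's mass to its volume is its"],
    "sentence_1": ["density."],
})

args = SentenceTransformerTrainingArguments(
    output_dir="output",
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=1,
    fp16=True,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=MultipleNegativesRankingLoss(model),
)
trainer.train()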

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin

Training Logs

Epoch Step Training Loss
0.0110 500 1.3593
0.0220 1000 0.8335
0.0331 1500 0.7774
0.0441 2000 0.7507
0.0551 2500 0.7108
0.0661 3000 0.6946
0.0772 3500 0.6644
0.0882 4000 0.621
0.0992 4500 0.6124
0.1102 5000 0.576
0.1212 5500 0.5787
0.1323 6000 0.5502
0.1433 6500 0.5653
0.1543 7000 0.5315
0.1653 7500 0.5198
0.1764 8000 0.5114
0.1874 8500 0.4775
0.1984 9000 0.4803
0.2094 9500 0.4876
0.2204 10000 0.4824
0.2315 10500 0.4587
0.2425 11000 0.4521
0.2535 11500 0.4565
0.2645 12000 0.448
0.2756 12500 0.4475
0.2866 13000 0.4313
0.2976 13500 0.4226
0.3086 14000 0.4079
0.3196 14500 0.3869
0.3307 15000 0.4001
0.3417 15500 0.3815
0.3527 16000 0.3769
0.3637 16500 0.3526
0.3748 17000 0.3839
0.3858 17500 0.3647
0.3968 18000 0.3616
0.4078 18500 0.3615
0.4188 19000 0.3592
0.4299 19500 0.322
0.4409 20000 0.3352
0.4519 20500 0.3228
0.4629 21000 0.3213
0.4740 21500 0.3129
0.4850 22000 0.3086
0.4960 22500 0.3011
0.5070 23000 0.3112
0.5180 23500 0.308
0.5291 24000 0.3002
0.5401 24500 0.2805
0.5511 25000 0.2809
0.5621 25500 0.2666
0.5732 26000 0.2772
0.5842 26500 0.2783
0.5952 27000 0.2704
0.6062 27500 0.2696
0.6172 28000 0.2667
0.6283 28500 0.2561
0.6393 29000 0.2546
0.6503 29500 0.2491
0.6613 30000 0.2405
0.6724 30500 0.2376
0.6834 31000 0.2236
0.6944 31500 0.246
0.7054 32000 0.2418
0.7164 32500 0.2271
0.7275 33000 0.2308
0.7385 33500 0.2162
0.7495 34000 0.2135
0.7605 34500 0.2157
0.7716 35000 0.2177
0.7826 35500 0.2242
0.7936 36000 0.22
0.8046 36500 0.2026
0.8156 37000 0.1988
0.8267 37500 0.1845
0.8377 38000 0.1955
0.8487 38500 0.2115
0.8597 39000 0.2026
0.8708 39500 0.1861
0.8818 40000 0.1882
0.8928 40500 0.1861
0.9038 41000 0.1921
0.9148 41500 0.1778
0.9259 42000 0.1779
0.9369 42500 0.1782
0.9479 43000 0.1748
0.9589 43500 0.168
0.9700 44000 0.1717
0.9810 44500 0.1699
0.9920 45000 0.1697

Framework Versions

  • Python: 3.11.12
  • Sentence Transformers: 4.1.0
  • Transformers: 4.52.3
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.7.0
  • Datasets: 3.6.0
  • Tokenizers: 0.21.1
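
To reproduce this environment, the listed versions can be pinned at install time (a sketch; the CUDA-specific PyTorch build, 2.6.0+cu124, may require the matching PyTorch wheel index):

pip install sentence-transformers==4.1.0 transformers==4.52.3 torch==2.6.0 accelerate==1.7.0 datasets==3.6.0 tokenizers==0.21.1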

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}