SentenceTransformer based on Snowflake/snowflake-arctic-embed-s

This is a sentence-transformers model finetuned from Snowflake/snowflake-arctic-embed-s. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: Snowflake/snowflake-arctic-embed-s
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
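
The stack above amounts to CLS-token pooling followed by L2 normalization of a BERT encoder's output. As a rough illustration of what these modules do (not the recommended loading path), the embeddings can be reproduced with the plain transformers API, assuming the underlying transformer weights sit at the repository root as Sentence Transformers saves them by default:

from transformers import AutoModel, AutoTokenizer
import torch
import torch.nn.functional as F

model_id = "LucaZilli/model-snowflake-s_20250226_145351_finalmodel"
tokenizer = AutoTokenizer.from_pretrained(model_id)
encoder = AutoModel.from_pretrained(model_id)  # BertModel weights

texts = ["materiali isolanti per edifici"]
batch = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")

with torch.no_grad():
    token_embeddings = encoder(**batch).last_hidden_state  # (batch, seq_len, 384)

cls_embeddings = token_embeddings[:, 0]                    # pooling_mode_cls_token=True
cls_embeddings = F.normalize(cls_embeddings, p=2, dim=1)   # Normalize() module
print(cls_embeddings.shape)                                # torch.Size([1, 384])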

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("LucaZilli/model-snowflake-s_20250226_145351_finalmodel")
# Run inference
sentences = [
    'materiali isolanti per sistemi radianti a soffitto',
    'materiali isolanti per edifici',
    'privacy and data protection training',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
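
For semantic-search style usage, encode a corpus once and rank candidates by cosine similarity. A minimal sketch; the corpus and query below are made-up examples reusing strings from this card:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("LucaZilli/model-snowflake-s_20250226_145351_finalmodel")

corpus = [
    "materiali isolanti per edifici",
    "privacy and data protection training",
    "software gestionale generico",
]
query = "materiali isolanti per sistemi radianti a soffitto"

corpus_embeddings = model.encode(corpus)
query_embedding = model.encode([query])

# Embeddings are already L2-normalized, so cosine similarity equals the dot product
scores = model.similarity(query_embedding, corpus_embeddings)[0]
for idx in scores.argsort(descending=True).tolist():
    print(f"{float(scores[idx]):.4f}  {corpus[idx]}")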

Evaluation

Metrics

Semantic Similarity

Metric            custom_dataset   stsbenchmark
pearson_cosine    0.7037           0.7477
spearman_cosine   0.7287           0.7432

Triplet

Metric            Value
cosine_accuracy   0.8163
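
The semantic similarity metrics are Pearson and Spearman correlations between gold similarity scores and the cosine similarity of the embeddings; the triplet metric is the fraction of (anchor, positive, negative) triplets where the anchor embedding is closer to the positive than to the negative. A hedged sketch of how such numbers can be computed with the built-in evaluators; the tiny in-line datasets are illustrative, not the actual evaluation sets:

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator, TripletEvaluator

model = SentenceTransformer("LucaZilli/model-snowflake-s_20250226_145351_finalmodel")

# Scored sentence pairs -> pearson_cosine / spearman_cosine
sim_evaluator = EmbeddingSimilarityEvaluator(
    sentences1=[
        "ispezioni regolari per camion aziendali",
        "EMI shielding paints for electronics",
        "rete di sensori per l'analisi del suolo in tempo reale",
    ],
    sentences2=[
        "ispezioni regolari per camion di consegna",
        "Vernici per schermatura elettromagnetica dispositivi elettronici",
        "software per gestione aziendale",
    ],
    scores=[1.0, 0.8, 0.0],
    name="custom_dataset",
)
print(sim_evaluator(model))

# (anchor, positive, negative) triplets -> cosine_accuracy
triplet_evaluator = TripletEvaluator(
    anchors=["materiali isolanti per sistemi radianti a soffitto"],
    positives=["materiali isolanti per edifici"],
    negatives=["privacy and data protection training"],
    name="all_nli_dataset",
)
print(triplet_evaluator(model))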

Training Details

Training Dataset

Unnamed Dataset

  • Size: 25,310 training samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:
    • sentence1: string, min 4 / mean 13.32 / max 31 tokens
    • sentence2: string, min 4 / mean 11.06 / max 31 tokens
    • score: float, min 0.0 / mean 0.49 / max 1.0
  • Samples:
    • sentence1: "ottimizzazione dei tempi di produzione per capi sartoriali di lusso" | sentence2: "strumenti per l'ottimizzazione dei tempi di produzione" | score: 0.6
    • sentence1: "software di programmazione robotica per lucidatura" | sentence2: "software gestionale generico" | score: 0.4
    • sentence1: "rete di sensori per l'analisi del suolo in tempo reale" | sentence2: "software per gestione aziendale" | score: 0.0
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
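
As a rough sketch, a dataset with the columns above can be wired to CosineSimilarityLoss like this. The in-memory rows reuse the samples listed here, and the setup is an assumption about how such training data is typically prepared, not a copy of the original training script:

from datasets import Dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import CosineSimilarityLoss

model = SentenceTransformer("Snowflake/snowflake-arctic-embed-s")

# Two text columns plus a "score" label column, matching the columns listed above
train_dataset = Dataset.from_dict({
    "sentence1": [
        "ottimizzazione dei tempi di produzione per capi sartoriali di lusso",
        "software di programmazione robotica per lucidatura",
        "rete di sensori per l'analisi del suolo in tempo reale",
    ],
    "sentence2": [
        "strumenti per l'ottimizzazione dei tempi di produzione",
        "software gestionale generico",
        "software per gestione aziendale",
    ],
    "score": [0.6, 0.4, 0.0],
})

# CosineSimilarityLoss regresses cosine(sentence1, sentence2) onto the score column,
# using torch.nn.MSELoss by default (the loss_fct shown above)
train_loss = CosineSimilarityLoss(model)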
    

Evaluation Dataset

Unnamed Dataset

  • Size: 3,164 evaluation samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:
    • sentence1: string, min 5 / mean 13.61 / max 31 tokens
    • sentence2: string, min 4 / mean 11.39 / max 27 tokens
    • score: float, min 0.0 / mean 0.49 / max 1.0
  • Samples:
    • sentence1: "ispezioni regolari per camion aziendali" | sentence2: "ispezioni regolari per camion di consegna" | score: 1.0
    • sentence1: "blister packaging machines GMP compliant" | sentence2: "food packaging machines" | score: 0.4
    • sentence1: "EMI shielding paints for electronics" | sentence2: "Vernici per schermatura elettromagnetica dispositivi elettronici" | score: 0.8
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
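
Both datasets plug into the current trainer API in the usual way. A minimal, hypothetical sketch; the tiny in-memory datasets stand in for the 25,310 / 3,164 samples described above:

from datasets import Dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
from sentence_transformers.losses import CosineSimilarityLoss

model = SentenceTransformer("Snowflake/snowflake-arctic-embed-s")

train_dataset = Dataset.from_dict({
    "sentence1": ["software di programmazione robotica per lucidatura"],
    "sentence2": ["software gestionale generico"],
    "score": [0.4],
})
eval_dataset = Dataset.from_dict({
    "sentence1": ["ispezioni regolari per camion aziendali"],
    "sentence2": ["ispezioni regolari per camion di consegna"],
    "score": [1.0],
})

trainer = SentenceTransformerTrainer(
    model=model,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=CosineSimilarityLoss(model),
)
trainer.train()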
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 5
  • warmup_ratio: 0.1
  • fp16: True
  • batch_sampler: no_duplicates
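
These map directly onto SentenceTransformerTrainingArguments. A sketch with only the non-default values set (output_dir is a hypothetical path, not taken from this card):

from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="outputs",  # hypothetical
    eval_strategy="steps",
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=5,
    warmup_ratio=0.1,
    fp16=True,
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)

Passing these arguments, together with the datasets and loss sketched earlier, to SentenceTransformerTrainer would correspond to the configuration listed here.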

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 5
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch    Step   Training Loss   Validation Loss   custom_dataset_spearman_cosine   all_nli_dataset_cosine_accuracy   stsbenchmark_spearman_cosine
-1       -1     -               -                 0.7287                           0.8163                            0.7432
0.1264   200    0.0671          0.0434            -                                -                                 -
0.2528   400    0.0401          0.0344            -                                -                                 -
0.3793   600    0.0342          0.0307            -                                -                                 -
0.5057   800    0.0347          0.0327            -                                -                                 -
0.6321   1000   0.0322          0.0287            -                                -                                 -
0.7585   1200   0.032           0.0279            -                                -                                 -
0.8850   1400   0.0307          0.0282            -                                -                                 -
1.0114   1600   0.0267          0.0279            -                                -                                 -
1.1378   1800   0.0244          0.0266            -                                -                                 -
1.2642   2000   0.0227          0.0282            -                                -                                 -
1.3906   2200   0.0237          0.0249            -                                -                                 -
1.5171   2400   0.0222          0.0273            -                                -                                 -
1.6435   2600   0.0235          0.0246            -                                -                                 -
1.7699   2800   0.0228          0.0247            -                                -                                 -
1.8963   3000   0.0225          0.0241            -                                -                                 -
2.0228   3200   0.0213          0.0244            -                                -                                 -
2.1492   3400   0.0169          0.0234            -                                -                                 -
2.2756   3600   0.0178          0.0257            -                                -                                 -
2.4020   3800   0.018           0.0236            -                                -                                 -
2.5284   4000   0.0177          0.0230            -                                -                                 -
2.6549   4200   0.0176          0.0234            -                                -                                 -
2.7813   4400   0.0182          0.0229            -                                -                                 -
2.9077   4600   0.0173          0.0221            -                                -                                 -
3.0341   4800   0.0157          0.0232            -                                -                                 -
3.1606   5000   0.0139          0.0225            -                                -                                 -
3.2870   5200   0.0137          0.0222            -                                -                                 -
3.4134   5400   0.0142          0.0224            -                                -                                 -
3.5398   5600   0.0143          0.0224            -                                -                                 -
3.6662   5800   0.0135          0.0225            -                                -                                 -
3.7927   6000   0.0143          0.0223            -                                -                                 -
3.9191   6200   0.0143          0.0234            -                                -                                 -
4.0455   6400   0.0128          0.0219            -                                -                                 -
4.1719   6600   0.0117          0.0222            -                                -                                 -
4.2984   6800   0.0113          0.0217            -                                -                                 -
4.4248   7000   0.0115          0.0220            -                                -                                 -
4.5512   7200   0.012           0.0217            -                                -                                 -
4.6776   7400   0.0113          0.0221            -                                -                                 -
4.8040   7600   0.012           0.0217            -                                -                                 -
4.9305   7800   0.0105          0.0217            -                                -                                 -

Framework Versions

  • Python: 3.11.11
  • Sentence Transformers: 3.4.1
  • Transformers: 4.48.3
  • PyTorch: 2.5.1+cu124
  • Accelerate: 1.3.0
  • Datasets: 3.3.2
  • Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}