---
base_model: klue/roberta-base
library_name: sentence-transformers
metrics:
  - pearson_cosine
  - spearman_cosine
  - pearson_manhattan
  - spearman_manhattan
  - pearson_euclidean
  - spearman_euclidean
  - pearson_dot
  - spearman_dot
  - pearson_max
  - spearman_max
pipeline_tag: sentence-similarity
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:7654
  - loss:CosineSimilarityLoss
widget:
  - source_sentence: 밥을 먹고 나서 운동하시겠어요, 먹기 전에 하시겠어요?
    sentences:
      - 제습기 조정하는 방법을 알려줘
      - 금요일에 놀러 가고 싶은지 토요일에 가고 싶은지 말해보겠니?
      - 이번에 임원들도 오시니 거래처 사람들과 만날 때 늦지 마세요.
  - source_sentence: >-
      올해 지원 대상에 선정된 42개사는 사업화 자금부터 사업화 촉진 진단, 민간투자 유치 등 기업 규모를 키울 수 있는 각종 지원을 최대
      15개월까지 받을 수 있다.
    sentences:
      - 체크인 아웃 시 소통이나 협조도도 매우 좋습니다
      - 작년 용평 지역 강설량은?
      - 긴급 사태가 선언된 7 도부현의 지사는 법적인 근거 아래 외출자제와 휴교 등을 요청할 수 있다.
  - source_sentence: 언제 할머니 칠순 잔치가 잡혀 있나요, 이번달입니까 다음달입니까?
    sentences:
      - 그리고 세탁세제와 식용유가 없으니 준비 하세요
      - 삼월에 태어난 친구 이름이 어떻게 됩니까?
      - 비 올 때는 다른 신발 말고 장화를 신었으면 합니다.
  - source_sentence: 한메일 서비스를 사용할 수 있는 기한이 언제일까요?
    sentences:
      - 우리는 코로나19와의 투쟁에서 개발도상국들을 지원해야 할 필요성을 인정한다.
      - 캠핑할 때는 높은 지대에 텐트 치도록 해. 낮은 지대는 별로야.
      - 한메일은 언제 서비스를 종료해?
  - source_sentence: 오늘 제가 해야할 일이 무엇인가요!
    sentences:
      - 시내 중심에 위치한 깔끔하고 머무르기 좋은 숙소 입니다.
      - 가게로 들어가는 문 바로 옆에 오른쪽으로 올라가는 입구가 있어요.
      - 언제쯤 친구가 여행 갈 수 있겠니?
model-index:
  - name: SentenceTransformer based on klue/roberta-base
    results:
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: Unknown
          type: unknown
        metrics:
          - type: pearson_cosine
            value: 0.3477070578392738
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.35560473197486514
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.36738467673522557
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.36460670798564826
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.36074511612166327
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.35482778401649034
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.21251170218646828
            name: Pearson Dot
          - type: spearman_dot
            value: 0.20063256899469895
            name: Spearman Dot
          - type: pearson_max
            value: 0.36738467673522557
            name: Pearson Max
          - type: spearman_max
            value: 0.36460670798564826
            name: Spearman Max
          - type: pearson_cosine
            value: 0.9611295434382598
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.922281644313147
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.95182850390749
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.9211213430736883
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.9519510086799272
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.9217056450919558
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.9503136478175895
            name: Pearson Dot
          - type: spearman_dot
            value: 0.9045157489205089
            name: Spearman Dot
          - type: pearson_max
            value: 0.9611295434382598
            name: Pearson Max
          - type: spearman_max
            value: 0.922281644313147
            name: Spearman Max
---

SentenceTransformer based on klue/roberta-base

This is a sentence-transformers model finetuned from klue/roberta-base. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: klue/roberta-base
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

  • Documentation: Sentence Transformers Documentation (https://sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: RobertaModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
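
For reference, the same stack can be assembled by hand from sentence-transformers modules. A minimal sketch that rebuilds the architecture on top of the base klue/roberta-base checkpoint (not this fine-tuned one, whose weights differ):

from sentence_transformers import SentenceTransformer, models

# Transformer backbone, truncating inputs at 512 tokens
word_embedding = models.Transformer("klue/roberta-base", max_seq_length=512)
# Mean pooling over token embeddings -> one 768-dimensional sentence vector
pooling = models.Pooling(
    word_embedding.get_word_embedding_dimension(),
    pooling_mode_mean_tokens=True,
)
model = SentenceTransformer(modules=[word_embedding, pooling])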

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    '오늘 제가 해야할 일이 무엇인가요!',
    '언제쯤 친구가 여행 갈 수 있겠니?',
    '시내 중심에 위치한 깔끔하고 머무르기 좋은 숙소 입니다.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
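
The same embeddings support the semantic-search use case mentioned above. A minimal sketch, reusing widget sentences as a stand-in corpus (the model id is the same placeholder as in the snippet above):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence_transformers_model_id")

corpus = [
    "한메일은 언제 서비스를 종료해?",
    "작년 용평 지역 강설량은?",
]
query = "한메일 서비스를 사용할 수 있는 기한이 언제일까요?"

corpus_embeddings = model.encode(corpus)
query_embedding = model.encode([query])

# model.similarity applies this model's similarity function (cosine)
scores = model.similarity(query_embedding, corpus_embeddings)  # shape [1, 2]
print(corpus[scores.argmax().item()])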

Evaluation

Metrics

Semantic Similarity

Two evaluation snapshots are reported below. Judging by the training logs, the first table corresponds to the model before fine-tuning (step 0, spearman_max 0.3646) and the second to the final model after 4 epochs (spearman_max 0.9223).

Metric               Value
pearson_cosine       0.3477
spearman_cosine      0.3556
pearson_manhattan    0.3674
spearman_manhattan   0.3646
pearson_euclidean    0.3607
spearman_euclidean   0.3548
pearson_dot          0.2125
spearman_dot         0.2006
pearson_max          0.3674
spearman_max         0.3646

Semantic Similarity

Metric               Value
pearson_cosine       0.9611
spearman_cosine      0.9223
pearson_manhattan    0.9518
spearman_manhattan   0.9211
pearson_euclidean    0.9520
spearman_euclidean   0.9217
pearson_dot          0.9503
spearman_dot         0.9045
pearson_max          0.9611
spearman_max         0.9223
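
These metric names match what sentence-transformers' EmbeddingSimilarityEvaluator reports, so the tables can plausibly be reproduced on your own labeled pairs. A minimal sketch (the pairs and gold scores below are placeholders):

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("sentence_transformers_model_id")

# Placeholder evaluation data; gold scores are similarities in [0, 1]
sentences1 = ["한메일은 언제 서비스를 종료해?", "작년 용평 지역 강설량은?"]
sentences2 = ["한메일 서비스를 사용할 수 있는 기한이 언제일까요?", "삼월에 태어난 친구 이름이 어떻게 됩니까?"]
gold_scores = [0.9, 0.0]

evaluator = EmbeddingSimilarityEvaluator(sentences1, sentences2, gold_scores, name="dev")
print(evaluator(model))  # dict of pearson/spearman for cosine, euclidean, manhattan, dot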

Training Details

Training Dataset

Unnamed Dataset

  • Size: 7,654 training samples
  • Columns: sentence_0, sentence_1, and label
  • Approximate statistics based on the first 1000 samples (see the token-count sketch after this list):
              sentence_0      sentence_1      label
    type      string          string          float
    min       7 tokens        6 tokens        0.0
    mean      19.59 tokens    19.37 tokens    0.44
    max       58 tokens       55 tokens       1.0
  • Samples (sentence_0 / sentence_1 / label):
      sentence_0: ‘인공지능 반도체 산업 발전전략’의 차질 없는 이행 및 성과점검을 위해 정부와 산·학·연이 참여하는 ‘인공지능 반도체 산업 전략회의’를 구성·운영한다.
      sentence_1: 정부, 산업계, 학계, 연구기관이 참여하는 '인공지능 반도체산업전략회의'를 구성하여 '인공지능 반도체산업 발전전략'의 성과를 점검할 예정입니다.
      label: 0.6

      sentence_0: 예상했던대로 가성비 대비 최고의 위치였어요.
      sentence_1: 처음에 예상했던것보다 위치가 훨씬 좋았어요
      label: 0.54

      sentence_0: 올해 처음 개최되는 투자유치설명회는 전문투자기관에 홍보할 기회를 얻기 힘든 1인 미디어 스타트업들의 민간 투자유치를 지원할 목적으로 마련됐다.
      sentence_1: 이번 발사는 저궤도위성에 이어 정지궤도위성에서 실시간으로 환경 감시 업무를 수행하는 세계 최초의 위성으로 기록됐다.
      label: 0.04
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
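
The token statistics above can be recomputed with the model's own tokenizer. A minimal sketch, using one widget sentence as a stand-in for the real training split:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence_transformers_model_id")

# In practice: the first 1000 sentence_0 / sentence_1 values of the training set
texts = ["밥을 먹고 나서 운동하시겠어요, 먹기 전에 하시겠어요?"]
lengths = [len(model.tokenizer(t)["input_ids"]) for t in texts]
print(min(lengths), sum(lengths) / len(lengths), max(lengths))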
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 4
  • multi_dataset_batch_sampler: round_robin
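
As a rough sketch, a comparable run could be launched with the sentence-transformers 3.x trainer API. The one-row dataset below is illustrative only, and eval_strategy="steps" is omitted because it additionally requires an eval dataset or evaluator:

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import CosineSimilarityLoss

model = SentenceTransformer("klue/roberta-base")

# Illustrative one-row stand-in for the 7,654-pair training set
train_dataset = Dataset.from_dict({
    "sentence_0": ["예상했던대로 가성비 대비 최고의 위치였어요."],
    "sentence_1": ["처음에 예상했던것보다 위치가 훨씬 좋았어요"],
    "label": [0.54],
})

args = SentenceTransformerTrainingArguments(
    output_dir="output",
    num_train_epochs=4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=CosineSimilarityLoss(model),
)
trainer.train()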

All Hyperparameters

Click to expand
  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 4
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin

Training Logs

Epoch    Step   Training Loss   spearman_max
0        0      -               0.3646
1.0      479    -               0.9133
1.0438   500    0.0281          -
2.0      958    -               0.9181
2.0877   1000   0.0060          0.9217
3.0      1437   -               0.9191
3.1315   1500   0.0036          -
4.0      1916   -               0.9223

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.1.1
  • Transformers: 4.44.2
  • PyTorch: 2.4.1+cu121
  • Accelerate: 0.34.2
  • Datasets: 3.0.1
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}