language: []
library_name: sentence-transformers
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:15525
  - loss:MultipleNegativesRankingLoss
base_model: BAAI/bge-small-en-v1.5
datasets:
  - baconnier/finance_dataset_small_private
metrics:
  - cosine_accuracy
  - dot_accuracy
  - manhattan_accuracy
  - euclidean_accuracy
  - max_accuracy
widget:
  - source_sentence: >-
      What is the loan to value ratio (LTV) for Samantha's mortgage, and how
      does it relate to the definition of LTV?
    sentences:
      - >-
        Loan amount = Home value - Down payment

        Loan amount = $300,000 - $60,000 = $240,000

        LTV = Loan amount ÷ Home value

        LTV = $240,000 ÷ $300,000 = 0.8 or 80%

        The LTV is the proportion of the property's value financed by the loan.

        The LTV for Samantha's mortgage is 80%, which aligns with the definition
        of LTV as the proportion of the property's value financed by the loan.
      - |-
        LTV = Down payment ÷ Home value
        LTV = $60,000 ÷ $300,000 = 0.2 or 20%
        The LTV for Samantha's mortgage is 20%.
      - What is a sale in the context of securities trading?
  - source_sentence: >-
      What is greenmail, and how does it differ from a typical stock
      acquisition?
    sentences:
      - >-
        Greenmail is when a company buys a small amount of stock in another
        company. This is different from a normal stock purchase because the
        amount is small.

        Greenmail is a small stock purchase, unlike a typical acquisition.
      - >-
        Greenmail is a corporate finance tactic where an unfriendly entity
        acquires a large block of a target company's stock, intending to force
        the target company to buy back the shares at a significant premium to
        prevent a hostile takeover. This differs from a typical stock
        acquisition, which is usually done for investment purposes or to gain a
        smaller ownership stake, without the explicit intention of forcing a
        buyback or threatening a takeover.

        Greenmail is a tactic used by an unfriendly entity to force a target
        company to buy back its shares at a premium to prevent a hostile
        takeover, while a typical stock acquisition is done for investment or to
        gain a smaller ownership stake without the intention of forcing a
        buyback or threatening a takeover.
      - >-
        What is the process of 'circling' in the context of underwriting a new
        share issue?
  - source_sentence: >-
      ISOs are not taxed at grant or exercise. If shares are held for 2 years
      from grant and 1 year from exercise, the profit is taxed as long-term
      capital gain. If holding periods are not met, it's a disqualifying
      disposition, and the profit is taxed as ordinary income.

      ISOs are tax-free at grant and exercise. Profit is taxed as capital gain
      or ordinary income based on holding periods.
    sentences:
      - >-
        Incentive Stock Options have no tax benefits and are taxed as ordinary
        income when exercised.

        ISOs are taxed as ordinary income when exercised.
      - >-
        What are the key characteristics of Incentive Stock Options (ISOs) in
        terms of taxation?
      - What is a short squeeze, and how does it affect stock prices?
  - source_sentence: >-
      What is a sell order, and how does it relate to Maggie's decision to sell
      her XYZ Corporation shares?
    sentences:
      - >-
        A performance fund is a growth-oriented mutual fund that invests
        primarily in stocks of companies with high growth potential and low
        dividend payouts. These funds are typically associated with higher risk
        compared to other types of mutual funds. For example, balanced funds
        invest in a mix of stocks and bonds and have a more moderate risk
        profile, while money market funds invest in low-risk, short-term
        securities and offer lower returns. Performance funds aim for higher
        capital appreciation but come with increased volatility.

        A performance fund is a high-risk, growth-oriented mutual fund that
        invests in stocks with high growth potential and low dividends, aiming
        for capital appreciation. It differs from balanced funds (moderate risk,
        mix of stocks and bonds) and money market funds (low risk, short-term
        securities, lower returns).
      - >-
        A sell order is when you want to buy shares of a stock. Maggie wanted to
        sell her XYZ shares because the price was going up.

        Maggie placed a sell order to buy XYZ shares since the price was
        increasing.
      - >-
        A sell order is an instruction given by an investor to a broker to sell
        a specific financial asset at a certain price or market condition.
        Maggie placed a sell order for 1,000 shares of XYZ Corporation at a
        limit price of $50 per share because she believed the company's recent
        acquisition announcement would negatively impact the stock price in the
        short term.

        Maggie placed a sell order to sell 1,000 XYZ shares at $50 or higher due
        to her expectation of a short-term price decline following the company's
        acquisition announcement.
  - source_sentence: >-
      What is industrial production, and how is it measured by the Federal
      Reserve Board?
    sentences:
      - >-
        What is triangular arbitrage, and how does it allow traders to profit
        from price discrepancies across three different markets?
      - >-
        Industrial production is a statistic that measures the output of
        factories and mines in the US. It is released by the Federal Reserve
        Board every quarter.

        Industrial production measures factory and mine output, released
        quarterly by the Fed.
      - >-
        Industrial production is a statistic determined by the Federal Reserve
        Board that measures the total output of all US factories and mines on a
        monthly basis. The Fed collects data from various government agencies
        and trade associations to calculate the industrial production index,
        which serves as an important economic indicator, providing insight into
        the health of the manufacturing and mining sectors.

        Industrial production is a monthly statistic calculated by the Federal
        Reserve Board, measuring the total output of US factories and mines
        using data from government agencies and trade associations, serving as a
        key economic indicator for the manufacturing and mining sectors.
pipeline_tag: sentence-similarity
model-index:
  - name: SentenceTransformer based on BAAI/bge-small-en-v1.5
    results:
      - task:
          type: triplet
          name: Triplet
        dataset:
          name: Finance Embedding Metric
          type: Finance_Embedding_Metric
        metrics:
          - type: cosine_accuracy
            value: 0.9791425260718424
            name: Cosine Accuracy
          - type: dot_accuracy
            value: 0.02085747392815759
            name: Dot Accuracy
          - type: manhattan_accuracy
            value: 0.9779837775202781
            name: Manhattan Accuracy
          - type: euclidean_accuracy
            value: 0.9791425260718424
            name: Euclidean Accuracy
          - type: max_accuracy
            value: 0.9791425260718424
            name: Max Accuracy

SentenceTransformer based on BAAI/bge-small-en-v1.5

This is a sentence-transformers model fine-tuned from BAAI/bge-small-en-v1.5 on the baconnier/finance_dataset_small_private dataset. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

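Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-small-en-v1.5
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 384 dimensions
  • Training Dataset: baconnier/finance_dataset_small_private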

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
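
The last two modules in the architecture above (CLS-token pooling followed by Normalize) can be illustrated without loading the model. The following is a minimal sketch in NumPy, assuming token embeddings of shape (batch, seq_len, 384) with the CLS token at position 0; the function name `cls_pool_and_normalize` and the random input are illustrative and not part of the library.

```python
import numpy as np

def cls_pool_and_normalize(token_embeddings: np.ndarray) -> np.ndarray:
    """Map (batch, seq_len, hidden) token embeddings to (batch, hidden) unit vectors."""
    # CLS pooling: keep only the embedding of the first token of each sequence.
    cls = token_embeddings[:, 0, :]
    # Normalize: scale each sentence vector to unit L2 length.
    norms = np.linalg.norm(cls, axis=1, keepdims=True)
    return cls / np.clip(norms, 1e-12, None)

# Random stand-in for the Transformer output of 2 sentences with 5 tokens each.
rng = np.random.default_rng(0)
fake_tokens = rng.normal(size=(2, 5, 384))
sentence_vecs = cls_pool_and_normalize(fake_tokens)
print(sentence_vecs.shape)  # (2, 384)
```

Because every output vector has unit length, cosine similarity and dot product give the same value for any pair of embeddings.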

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("baconnier/Finance2_embedding_small_en-V1.5")
# Run inference
sentences = [
    'What is industrial production, and how is it measured by the Federal Reserve Board?',
    'Industrial production is a statistic determined by the Federal Reserve Board that measures the total output of all US factories and mines on a monthly basis. The Fed collects data from various government agencies and trade associations to calculate the industrial production index, which serves as an important economic indicator, providing insight into the health of the manufacturing and mining sectors.\nIndustrial production is a monthly statistic calculated by the Federal Reserve Board, measuring the total output of US factories and mines using data from government agencies and trade associations, serving as a key economic indicator for the manufacturing and mining sectors.',
    'Industrial production is a statistic that measures the output of factories and mines in the US. It is released by the Federal Reserve Board every quarter.\nIndustrial production measures factory and mine output, released quarterly by the Fed.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
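
For semantic search, the similarity scores are typically used to rank candidate passages against a query. The sketch below shows that ranking step with plain NumPy on toy 3-dimensional vectors; in practice the vectors would come from `model.encode`, and the helper name `rank_candidates` is hypothetical.

```python
import numpy as np

def rank_candidates(query_vec: np.ndarray, candidate_vecs: np.ndarray):
    """Return candidate indices sorted by descending cosine similarity, plus the scores."""
    q = query_vec / np.linalg.norm(query_vec)
    c = candidate_vecs / np.linalg.norm(candidate_vecs, axis=1, keepdims=True)
    scores = c @ q                 # cosine similarity of each candidate to the query
    order = np.argsort(-scores)    # highest similarity first
    return order, scores

# Toy stand-ins for a query embedding and three candidate embeddings.
query = np.array([1.0, 0.0, 0.0])
candidates = np.array([
    [0.9, 0.1, 0.0],   # close to the query
    [0.0, 1.0, 0.0],   # orthogonal to the query
    [0.5, 0.5, 0.0],   # partially aligned
])
order, scores = rank_candidates(query, candidates)
print(order.tolist())  # [0, 2, 1]
```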

Evaluation

Metrics

Triplet

Metric              Value
cosine_accuracy     0.9791
dot_accuracy        0.0209
manhattan_accuracy  0.978
euclidean_accuracy  0.9791
max_accuracy        0.9791
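
As a rough illustration of what a triplet accuracy measures, the sketch below counts the fraction of (anchor, positive, negative) triplets in which the anchor is more similar to the positive than to the negative under cosine similarity. It is a simplified stand-in written with NumPy on toy vectors, not the library's evaluator, and the function name is hypothetical.

```python
import numpy as np

def cosine_triplet_accuracy(anchors, positives, negatives) -> float:
    """Fraction of triplets where cos(anchor, positive) > cos(anchor, negative)."""
    def unit(x):
        return x / np.linalg.norm(x, axis=1, keepdims=True)
    a, p, n = unit(anchors), unit(positives), unit(negatives)
    pos_sim = np.sum(a * p, axis=1)  # row-wise cosine similarity to the positive
    neg_sim = np.sum(a * n, axis=1)  # row-wise cosine similarity to the negative
    return float(np.mean(pos_sim > neg_sim))

# Four toy triplets; the last one is deliberately ranked the wrong way round.
anchors = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 0.0]])
positives = np.array([[0.9, 0.1], [0.1, 0.9], [1.0, 0.9], [0.0, 1.0]])
negatives = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, -1.0], [1.0, 0.1]])
accuracy = cosine_triplet_accuracy(anchors, positives, negatives)
print(accuracy)  # 0.75
```

Since the model ends with a Normalize() module, dot product and cosine similarity should rank triplets identically. The reported dot_accuracy equals 1 minus the cosine_accuracy, which may point to how dot scores were compared during evaluation rather than to a weakness of the embeddings; this is an inference from the numbers, not something the card states.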

Training Details

Training Dataset

baconnier/finance_dataset_small_private

  • Dataset: baconnier/finance_dataset_small_private at d7e6492
  • Size: 15,525 training samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
    • anchor (string): min 11 tokens, mean 76.86 tokens, max 304 tokens
    • positive (string): min 10 tokens, mean 79.23 tokens, max 299 tokens
    • negative (string): min 14 tokens, mean 60.36 tokens, max 155 tokens
  • Samples:
    anchor positive negative
    What is the key difference between a whole loan and a participation loan in terms of investment ownership? The context clearly states that a whole loan is a type of investment where an investor purchases the entire mortgage loan from the original lender, becoming the sole owner. This is in contrast to a participation loan, where multiple investors share ownership of a single loan. Therefore, the key difference between a whole loan and a participation loan is that a whole loan is owned entirely by a single investor, while a participation loan involves shared ownership among multiple investors.
    In a whole loan, a single investor owns the entire mortgage loan, while in a participation loan, multiple investors share ownership of the loan.
    A whole loan is where multiple investors share ownership of a loan, while a participation loan is where an investor purchases the entire loan. Since the context states that a whole loan is where an investor purchases the entire mortgage loan and becomes the sole owner, this answer is incorrect.
    A whole loan involves multiple investors, while a participation loan is owned by a single investor.
    The role of an executor is to manage and distribute the assets of a deceased person's estate in accordance with their will. This includes tasks such as settling debts, filing tax returns, and ensuring that the assets are distributed to the beneficiaries as specified in the will. The executor is appointed by the court to carry out these duties. In the given context, Michael Johnson was nominated by John Smith in his will and appointed by the court as the executor of John's estate, which was valued at $5 million. Michael's responsibilities include dividing the estate equally among John's three children, donating $500,000 to the local animal shelter as per John's instructions, settling the $200,000 mortgage and $50,000 credit card debt, and filing John's final income tax return and paying any outstanding taxes.
    An executor, appointed by the court, manages and distributes a deceased person's assets according to their will, settling debts, filing taxes, and ensuring the will is followed.
    What is the role of an executor in managing a deceased person's estate? An executor is someone who manages a deceased person's estate. They are responsible for distributing the assets according to the will. In this case, John Smith passed away and nominated Michael Johnson as the executor.
    The executor is responsible for distributing the assets of a deceased person's estate according to their will.
    What is a ticker tape, and how does it help investors? A ticker tape is a computerized device that relays stock symbols, latest prices, and trading volumes to investors worldwide in real-time. It helps investors by providing up-to-the-second information about the stocks they are monitoring or interested in, enabling them to make quick and informed trading decisions based on the most current market data available.
    A ticker tape is a real-time digital stock data display that empowers investors to make timely, data-driven trading decisions by providing the latest stock symbols, prices, and volumes.
    A ticker tape is a device that shows stock information. It helps investors by providing some data about stocks.
    A ticker tape provides stock data to investors.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
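
To make the two loss parameters concrete, here is a minimal NumPy sketch of a multiple-negatives ranking objective with `scale = 20.0` and cosine similarity. It assumes that, for each anchor, the candidate pool is every positive and every negative in the batch, with the anchor's own positive as the correct class. This is a simplified re-implementation for illustration, not the library's code, and `mnrl_loss` is a hypothetical name.

```python
import numpy as np

def mnrl_loss(anchors, positives, negatives, scale: float = 20.0) -> float:
    """Cross-entropy over scaled cosine similarities with in-batch candidates."""
    def unit(x):
        return x / np.linalg.norm(x, axis=1, keepdims=True)
    a = unit(anchors)
    # Candidate pool: all positives first, then all negatives -> shape (2B, d).
    candidates = unit(np.concatenate([positives, negatives], axis=0))
    logits = scale * (a @ candidates.T)                     # shape (B, 2B)
    logits = logits - logits.max(axis=1, keepdims=True)     # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    idx = np.arange(len(a))
    # The correct candidate for anchor i is positive i (column i).
    return float(-log_probs[idx, idx].mean())

# Toy batch: positives match their anchors, negatives point the opposite way.
toy_anchors = np.eye(3)
toy_positives = np.eye(3)
toy_negatives = -np.eye(3)
good_loss = mnrl_loss(toy_anchors, toy_positives, toy_negatives)
bad_loss = mnrl_loss(toy_anchors, toy_negatives, toy_positives)  # roles swapped
print(good_loss < bad_loss)  # True
```

A larger `scale` sharpens the softmax, so small differences in cosine similarity produce larger differences in loss.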
    

Evaluation Dataset

baconnier/finance_dataset_small_private

  • Dataset: baconnier/finance_dataset_small_private at d7e6492
  • Size: 862 evaluation samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
    • anchor (string): min 10 tokens, mean 78.51 tokens, max 286 tokens
    • positive (string): min 12 tokens, mean 76.02 tokens, max 304 tokens
    • negative (string): min 20 tokens, mean 59.8 tokens, max 271 tokens
  • Samples:
    Sample 1
    • anchor: What is the underwriter's discount in the given IPO scenario, and how does it relate to the gross spread?
    • positive: The underwriter's discount is the difference between the price the underwriter pays for the shares and the price at which they sell them to the public. In this case, the underwriter buys the shares at a 7% discount from the IPO price of $20 per share. The underwriter's discount is also known as the gross spread, as it represents the gross profit earned by the underwriter.
      The underwriter's discount is 7%, which is equivalent to $1.40 per share. This is also known as the gross spread, representing the underwriter's gross profit.
    • negative: The underwriter's discount is the difference between the price the underwriter pays for the shares and the price at which they sell them to the public. In this case, the underwriter buys the shares at a 7% discount, but the gross spread is not mentioned.
      The underwriter's discount is 7%, but the gross spread is unknown.
    Sample 2
    • anchor: What is the primary function of the equity market, and how does it relate to the stock market?
    • positive: The equity market, synonymous with the stock market, serves as a platform for companies to issue ownership shares to raise capital for growth and expansion. Simultaneously, it allows investors to buy these shares, becoming part-owners of the companies and potentially earning returns through stock price appreciation and dividends. The equity market plays a vital role in the financial system by efficiently allocating capital to businesses and providing investment opportunities to individuals and institutions.
      The equity market, or stock market, primarily functions as a mechanism for companies to raise capital by issuing ownership shares, while providing investors with opportunities to invest in these companies and earn returns, thus facilitating efficient capital allocation in the financial system.
    • negative: The equity market is where ownership shares of companies are bought and sold. It allows companies to raise money by selling stocks. The stock market is the same as the equity market.
      The equity market and the stock market are the same thing, where stocks are traded.
    Sample 3
    • anchor: A selling syndicate is a group of investment banks that work together to underwrite and distribute a new security issue, such as stocks or bonds, to investors. The syndicate is typically led by one or more lead underwriters, who coordinate the distribution of the securities and set the offering price. In the case of XYZ Corporation, the selling syndicate is led by ABC Investment Bank and consists of 5 investment banks in total. The syndicate has agreed to purchase 10 million new shares from XYZ Corporation at a fixed price of $50 per share, which they will then sell to investors at a higher price of $55 per share. This process allows XYZ Corporation to raise capital by issuing new shares, while the selling syndicate earns a commission on the sale of the shares. The syndicate's role is to facilitate the distribution of the new shares to a wider pool of investors, helping to ensure the success of the offering.
      A selling syndicate is a group of investment banks that jointly underwrite and distribute a new security issue to investors. In XYZ Corporation's case, the syndicate will purchase shares from the company at a fixed price and resell them to investors at a higher price, earning a commission and facilitating the successful distribution of the new shares.
    • positive: What is a selling syndicate, and how does it function in the context of XYZ Corporation's new share issue?
    • negative: A selling syndicate is a group of investment banks that work together to sell new shares of a company. In this case, XYZ Corporation has hired 5 investment banks to sell their new shares. The syndicate buys the shares from XYZ Corporation at a fixed price and then sells them to investors at a higher price.
      A selling syndicate is a group of investment banks that jointly underwrite and distribute new shares of a company to investors, buying the shares at a fixed price and selling them at a higher price.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • bf16: True
  • batch_sampler: no_duplicates
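
The batch size, epoch count, and warmup ratio above determine the step count and the learning-rate curve. The sketch below works through that arithmetic, assuming a single device, no gradient accumulation, the listed learning rate of 5e-05, and a linear schedule whose warmup length is the step count times the warmup ratio, rounded up. The helper `linear_lr` is illustrative, not a library function.

```python
import math

NUM_SAMPLES = 15_525   # training set size
BATCH_SIZE = 16        # per_device_train_batch_size (one device assumed)
EPOCHS = 1             # num_train_epochs
WARMUP_RATIO = 0.1     # warmup_ratio
PEAK_LR = 5e-5         # learning_rate

# One optimizer step per batch; the last batch may be partial.
total_steps = math.ceil(NUM_SAMPLES / BATCH_SIZE) * EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)

def linear_lr(step: int) -> float:
    """Linear warmup from 0 to PEAK_LR, then linear decay back to 0."""
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    remaining = max(0, total_steps - step)
    return PEAK_LR * remaining / (total_steps - warmup_steps)

print(total_steps)   # 971
print(warmup_steps)  # 98
```

The computed total of 971 steps matches the final step recorded in the training logs.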

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch  Step  Training Loss  Evaluation Loss  Finance_Embedding_Metric_max_accuracy
0.0103 10 0.9918 - -
0.0206 20 0.8866 - -
0.0309 30 0.7545 - -
0.0412 40 0.6731 - -
0.0515 50 0.2897 - -
0.0618 60 0.214 - -
0.0721 70 0.1677 - -
0.0824 80 0.0479 - -
0.0927 90 0.191 - -
0.1030 100 0.1188 - -
0.1133 110 0.1909 - -
0.1236 120 0.0486 - -
0.1339 130 0.0812 - -
0.1442 140 0.1282 - -
0.1545 150 0.15 - -
0.1648 160 0.0605 - -
0.1751 170 0.0431 - -
0.1854 180 0.0613 - -
0.1957 190 0.0407 - -
0.2008 195 - 0.0605 -
0.2060 200 0.0567 - -
0.2163 210 0.0294 - -
0.2266 220 0.0284 - -
0.2369 230 0.0444 - -
0.2472 240 0.0559 - -
0.2575 250 0.0301 - -
0.2678 260 0.0225 - -
0.2781 270 0.0256 - -
0.2884 280 0.016 - -
0.2987 290 0.0063 - -
0.3090 300 0.0442 - -
0.3193 310 0.0425 - -
0.3296 320 0.0534 - -
0.3399 330 0.0264 - -
0.3502 340 0.043 - -
0.3605 350 0.035 - -
0.3708 360 0.0212 - -
0.3811 370 0.0171 - -
0.3913 380 0.0497 - -
0.4016 390 0.0294 0.0381 -
0.4119 400 0.0317 - -
0.4222 410 0.0571 - -
0.4325 420 0.0251 - -
0.4428 430 0.0162 - -
0.4531 440 0.0504 - -
0.4634 450 0.0257 - -
0.4737 460 0.0185 - -
0.4840 470 0.0414 - -
0.4943 480 0.016 - -
0.5046 490 0.0432 - -
0.5149 500 0.0369 - -
0.5252 510 0.0115 - -
0.5355 520 0.034 - -
0.5458 530 0.0143 - -
0.5561 540 0.0225 - -
0.5664 550 0.0185 - -
0.5767 560 0.0085 - -
0.5870 570 0.0262 - -
0.5973 580 0.0465 - -
0.6025 585 - 0.0541 -
0.6076 590 0.0121 - -
0.6179 600 0.0256 - -
0.6282 610 0.0203 - -
0.6385 620 0.0301 - -
0.6488 630 0.017 - -
0.6591 640 0.0321 - -
0.6694 650 0.0087 - -
0.6797 660 0.0276 - -
0.6900 670 0.0043 - -
0.7003 680 0.0063 - -
0.7106 690 0.0293 - -
0.7209 700 0.01 - -
0.7312 710 0.0121 - -
0.7415 720 0.0164 - -
0.7518 730 0.0052 - -
0.7621 740 0.0271 - -
0.7724 750 0.0363 - -
0.7827 760 0.0523 - -
0.7930 770 0.0153 - -
0.8033 780 0.015 0.0513 -
0.8136 790 0.0042 - -
0.8239 800 0.0088 - -
0.8342 810 0.0217 - -
0.8445 820 0.0345 - -
0.8548 830 0.01 - -
0.8651 840 0.0243 - -
0.8754 850 0.0074 - -
0.8857 860 0.0082 - -
0.8960 870 0.0104 - -
0.9063 880 0.0078 - -
0.9166 890 0.0163 - -
0.9269 900 0.0168 - -
0.9372 910 0.0088 - -
0.9475 920 0.0186 - -
0.9578 930 0.0055 - -
0.9681 940 0.0142 - -
0.9784 950 0.0251 - -
0.9887 960 0.0468 - -
0.9990 970 0.0031 - -
1.0 971 - - 0.9791

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.0.1
  • Transformers: 4.41.2
  • PyTorch: 2.3.0+cu121
  • Accelerate: 0.31.0
  • Datasets: 2.19.2
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}