	---
	language:
	- en
	license: apache-2.0
	tags:
	- sentence-transformers
	- sentence-similarity
	- feature-extraction
	- generated_from_trainer
	- dataset_size:73
	- loss:MatryoshkaLoss
	- loss:MultipleNegativesRankingLoss
	base_model: thenlper/gte-base
	widget:
	- source_sentence: What is the maximum value of equipment that can be purchased with
	    a CUE Student Research Project Grant?
	  sentences:
	  - Equipment costs (valued up to $1000).
	  - Variable awards to recognize and reward academic achievement at the senior high
	    school level and to encourage students to pursue post -secondary studies.
	  - The Amazon Future Engineer Scholarship provides students with an opportunity to
	    upgrade their careers with a $7,500 CAD/year scholarship available for up to four
	    years.
	- source_sentence: What is the minimum distance a recipient's hometown must be from
	    Concordia University of Edmonton to be eligible for the Alberta Blue Cross Away
	    from Home Scholarship?
	  sentences:
	  - Three awards are available
	  - The recipient’s hometown must be at least 100 kilometres from Concordia University
	    of Edmonton.
	  - 'Application Deadline: September 1'
	- source_sentence: According to the selection criteria, what level of subjects are
	    used to determine the academic standing of a potential Alberta Blue Cross Away
	    from Home Scholarship recipient?
	  sentences:
	  - Selection is ba sed on the academic standing of 30 -level subjects used for admission.
	  - 'These eligible and ineligible lists are not exhaustive. Doubts about the eligibility
	    of expenses should be directed to the ORI’s Research Administration Service s
	    (RAS): [email protected] .'
	  - '*Value: $11000 Master’s; $14,000 Doctoral'
	- source_sentence: According to the text, how many days does a grant recipient have
	    to submit a final report after the grant ends?
	  sentences:
	  - All Fall grant recipients are expected to submit an abstract to present an oral
	    and/or poster presentation of their work, either in its progression or final stage.
	  - a business program offered by an Alberta college, polytechnic, or university that
	    offers the prerequisite courses required for entrance into the CPA Professional
	    Education Program (CPA PEP).
	  - The applicant is required to complete and submit a final report within 5 days
	    of the end of the grant.
	- source_sentence: In what format should applicants acknowledge the funding provided
	    by Concordia University of Edmonton for their Student Project Grant?
	  sentences:
	  - All oral or poster presentations, publications, including public messages, arising
	    from research supported by CUE grants must acknowledge the support of the institution.
	    Acknowledgement can be in the written format, such as " This research is funded
	    by the generous support of Concordia University of Edmonton through their CUE
	    Student Research Project Grants program ", or similar phrasing.
	  - This $1,000 scholarship is awarded to post -secondary students who have completed
	    at least one year towards their Bachelor of Science with a focus on Computer Science,
	    achieved an average GPA of 3.5 or higher, and are still enrolled in post -secondary
	    studie s.
	  - The recipient will be selected based on the highest grade in MARK320. In the event
	    of a tie, preference will be given to the student with the highest cumulative
	    GPA.
	pipeline_tag: sentence-similarity
	library_name: sentence-transformers
	metrics:
	- cosine_accuracy@1
	- cosine_accuracy@3
	- cosine_accuracy@5
	- cosine_accuracy@10
	- cosine_precision@1
	- cosine_precision@3
	- cosine_precision@5
	- cosine_precision@10
	- cosine_recall@1
	- cosine_recall@3
	- cosine_recall@5
	- cosine_recall@10
	- cosine_ndcg@10
	- cosine_mrr@10
	- cosine_map@100
	model-index:
	- name: BGE base Financial Matryoshka
	  results:
	  - task:
	      type: information-retrieval
	      name: Information Retrieval
	    dataset:
	      name: dim 768
	      type: dim_768
	    metrics:
	    - type: cosine_accuracy@1
	      value: 0.5555555555555556
	      name: Cosine Accuracy@1
	    - type: cosine_accuracy@3
	      value: 1.0
	      name: Cosine Accuracy@3
	    - type: cosine_accuracy@5
	      value: 1.0
	      name: Cosine Accuracy@5
	    - type: cosine_accuracy@10
	      value: 1.0
	      name: Cosine Accuracy@10
	    - type: cosine_precision@1
	      value: 0.5555555555555556
	      name: Cosine Precision@1
	    - type: cosine_precision@3
	      value: 0.3333333333333333
	      name: Cosine Precision@3
	    - type: cosine_precision@5
	      value: 0.2
	      name: Cosine Precision@5
	    - type: cosine_precision@10
	      value: 0.1
	      name: Cosine Precision@10
	    - type: cosine_recall@1
	      value: 0.5555555555555556
	      name: Cosine Recall@1
	    - type: cosine_recall@3
	      value: 1.0
	      name: Cosine Recall@3
	    - type: cosine_recall@5
	      value: 1.0
	      name: Cosine Recall@5
	    - type: cosine_recall@10
	      value: 1.0
	      name: Cosine Recall@10
	    - type: cosine_ndcg@10
	      value: 0.8214210289682637
	      name: Cosine Ndcg@10
	    - type: cosine_mrr@10
	      value: 0.7592592592592592
	      name: Cosine Mrr@10
	    - type: cosine_map@100
	      value: 0.7592592592592592
	      name: Cosine Map@100
	  - task:
	      type: information-retrieval
	      name: Information Retrieval
	    dataset:
	      name: dim 512
	      type: dim_512
	    metrics:
	    - type: cosine_accuracy@1
	      value: 0.4444444444444444
	      name: Cosine Accuracy@1
	    - type: cosine_accuracy@3
	      value: 0.8888888888888888
	      name: Cosine Accuracy@3
	    - type: cosine_accuracy@5
	      value: 1.0
	      name: Cosine Accuracy@5
	    - type: cosine_accuracy@10
	      value: 1.0
	      name: Cosine Accuracy@10
	    - type: cosine_precision@1
	      value: 0.4444444444444444
	      name: Cosine Precision@1
	    - type: cosine_precision@3
	      value: 0.2962962962962963
	      name: Cosine Precision@3
	    - type: cosine_precision@5
	      value: 0.2
	      name: Cosine Precision@5
	    - type: cosine_precision@10
	      value: 0.1
	      name: Cosine Precision@10
	    - type: cosine_recall@1
	      value: 0.4444444444444444
	      name: Cosine Recall@1
	    - type: cosine_recall@3
	      value: 0.8888888888888888
	      name: Cosine Recall@3
	    - type: cosine_recall@5
	      value: 1.0
	      name: Cosine Recall@5
	    - type: cosine_recall@10
	      value: 1.0
	      name: Cosine Recall@10
	    - type: cosine_ndcg@10
	      value: 0.7678413135022636
	      name: Cosine Ndcg@10
	    - type: cosine_mrr@10
	      value: 0.6888888888888889
	      name: Cosine Mrr@10
	    - type: cosine_map@100
	      value: 0.6888888888888889
	      name: Cosine Map@100
	  - task:
	      type: information-retrieval
	      name: Information Retrieval
	    dataset:
	      name: dim 256
	      type: dim_256
	    metrics:
	    - type: cosine_accuracy@1
	      value: 0.4444444444444444
	      name: Cosine Accuracy@1
	    - type: cosine_accuracy@3
	      value: 1.0
	      name: Cosine Accuracy@3
	    - type: cosine_accuracy@5
	      value: 1.0
	      name: Cosine Accuracy@5
	    - type: cosine_accuracy@10
	      value: 1.0
	      name: Cosine Accuracy@10
	    - type: cosine_precision@1
	      value: 0.4444444444444444
	      name: Cosine Precision@1
	    - type: cosine_precision@3
	      value: 0.3333333333333333
	      name: Cosine Precision@3
	    - type: cosine_precision@5
	      value: 0.2
	      name: Cosine Precision@5
	    - type: cosine_precision@10
	      value: 0.1
	      name: Cosine Precision@10
	    - type: cosine_recall@1
	      value: 0.4444444444444444
	      name: Cosine Recall@1
	    - type: cosine_recall@3
	      value: 1.0
	      name: Cosine Recall@3
	    - type: cosine_recall@5
	      value: 1.0
	      name: Cosine Recall@5
	    - type: cosine_recall@10
	      value: 1.0
	      name: Cosine Recall@10
	    - type: cosine_ndcg@10
	      value: 0.7658654734127082
	      name: Cosine Ndcg@10
	    - type: cosine_mrr@10
	      value: 0.6851851851851851
	      name: Cosine Mrr@10
	    - type: cosine_map@100
	      value: 0.6851851851851851
	      name: Cosine Map@100
	  - task:
	      type: information-retrieval
	      name: Information Retrieval
	    dataset:
	      name: dim 128
	      type: dim_128
	    metrics:
	    - type: cosine_accuracy@1
	      value: 0.4444444444444444
	      name: Cosine Accuracy@1
	    - type: cosine_accuracy@3
	      value: 0.8888888888888888
	      name: Cosine Accuracy@3
	    - type: cosine_accuracy@5
	      value: 0.8888888888888888
	      name: Cosine Accuracy@5
	    - type: cosine_accuracy@10
	      value: 0.8888888888888888
	      name: Cosine Accuracy@10
	    - type: cosine_precision@1
	      value: 0.4444444444444444
	      name: Cosine Precision@1
	    - type: cosine_precision@3
	      value: 0.2962962962962963
	      name: Cosine Precision@3
	    - type: cosine_precision@5
	      value: 0.17777777777777778
	      name: Cosine Precision@5
	    - type: cosine_precision@10
	      value: 0.08888888888888889
	      name: Cosine Precision@10
	    - type: cosine_recall@1
	      value: 0.4444444444444444
	      name: Cosine Recall@1
	    - type: cosine_recall@3
	      value: 0.8888888888888888
	      name: Cosine Recall@3
	    - type: cosine_recall@5
	      value: 0.8888888888888888
	      name: Cosine Recall@5
	    - type: cosine_recall@10
	      value: 0.8888888888888888
	      name: Cosine Recall@10
	    - type: cosine_ndcg@10
	      value: 0.7103099178571526
	      name: Cosine Ndcg@10
	    - type: cosine_mrr@10
	      value: 0.6481481481481483
	      name: Cosine Mrr@10
	    - type: cosine_map@100
	      value: 0.6521164021164021
	      name: Cosine Map@100
	  - task:
	      type: information-retrieval
	      name: Information Retrieval
	    dataset:
	      name: dim 64
	      type: dim_64
	    metrics:
	    - type: cosine_accuracy@1
	      value: 0.6666666666666666
	      name: Cosine Accuracy@1
	    - type: cosine_accuracy@3
	      value: 0.6666666666666666
	      name: Cosine Accuracy@3
	    - type: cosine_accuracy@5
	      value: 0.7777777777777778
	      name: Cosine Accuracy@5
	    - type: cosine_accuracy@10
	      value: 0.8888888888888888
	      name: Cosine Accuracy@10
	    - type: cosine_precision@1
	      value: 0.6666666666666666
	      name: Cosine Precision@1
	    - type: cosine_precision@3
	      value: 0.2222222222222222
	      name: Cosine Precision@3
	    - type: cosine_precision@5
	      value: 0.15555555555555556
	      name: Cosine Precision@5
	    - type: cosine_precision@10
	      value: 0.08888888888888889
	      name: Cosine Precision@10
	    - type: cosine_recall@1
	      value: 0.6666666666666666
	      name: Cosine Recall@1
	    - type: cosine_recall@3
	      value: 0.6666666666666666
	      name: Cosine Recall@3
	    - type: cosine_recall@5
	      value: 0.7777777777777778
	      name: Cosine Recall@5
	    - type: cosine_recall@10
	      value: 0.8888888888888888
	      name: Cosine Recall@10
	    - type: cosine_ndcg@10
	      value: 0.7515566546007473
	      name: Cosine Ndcg@10
	    - type: cosine_mrr@10
	      value: 0.7103174603174602
	      name: Cosine Mrr@10
	    - type: cosine_map@100
	      value: 0.71494708994709
	      name: Cosine Map@100
	---

	# BGE base Financial Matryoshka

	This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [thenlper/gte-base](https://huggingface.co/thenlper/gte-base) on the json dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

	## Model Details

	### Model Description
	- Model Type: Sentence Transformer
	- Base model: [thenlper/gte-base](https://huggingface.co/thenlper/gte-base) <!-- at revision c078288308d8dee004ab72c6191778064285ec0c -->
	- Maximum Sequence Length: 512 tokens
	- Output Dimensionality: 768 dimensions
	- Similarity Function: Cosine Similarity
	- Training Dataset:
	    - json
	- Language: en
	- License: apache-2.0

	### Model Sources

	- Documentation: [Sentence Transformers Documentation](https://sbert.net)
	- Repository: [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
	- Hugging Face: [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

	### Full Model Architecture

	```
	SentenceTransformer(
	  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
	  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
	  (2): Normalize()
	)
	```
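
The three modules listed above amount to: contextual token vectors from the BERT encoder, a mean over the non-padding tokens (`pooling_mode_mean_tokens: True`), and L2 normalization. The following is a minimal pure-Python sketch of stages (1) and (2) on toy numbers. The function names and the tiny 3-dimensional vectors are illustrative assumptions, not the library's implementation.

```python
import math

def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is skipped)."""
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    dim = len(token_vectors[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]

def l2_normalize(vector):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

# Toy "token embeddings" (hypothetical): two real tokens and one padding position.
tokens = [[1.0, 2.0, 2.0], [3.0, 0.0, 2.0], [9.0, 9.0, 9.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)            # the padding vector is ignored
sentence_embedding = l2_normalize(pooled)   # unit-length sentence vector
print(pooled, sentence_embedding)
```

Because the final vector has unit length, cosine similarity between two embeddings reduces to a plain dot product.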

	## Usage

	### Direct Usage (Sentence Transformers)

	First install the Sentence Transformers library:

	```bash
	pip install -U sentence-transformers
	```

	Then you can load this model and run inference.
	```python
	from sentence_transformers import SentenceTransformer

	# Download from the 🤗 Hub
	model = SentenceTransformer("TatvaRA/gte-base-finetuned-schorlaships-matryonshka")
	# Run inference
	sentences = [
	    'In what format should applicants acknowledge the funding provided by Concordia University of Edmonton for their Student Project Grant?',
	    'All oral or poster presentations, publications, including public messages, arising from research supported by CUE grants must acknowledge the support of the institution. Acknowledgement can be in the written format, such as " This research is funded by the generous support of Concordia University of Edmonton through their CUE Student Research Project Grants program ", or similar phrasing.',
	    'The recipient will be selected based on the highest grade in MARK320. In the event of a tie, preference will be given to the student with the highest cumulative GPA.',
	]
	embeddings = model.encode(sentences)
	print(embeddings.shape)
	# (3, 768)

	# Get the similarity scores for the embeddings
	similarities = model.similarity(embeddings, embeddings)
	print(similarities.shape)
	# torch.Size([3, 3])
	```
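
Since the model was trained with `MatryoshkaLoss` over the dimensions 768, 512, 256, 128, and 64, the leading components of an embedding can be used on their own at a lower storage cost. A minimal sketch of that idea, assuming plain Python lists in place of real model outputs: keep the first `dim` components, then re-normalize before comparing. The 8-dimensional vectors and the helper names are made up for illustration. Recent versions of Sentence Transformers also accept a `truncate_dim` argument when constructing the model, which is the parameter name the evaluators in this card report.

```python
import math

def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` components and rescale to unit length."""
    head = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

def cosine(a, b):
    """Cosine similarity of two unit-length vectors is their dot product."""
    return sum(x * y for x, y in zip(a, b))

# Made-up 8-dimensional stand-ins for 768-dimensional model outputs.
query = [0.5, 0.5, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0]
passage = [0.5, 0.5, 0.5, -0.5, 0.0, 0.0, 0.0, 0.0]

full = cosine(truncate_and_normalize(query, 8), truncate_and_normalize(passage, 8))
short = cosine(truncate_and_normalize(query, 2), truncate_and_normalize(passage, 2))
print(full, short)
```

Truncation can change similarity scores, as the toy values show, which is why the card reports retrieval metrics separately for each dimension.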

	<!--
	### Direct Usage (Transformers)

	<details><summary>Click to see the direct usage in Transformers</summary>

	</details>
	-->

	<!--
	### Downstream Usage (Sentence Transformers)

	You can finetune this model on your own dataset.

	<details><summary>Click to expand</summary>

	</details>
	-->

	<!--
	### Out-of-Scope Use

	List how the model may foreseeably be misused and address what users ought not to do with the model.
	-->

	## Evaluation

	### Metrics

	#### Information Retrieval

	* Dataset: `dim_768`
	* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) with these parameters:
	```json
	{
	"truncate_dim": 768
	}
	```

	| Metric              | Value  |
	|:--------------------|:-------|
	| cosine_accuracy@1   | 0.5556 |
	| cosine_accuracy@3   | 1.0    |
	| cosine_accuracy@5   | 1.0    |
	| cosine_accuracy@10  | 1.0    |
	| cosine_precision@1  | 0.5556 |
	| cosine_precision@3  | 0.3333 |
	| cosine_precision@5  | 0.2    |
	| cosine_precision@10 | 0.1    |
	| cosine_recall@1     | 0.5556 |
	| cosine_recall@3     | 1.0    |
	| cosine_recall@5     | 1.0    |
	| cosine_recall@10    | 1.0    |
	| cosine_ndcg@10      | 0.8214 |
	| cosine_mrr@10       | 0.7593 |
	| cosine_map@100      | 0.7593 |
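
The accuracy and recall values in the table above are multiples of 1/9, and precision@k equals accuracy@k divided by k. Both patterns are consistent with nine evaluation queries that each have exactly one relevant passage. That reading is an inference from the numbers, not something the card states. Under that assumption, the sketch below computes accuracy@1, MRR@10, and NDCG@10 from the rank of the relevant passage per query. The rank list is a hypothetical distribution chosen because it reproduces the `dim_768` values; it is not taken from the evaluation data.

```python
import math

def retrieval_metrics(ranks, k=10):
    """Compute accuracy@1, MRR@k and NDCG@k when each query has one relevant passage.

    `ranks` holds the 1-based position of the relevant passage for each query.
    """
    n = len(ranks)
    accuracy_at_1 = sum(1 for r in ranks if r == 1) / n
    mrr = sum(1.0 / r for r in ranks if r <= k) / n
    # With a single relevant passage the ideal DCG is 1, so NDCG equals DCG.
    ndcg = sum(1.0 / math.log2(r + 1) for r in ranks if r <= k) / n
    return accuracy_at_1, mrr, ndcg

# Hypothetical ranks inferred from the table, not taken from the source.
ranks = [1, 1, 1, 1, 1, 2, 2, 2, 3]
acc1, mrr10, ndcg10 = retrieval_metrics(ranks)
print(round(acc1, 4), round(mrr10, 4), round(ndcg10, 4))
# 0.5556 0.7593 0.8214
```

With only nine queries, a single query moving by one rank shifts these metrics by several points, so differences between dimensions should be read with caution.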

	#### Information Retrieval

	* Dataset: `dim_512`
	* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) with these parameters:
	```json
	{
	"truncate_dim": 512
	}
	```

	| Metric              | Value  |
	|:--------------------|:-------|
	| cosine_accuracy@1   | 0.4444 |
	| cosine_accuracy@3   | 0.8889 |
	| cosine_accuracy@5   | 1.0    |
	| cosine_accuracy@10  | 1.0    |
	| cosine_precision@1  | 0.4444 |
	| cosine_precision@3  | 0.2963 |
	| cosine_precision@5  | 0.2    |
	| cosine_precision@10 | 0.1    |
	| cosine_recall@1     | 0.4444 |
	| cosine_recall@3     | 0.8889 |
	| cosine_recall@5     | 1.0    |
	| cosine_recall@10    | 1.0    |
	| cosine_ndcg@10      | 0.7678 |
	| cosine_mrr@10       | 0.6889 |
	| cosine_map@100      | 0.6889 |

	#### Information Retrieval

	* Dataset: `dim_256`
	* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) with these parameters:
	```json
	{
	"truncate_dim": 256
	}
	```

	| Metric              | Value  |
	|:--------------------|:-------|
	| cosine_accuracy@1   | 0.4444 |
	| cosine_accuracy@3   | 1.0    |
	| cosine_accuracy@5   | 1.0    |
	| cosine_accuracy@10  | 1.0    |
	| cosine_precision@1  | 0.4444 |
	| cosine_precision@3  | 0.3333 |
	| cosine_precision@5  | 0.2    |
	| cosine_precision@10 | 0.1    |
	| cosine_recall@1     | 0.4444 |
	| cosine_recall@3     | 1.0    |
	| cosine_recall@5     | 1.0    |
	| cosine_recall@10    | 1.0    |
	| cosine_ndcg@10      | 0.7659 |
	| cosine_mrr@10       | 0.6852 |
	| cosine_map@100      | 0.6852 |

	#### Information Retrieval

	* Dataset: `dim_128`
	* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) with these parameters:
	```json
	{
	"truncate_dim": 128
	}
	```

	| Metric              | Value  |
	|:--------------------|:-------|
	| cosine_accuracy@1   | 0.4444 |
	| cosine_accuracy@3   | 0.8889 |
	| cosine_accuracy@5   | 0.8889 |
	| cosine_accuracy@10  | 0.8889 |
	| cosine_precision@1  | 0.4444 |
	| cosine_precision@3  | 0.2963 |
	| cosine_precision@5  | 0.1778 |
	| cosine_precision@10 | 0.0889 |
	| cosine_recall@1     | 0.4444 |
	| cosine_recall@3     | 0.8889 |
	| cosine_recall@5     | 0.8889 |
	| cosine_recall@10    | 0.8889 |
	| cosine_ndcg@10      | 0.7103 |
	| cosine_mrr@10       | 0.6481 |
	| cosine_map@100      | 0.6521 |

	#### Information Retrieval

	* Dataset: `dim_64`
	* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) with these parameters:
	```json
	{
	"truncate_dim": 64
	}
	```

	| Metric              | Value  |
	|:--------------------|:-------|
	| cosine_accuracy@1   | 0.6667 |
	| cosine_accuracy@3   | 0.6667 |
	| cosine_accuracy@5   | 0.7778 |
	| cosine_accuracy@10  | 0.8889 |
	| cosine_precision@1  | 0.6667 |
	| cosine_precision@3  | 0.2222 |
	| cosine_precision@5  | 0.1556 |
	| cosine_precision@10 | 0.0889 |
	| cosine_recall@1     | 0.6667 |
	| cosine_recall@3     | 0.6667 |
	| cosine_recall@5     | 0.7778 |
	| cosine_recall@10    | 0.8889 |
	| cosine_ndcg@10      | 0.7516 |
	| cosine_mrr@10       | 0.7103 |
	| cosine_map@100      | 0.7149 |

	<!--
	## Bias, Risks and Limitations

	What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.
	-->

	<!--
	### Recommendations

	What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.
	-->

	## Training Details

	### Training Dataset

	#### json

	* Dataset: json
	* Size: 73 training samples
	* Columns: <code>anchor</code> and <code>positive</code>
	* Approximate statistics based on the first 73 samples:
	|         | anchor | positive |
	|:--------|:-------|:---------|
	| type    | string | string |
	| details | <ul><li>min: 14 tokens</li><li>mean: 23.0 tokens</li><li>max: 41 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 32.74 tokens</li><li>max: 346 tokens</li></ul> |
	* Samples:
	| anchor | positive |
	|:-------|:---------|
	| <code>What specific type of students are the Alberta Innovates Graduate Student Scholarships designed to support?</code> | <code>The Alberta Innovates Graduate Student Scholarships support academically superior graduate students <br>who are receiving training and conducting research in areas that are strategically important to Alberta’s <br>economy.</code> |
	| <code>What is the specific date by which students must submit their reports for the Spring 2025 grant period?</code> | <code>Report due date April 20th (5 days post grant closure)</code> |
	| <code>In what format should applicants acknowledge the funding provided by Concordia University of Edmonton for their Student Project Grant?</code> | <code>All oral or poster presentations, publications, including public messages, arising from research supported by CUE grants must acknowledge the support of the institution. Acknowledgement can be in the written format, such as " This research is funded by the generous support of Concordia University of Edmonton through their CUE Student Research Project Grants program ", or similar phrasing.</code> |
	* Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
	```json
	{
	    "loss": "MultipleNegativesRankingLoss",
	    "matryoshka_dims": [
	        768,
	        512,
	        256,
	        128,
	        64
	    ],
	    "matryoshka_weights": [
	        1,
	        1,
	        1,
	        1,
	        1
	    ],
	    "n_dims_per_step": -1
	}
	```
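
The configuration above wraps `MultipleNegativesRankingLoss` in `MatryoshkaLoss`: the ranking loss is evaluated on the first 768, 512, 256, 128, and 64 components of each embedding and the results are combined with equal weights. The following pure-Python sketch illustrates that idea on toy data. It is not the library's implementation; the function names, the 4-dimensional vectors, and the `scale=20.0` value (assumed here to mirror the library default) are illustrative assumptions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def in_batch_ranking_loss(anchors, positives, scale=20.0):
    """Cross-entropy over scaled cosine scores; positive i is the target for anchor i."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        scores = [scale * cosine(anchor, p) for p in positives]
        top = max(scores)  # subtract the maximum for a numerically stable log-sum-exp
        log_sum_exp = top + math.log(sum(math.exp(s - top) for s in scores))
        total += log_sum_exp - scores[i]
    return total / len(anchors)

def matryoshka_loss(anchors, positives, dims, weights):
    """Weighted sum of the ranking loss computed on the first `d` components."""
    return sum(
        w * in_batch_ranking_loss([a[:d] for a in anchors], [p[:d] for p in positives])
        for d, w in zip(dims, weights)
    )

# Toy batch (hypothetical): each anchor is close to the positive at the same index.
anchors = [[1.0, 0.0, 0.2, 0.1], [0.0, 1.0, 0.1, 0.3]]
positives = [[0.9, 0.1, 0.3, 0.0], [0.1, 0.8, 0.0, 0.4]]

aligned = matryoshka_loss(anchors, positives, dims=[4, 2], weights=[1, 1])
swapped = matryoshka_loss(anchors, positives[::-1], dims=[4, 2], weights=[1, 1])
print(aligned, swapped)
```

Swapping the positives so that each anchor is paired with the wrong passage raises the loss, which is the signal that pushes matching pairs together at every truncated dimension.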

	### Training Hyperparameters
	#### Non-Default Hyperparameters

	- `eval_strategy`: epoch
	- `per_device_train_batch_size`: 32
	- `per_device_eval_batch_size`: 16
	- `gradient_accumulation_steps`: 16
	- `learning_rate`: 2e-05
	- `num_train_epochs`: 4
	- `lr_scheduler_type`: cosine
	- `warmup_ratio`: 0.1
	- `fp16`: True
	- `load_best_model_at_end`: True
	- `optim`: adamw_torch_fused
	- `batch_sampler`: no_duplicates
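
With only 73 training samples, these settings leave very few optimizer updates. The arithmetic below is a rough sketch under two assumptions that the card does not state: training ran on a single device, and the trainer performs at least one optimizer update per epoch even when an epoch has fewer batches than `gradient_accumulation_steps`. The result agrees with the Training Logs table, which lists one step per epoch.

```python
import math

train_samples = 73      # "Size: 73 training samples"
per_device_batch = 32   # per_device_train_batch_size
grad_accum = 16         # gradient_accumulation_steps
epochs = 4              # num_train_epochs
warmup_ratio = 0.1      # warmup_ratio
num_devices = 1         # assumption: single-device training

batches_per_epoch = math.ceil(train_samples / (per_device_batch * num_devices))
# Assumption: at least one optimizer update per epoch, even though an epoch
# has fewer batches than gradient_accumulation_steps.
updates_per_epoch = max(batches_per_epoch // grad_accum, 1)
total_updates = updates_per_epoch * epochs
warmup_updates = math.ceil(total_updates * warmup_ratio)

print(batches_per_epoch, updates_per_epoch, total_updates, warmup_updates)
# 3 1 4 1
```

Under these assumptions each update accumulates gradients over all 73 samples, so the cosine schedule and warmup span only four updates in total.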

	#### All Hyperparameters
	<details><summary>Click to expand</summary>

	- `overwrite_output_dir`: False
	- `do_predict`: False
	- `eval_strategy`: epoch
	- `prediction_loss_only`: True
	- `per_device_train_batch_size`: 32
	- `per_device_eval_batch_size`: 16
	- `per_gpu_train_batch_size`: None
	- `per_gpu_eval_batch_size`: None
	- `gradient_accumulation_steps`: 16
	- `eval_accumulation_steps`: None
	- `learning_rate`: 2e-05
	- `weight_decay`: 0.0
	- `adam_beta1`: 0.9
	- `adam_beta2`: 0.999
	- `adam_epsilon`: 1e-08
	- `max_grad_norm`: 1.0
	- `num_train_epochs`: 4
	- `max_steps`: -1
	- `lr_scheduler_type`: cosine
	- `lr_scheduler_kwargs`: {}
	- `warmup_ratio`: 0.1
	- `warmup_steps`: 0
	- `log_level`: passive
	- `log_level_replica`: warning
	- `log_on_each_node`: True
	- `logging_nan_inf_filter`: True
	- `save_safetensors`: True
	- `save_on_each_node`: False
	- `save_only_model`: False
	- `restore_callback_states_from_checkpoint`: False
	- `no_cuda`: False
	- `use_cpu`: False
	- `use_mps_device`: False
	- `seed`: 42
	- `data_seed`: None
	- `jit_mode_eval`: False
	- `use_ipex`: False
	- `bf16`: False
	- `fp16`: True
	- `fp16_opt_level`: O1
	- `half_precision_backend`: auto
	- `bf16_full_eval`: False
	- `fp16_full_eval`: False
	- `tf32`: None
	- `local_rank`: 0
	- `ddp_backend`: None
	- `tpu_num_cores`: None
	- `tpu_metrics_debug`: False
	- `debug`: []
	- `dataloader_drop_last`: False
	- `dataloader_num_workers`: 0
	- `dataloader_prefetch_factor`: None
	- `past_index`: -1
	- `disable_tqdm`: False
	- `remove_unused_columns`: True
	- `label_names`: None
	- `load_best_model_at_end`: True
	- `ignore_data_skip`: False
	- `fsdp`: []
	- `fsdp_min_num_params`: 0
	- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
	- `fsdp_transformer_layer_cls_to_wrap`: None
	- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
	- `deepspeed`: None
	- `label_smoothing_factor`: 0.0
	- `optim`: adamw_torch_fused
	- `optim_args`: None
	- `adafactor`: False
	- `group_by_length`: False
	- `length_column_name`: length
	- `ddp_find_unused_parameters`: None
	- `ddp_bucket_cap_mb`: None
	- `ddp_broadcast_buffers`: False
	- `dataloader_pin_memory`: True
	- `dataloader_persistent_workers`: False
	- `skip_memory_metrics`: True
	- `use_legacy_prediction_loop`: False
	- `push_to_hub`: False
	- `resume_from_checkpoint`: None
	- `hub_model_id`: None
	- `hub_strategy`: every_save
	- `hub_private_repo`: False
	- `hub_always_push`: False
	- `gradient_checkpointing`: False
	- `gradient_checkpointing_kwargs`: None
	- `include_inputs_for_metrics`: False
	- `eval_do_concat_batches`: True
	- `fp16_backend`: auto
	- `push_to_hub_model_id`: None
	- `push_to_hub_organization`: None
	- `mp_parameters`:
	- `auto_find_batch_size`: False
	- `full_determinism`: False
	- `torchdynamo`: None
	- `ray_scope`: last
	- `ddp_timeout`: 1800
	- `torch_compile`: False
	- `torch_compile_backend`: None
	- `torch_compile_mode`: None
	- `dispatch_batches`: None
	- `split_batches`: None
	- `include_tokens_per_second`: False
	- `include_num_input_tokens_seen`: False
	- `neftune_noise_alpha`: None
	- `optim_target_modules`: None
	- `batch_eval_metrics`: False
	- `prompts`: None
	- `batch_sampler`: no_duplicates
	- `multi_dataset_batch_sampler`: proportional

	</details>

	### Training Logs
	| Epoch | Step | dim_768_cosine_ndcg@10 | dim_512_cosine_ndcg@10 | dim_256_cosine_ndcg@10 | dim_128_cosine_ndcg@10 | dim_64_cosine_ndcg@10 |
	|:-----:|:----:|:----------------------:|:----------------------:|:----------------------:|:----------------------:|:---------------------:|
	| 1.0   | 1    | 0.7249                 | 0.7249                 | 0.7473                 | 0.7026                 | 0.6686                |
	| 2.0   | 2    | 0.7619                 | 0.7249                 | 0.7533                 | 0.7026                 | 0.7480                |
	| 3.0   | 3    | 0.7804                 | 0.7619                 | 0.7659                 | 0.7103                 | 0.7496                |
	| 4.0   | 4    | 0.8214                 | 0.7678                 | 0.7659                 | 0.7103                 | 0.7516                |

	* The bold row denotes the saved checkpoint.

	### Framework Versions
	- Python: 3.11.12
	- Sentence Transformers: 4.1.0
	- Transformers: 4.41.2
	- PyTorch: 2.1.2+cu121
	- Accelerate: 1.5.2
	- Datasets: 2.19.1
	- Tokenizers: 0.19.1

	## Citation

	### BibTeX

	#### Sentence Transformers
	```bibtex
	@inproceedings{reimers-2019-sentence-bert,
	title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
	author = "Reimers, Nils and Gurevych, Iryna",
	booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
	month = "11",
	year = "2019",
	publisher = "Association for Computational Linguistics",
	url = "https://arxiv.org/abs/1908.10084",
	}
	```

	#### MatryoshkaLoss
	```bibtex
	@misc{kusupati2024matryoshka,
	title={Matryoshka Representation Learning},
	author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
	year={2024},
	eprint={2205.13147},
	archivePrefix={arXiv},
	primaryClass={cs.LG}
	}
	```

	#### MultipleNegativesRankingLoss
	```bibtex
	@misc{henderson2017efficient,
	title={Efficient Natural Language Response Suggestion for Smart Reply},
	author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
	year={2017},
	eprint={1705.00652},
	archivePrefix={arXiv},
	primaryClass={cs.CL}
	}
	```

	<!--
	## Glossary

	Clearly define terms in order to be accessible across audiences.
	-->

	<!--
	## Model Card Authors

	Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.
	-->

	<!--
	## Model Card Contact

	Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.
	-->