Upload rag SentenceTransformer

Files changed:
- README.md (+94 −57)
- config.json (+1 −1)
- config_sentence_transformers.json (+1 −1)

README.md CHANGED

@@ -4,36 +4,57 @@ tags:
 - sentence-similarity
 - feature-extraction
 - generated_from_trainer
-- dataset_size:
+- dataset_size:268861
 - loss:MultipleNegativesRankingLoss
 base_model: Qwen/Qwen3-0.6B-Base
 widget:
-- source_sentence:
+- source_sentence: 'There are seven thieves. They stole diamonds from a diamond merchant
+    and ran away. While running, night sets in and they decide to rest in the jungle.
+
+    When everybody was sleeping, two of them woke up and decided to divide the diamonds
+    equally among themselves. But when they divided the diamonds equally, one diamond
+    is left.
+
+    So they woke up the 3rd thief and tried to divide the diamonds equally again but
+    still one diamond was left. Then they woke up the 4th thief to divide the diamonds
+    equally again, and again one diamond was left. This happened with the 5th and
+    6th thief – one diamond was still left.
+
+    Finally, they woke up the 7th thief and this time the diamonds were divided equally.
+
+    How many diamonds did they steal in total?'
   sentences:
-  -
-  -
-  -
-- source_sentence:
+  - ''''
+  - ''''
+  - e
+- source_sentence: 'praveen starts business with rs . 3220 and after 5 months , hari
+    joins with praveen as his partner . after a year , the profit is divided in the
+    ratio 2 : 3 . what is hari ’ s contribution in the capital ?'
   sentences:
+  - s
+  - '5'
   - '['
-
-
-
-    1100 ?
+- source_sentence: 'Which of the following is material of choice in class V
+
+    cavity with abfraction?'
   sentences:
-  -
-  -
-  -
-- source_sentence:
+  - '['
+  - t
+  - G
+- source_sentence: A right circular cylinder has a height of 25 and a radius of 5.
+    A rectangular solid with a height of 15 and a square base, is placed in the cylinder
+    such that each of the corners of the solid is tangent to the cylinder wall. Liquid
+    is then poured into the cylinder such that it reaches the rim. What is the volume
+    of the liquid?
   sentences:
-  -
-  - /
+  - '5'
   - '['
-  -
+  - '2'
+- source_sentence: Cerebral angiography was performed by -
   sentences:
-  - A
+  - t
   - S
-  -
+  - '2'
 pipeline_tag: sentence-similarity
 library_name: sentence-transformers
 ---
@@ -87,9 +108,9 @@ from sentence_transformers import SentenceTransformer
 model = SentenceTransformer("sentence_transformers_model_id")
 # Run inference
 sentences = [
-    '
+    'Cerebral angiography was performed by -',
     'S',
-    '
+    '2',
 ]
 embeddings = model.encode(sentences)
 print(embeddings.shape)
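
The usage hunk above stops at printing the embedding shape. For orientation, the embeddings it produces can be scored against each other with the model's built-in `similarity` helper (available in the Sentence Transformers 4.x listed under Framework Versions); a minimal sketch, assuming the `sentence_transformers_model_id` placeholder is replaced with the actual Hub repo id:

```python
from sentence_transformers import SentenceTransformer

# Placeholder id from the card; substitute the real repo id after upload.
model = SentenceTransformer("sentence_transformers_model_id")

sentences = [
    'Cerebral angiography was performed by -',
    'S',
    '2',
]
embeddings = model.encode(sentences)  # shape: (3, embedding_dim)

# 3 x 3 matrix of pairwise similarity scores; row 0 scores the query
# against itself and the two single-character candidates.
similarities = model.similarity(embeddings, embeddings)
print(similarities)
```
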
@@ -143,19 +164,19 @@ You can finetune this model on your own dataset.
 
 #### Unnamed Dataset
 
-* Size:
+* Size: 268,861 training samples
 * Columns: <code>sentence_0</code> and <code>sentence_1</code>
 * Approximate statistics based on the first 1000 samples:
-  | | sentence_0
-
-  | type | string
-  | details | <ul><li>min:
+  | | sentence_0 | sentence_1 |
+  |:--------|:----------------------------------------------------------------------------------|:--------------------------------------------------------------------------------|
+  | type | string | string |
+  | details | <ul><li>min: 5 tokens</li><li>mean: 48.3 tokens</li><li>max: 128 tokens</li></ul> | <ul><li>min: 0 tokens</li><li>mean: 0.97 tokens</li><li>max: 1 tokens</li></ul> |
 * Samples:
-  | sentence_0
-
-  | <code>A
-  | <code>
-  | <code>
+  | sentence_0 | sentence_1 |
+  |:--------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------|
+  | <code>A 1200 m long train crosses a tree in 120 sec, how much time will I take to pass a platform 1100 m long?</code> | <code>'</code> |
+  | <code>What is the opposite of rarefaction zones, where air molecules in waves are loosely packed?</code> | <code>[</code> |
+  | <code>if w is 40 percent less than e , e is 40 percent less than y , and z is 46 percent less than y , then z is greater than w by what percent of w ?</code> | <code>%</code> |
 * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
   ```json
   {
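
The `json` parameter block above is cut off at the hunk boundary. For orientation, `MultipleNegativesRankingLoss` treats each `(sentence_0, sentence_1)` row as a positive pair and uses the other `sentence_1` values in the same batch as negatives; a minimal training sketch under that reading, with stand-in rows copied from the samples table rather than the actual 268,861-row dataset:

```python
from datasets import Dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer, losses

model = SentenceTransformer("Qwen/Qwen3-0.6B-Base")

# Stand-in pairs mirroring the card's sentence_0 / sentence_1 columns.
train_dataset = Dataset.from_dict({
    "sentence_0": [
        "A 1200 m long train crosses a tree in 120 sec, how much time will I take to pass a platform 1100 m long?",
        "What is the opposite of rarefaction zones, where air molecules in waves are loosely packed?",
    ],
    "sentence_1": ["'", "["],
})

# In-batch negatives: every other sentence_1 in a batch acts as a
# negative for a given sentence_0.
loss = losses.MultipleNegativesRankingLoss(model)

trainer = SentenceTransformerTrainer(model=model, train_dataset=train_dataset, loss=loss)
trainer.train()
```

Note that with `sentence_1` values averaging roughly one token, many in-batch negatives will be identical to the positive, which makes this loss signal noisy.
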
@@ -167,9 +188,9 @@ You can finetune this model on your own dataset.
 ### Training Hyperparameters
 #### Non-Default Hyperparameters
 
-- `per_device_train_batch_size`:
-- `per_device_eval_batch_size`:
-- `num_train_epochs`:
+- `per_device_train_batch_size`: 64
+- `per_device_eval_batch_size`: 64
+- `num_train_epochs`: 4
 - `fp16`: True
 - `multi_dataset_batch_sampler`: round_robin
 
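
These non-default values map one-to-one onto `SentenceTransformerTrainingArguments`; a minimal sketch reproducing them (`output_dir` is an arbitrary placeholder, not from the card):

```python
from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import MultiDatasetBatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="outputs",  # placeholder path
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    num_train_epochs=4,
    fp16=True,  # mixed-precision training
    multi_dataset_batch_sampler=MultiDatasetBatchSamplers.ROUND_ROBIN,
)
```
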
@@ -180,8 +201,8 @@ You can finetune this model on your own dataset.
 - `do_predict`: False
 - `eval_strategy`: no
 - `prediction_loss_only`: True
-- `per_device_train_batch_size`:
-- `per_device_eval_batch_size`:
+- `per_device_train_batch_size`: 64
+- `per_device_eval_batch_size`: 64
 - `per_gpu_train_batch_size`: None
 - `per_gpu_eval_batch_size`: None
 - `gradient_accumulation_steps`: 1
@@ -193,7 +214,7 @@ You can finetune this model on your own dataset.
 - `adam_beta2`: 0.999
 - `adam_epsilon`: 1e-08
 - `max_grad_norm`: 1
-- `num_train_epochs`:
+- `num_train_epochs`: 4
 - `max_steps`: -1
 - `lr_scheduler_type`: linear
 - `lr_scheduler_kwargs`: {}
@@ -293,31 +314,47 @@ You can finetune this model on your own dataset.
 </details>
 
 ### Training Logs
-| Epoch | Step
-
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-
-
-
-
-
-
-
-
-
+| Epoch | Step | Training Loss |
+|:------:|:-----:|:-------------:|
+| 0.1190 | 500 | 4.0939 |
+| 0.2380 | 1000 | 3.7716 |
+| 0.3571 | 1500 | 0.0 |
+| 0.4761 | 2000 | 0.0 |
+| 0.5951 | 2500 | 0.0 |
+| 0.7141 | 3000 | 0.0 |
+| 0.8331 | 3500 | 0.0 |
+| 0.9522 | 4000 | 0.0 |
+| 1.0712 | 4500 | 0.0 |
+| 1.1902 | 5000 | 0.0 |
+| 1.3092 | 5500 | 0.0 |
+| 1.4282 | 6000 | 0.0 |
+| 1.5473 | 6500 | 0.0 |
+| 1.6663 | 7000 | 0.0 |
+| 1.7853 | 7500 | 0.0 |
+| 1.9043 | 8000 | 0.0 |
+| 2.0233 | 8500 | 0.0 |
+| 2.1423 | 9000 | 0.0 |
+| 2.2614 | 9500 | 0.0 |
+| 2.3804 | 10000 | 0.0 |
+| 2.4994 | 10500 | 0.0 |
+| 2.6184 | 11000 | 0.0 |
+| 2.7374 | 11500 | 0.0 |
+| 2.8565 | 12000 | 0.0 |
+| 2.9755 | 12500 | 0.0 |
+| 3.0945 | 13000 | 0.0 |
+| 3.2135 | 13500 | 0.0 |
+| 3.3325 | 14000 | 0.0 |
+| 3.4516 | 14500 | 0.0 |
+| 3.5706 | 15000 | 0.0 |
+| 3.6896 | 15500 | 0.0 |
+| 3.8086 | 16000 | 0.0 |
+| 3.9276 | 16500 | 0.0 |
 
 
 ### Framework Versions
 - Python: 3.11.13
 - Sentence Transformers: 4.1.0
-- Transformers: 4.52.
+- Transformers: 4.52.4
 - PyTorch: 2.6.0+cu124
 - Accelerate: 1.7.0
 - Datasets: 3.6.0

config.json CHANGED
@@ -23,7 +23,7 @@
   "sliding_window": null,
   "tie_word_embeddings": true,
   "torch_dtype": "float32",
-  "transformers_version": "4.52.
+  "transformers_version": "4.52.4",
   "use_cache": true,
   "use_sliding_window": false,
   "vocab_size": 151936

config_sentence_transformers.json CHANGED
@@ -1,7 +1,7 @@
 {
   "__version__": {
     "sentence_transformers": "4.1.0",
-    "transformers": "4.52.
+    "transformers": "4.52.4",
     "pytorch": "2.6.0+cu124"
   },
   "prompts": {},