Upload rag SentenceTransformer

Files changed:
- README.md (+94 −57)
- config.json (+1 −1)
- config_sentence_transformers.json (+1 −1)

README.md CHANGED

@@ -4,36 +4,57 @@ tags:
 - sentence-similarity
 - feature-extraction
 - generated_from_trainer
-- dataset_size:
+- dataset_size:268861
 - loss:MultipleNegativesRankingLoss
 base_model: Qwen/Qwen3-0.6B-Base
 widget:
-- source_sentence:
+- source_sentence: 'There are seven thieves. They stole diamonds from a diamond merchant
+    and ran away. While running, night sets in and they decide to rest in the jungle.
+
+    When everybody was sleeping, two of them woke up and decided to divide the diamonds
+    equally among themselves. But when they divided the diamonds equally, one diamond
+    is left.
+
+    So they woke up the 3rd thief and tried to divide the diamonds equally again but
+    still one diamond was left. Then they woke up the 4th thief to divide the diamonds
+    equally again, and again one diamond was left. This happened with the 5th and
+    6th thief – one diamond was still left.
+
+    Finally, they woke up the 7th thief and this time the diamonds were divided equally.
+
+    How many diamonds did they steal in total?'
   sentences:
-  -
-  -
-  -
-- source_sentence:
+  - ''''
+  - ''''
+  - e
+- source_sentence: 'praveen starts business with rs . 3220 and after 5 months , hari
+    joins with praveen as his partner . after a year , the profit is divided in the
+    ratio 2 : 3 . what is hari ’ s contribution in the capital ?'
   sentences:
+  - s
+  - '5'
   - '['
-
-
-
-    1100 ?
+- source_sentence: 'Which of the following is material of choice in class V
+
+    cavity with abfraction?'
   sentences:
-  -
-  -
-  -
-- source_sentence:
+  - '['
+  - t
+  - G
+- source_sentence: A right circular cylinder has a height of 25 and a radius of 5.
+    A rectangular solid with a height of 15 and a square base, is placed in the cylinder
+    such that each of the corners of the solid is tangent to the cylinder wall. Liquid
+    is then poured into the cylinder such that it reaches the rim. What is the volume
+    of the liquid?
   sentences:
-  -
-  - /
+  - '5'
   - '['
-  -
+  - '2'
+- source_sentence: Cerebral angiography was performed by -
   sentences:
-  - A
+  - t
   - S
-  -
+  - '2'
 pipeline_tag: sentence-similarity
 library_name: sentence-transformers
 ---
@@ -87,9 +108,9 @@ from sentence_transformers import SentenceTransformer
 model = SentenceTransformer("sentence_transformers_model_id")
 # Run inference
 sentences = [
-    '
+    'Cerebral angiography was performed by -',
     'S',
-    '
+    '2',
 ]
 embeddings = model.encode(sentences)
 print(embeddings.shape)
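
The usage hunk above stops at printing the embedding shape. For orientation, the embeddings it produces can be scored against each other with the model's built-in `similarity` helper (available in the Sentence Transformers 4.x listed under Framework Versions); a minimal sketch, assuming the `sentence_transformers_model_id` placeholder is replaced with the actual Hub repo id:

```python
from sentence_transformers import SentenceTransformer

# Placeholder id from the card; substitute the real repo id after upload.
model = SentenceTransformer("sentence_transformers_model_id")

sentences = [
    'Cerebral angiography was performed by -',
    'S',
    '2',
]
embeddings = model.encode(sentences)  # shape: (3, embedding_dim)

# 3 x 3 matrix of pairwise similarity scores; row 0 scores the query
# against itself and the two single-character candidates.
similarities = model.similarity(embeddings, embeddings)
print(similarities)
```
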
@@ -143,19 +164,19 @@ You can finetune this model on your own dataset.
 
 #### Unnamed Dataset
 
-* Size:
+* Size: 268,861 training samples
 * Columns: <code>sentence_0</code> and <code>sentence_1</code>
 * Approximate statistics based on the first 1000 samples:
-  | | sentence_0
-
-  | type | string
-  | details | <ul><li>min:
+  | | sentence_0 | sentence_1 |
+  |:--------|:----------------------------------------------------------------------------------|:--------------------------------------------------------------------------------|
+  | type | string | string |
+  | details | <ul><li>min: 5 tokens</li><li>mean: 48.3 tokens</li><li>max: 128 tokens</li></ul> | <ul><li>min: 0 tokens</li><li>mean: 0.97 tokens</li><li>max: 1 tokens</li></ul> |
 * Samples:
-  | sentence_0
-
-  | <code>A
-  | <code>
-  | <code>
+  | sentence_0 | sentence_1 |
+  |:--------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------|
+  | <code>A 1200 m long train crosses a tree in 120 sec, how much time will I take to pass a platform 1100 m long?</code> | <code>'</code> |
+  | <code>What is the opposite of rarefaction zones, where air molecules in waves are loosely packed?</code> | <code>[</code> |
+  | <code>if w is 40 percent less than e , e is 40 percent less than y , and z is 46 percent less than y , then z is greater than w by what percent of w ?</code> | <code>%</code> |
 * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
   ```json
   {
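
The `json` parameter block above is cut off at the hunk boundary. For orientation, `MultipleNegativesRankingLoss` treats each `(sentence_0, sentence_1)` row as a positive pair and uses the other `sentence_1` values in the same batch as negatives; a minimal training sketch under that reading, with stand-in rows copied from the samples table rather than the actual 268,861-row dataset:

```python
from datasets import Dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer, losses

model = SentenceTransformer("Qwen/Qwen3-0.6B-Base")

# Stand-in pairs mirroring the card's sentence_0 / sentence_1 columns.
train_dataset = Dataset.from_dict({
    "sentence_0": [
        "A 1200 m long train crosses a tree in 120 sec, how much time will I take to pass a platform 1100 m long?",
        "What is the opposite of rarefaction zones, where air molecules in waves are loosely packed?",
    ],
    "sentence_1": ["'", "["],
})

# In-batch negatives: every other sentence_1 in a batch acts as a
# negative for a given sentence_0.
loss = losses.MultipleNegativesRankingLoss(model)

trainer = SentenceTransformerTrainer(model=model, train_dataset=train_dataset, loss=loss)
trainer.train()
```

Note that with `sentence_1` values averaging roughly one token, many in-batch negatives will be identical to the positive, which makes this loss signal noisy.
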
@@ -167,9 +188,9 @@ You can finetune this model on your own dataset.
 ### Training Hyperparameters
 #### Non-Default Hyperparameters
 
-- `per_device_train_batch_size`:
-- `per_device_eval_batch_size`:
-- `num_train_epochs`:
+- `per_device_train_batch_size`: 64
+- `per_device_eval_batch_size`: 64
+- `num_train_epochs`: 4
 - `fp16`: True
 - `multi_dataset_batch_sampler`: round_robin
 
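
These non-default values map one-to-one onto `SentenceTransformerTrainingArguments`; a minimal sketch reproducing them (`output_dir` is an arbitrary placeholder, not from the card):

```python
from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import MultiDatasetBatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="outputs",  # placeholder path
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    num_train_epochs=4,
    fp16=True,  # mixed-precision training
    multi_dataset_batch_sampler=MultiDatasetBatchSamplers.ROUND_ROBIN,
)
```
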
@@ -180,8 +201,8 @@ You can finetune this model on your own dataset.
 - `do_predict`: False
 - `eval_strategy`: no
 - `prediction_loss_only`: True
-- `per_device_train_batch_size`:
-- `per_device_eval_batch_size`:
+- `per_device_train_batch_size`: 64
+- `per_device_eval_batch_size`: 64
 - `per_gpu_train_batch_size`: None
 - `per_gpu_eval_batch_size`: None
 - `gradient_accumulation_steps`: 1
@@ -193,7 +214,7 @@ You can finetune this model on your own dataset.
 - `adam_beta2`: 0.999
 - `adam_epsilon`: 1e-08
 - `max_grad_norm`: 1
-- `num_train_epochs`:
+- `num_train_epochs`: 4
 - `max_steps`: -1
 - `lr_scheduler_type`: linear
 - `lr_scheduler_kwargs`: {}
@@ -293,31 +314,47 @@ You can finetune this model on your own dataset.
 </details>
 
 ### Training Logs
-| Epoch | Step
-
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-
-
-
-
-
-
-
-
-
+| Epoch | Step | Training Loss |
+|:------:|:-----:|:-------------:|
+| 0.1190 | 500 | 4.0939 |
+| 0.2380 | 1000 | 3.7716 |
+| 0.3571 | 1500 | 0.0 |
+| 0.4761 | 2000 | 0.0 |
+| 0.5951 | 2500 | 0.0 |
+| 0.7141 | 3000 | 0.0 |
+| 0.8331 | 3500 | 0.0 |
+| 0.9522 | 4000 | 0.0 |
+| 1.0712 | 4500 | 0.0 |
+| 1.1902 | 5000 | 0.0 |
+| 1.3092 | 5500 | 0.0 |
+| 1.4282 | 6000 | 0.0 |
+| 1.5473 | 6500 | 0.0 |
+| 1.6663 | 7000 | 0.0 |
+| 1.7853 | 7500 | 0.0 |
+| 1.9043 | 8000 | 0.0 |
+| 2.0233 | 8500 | 0.0 |
+| 2.1423 | 9000 | 0.0 |
+| 2.2614 | 9500 | 0.0 |
+| 2.3804 | 10000 | 0.0 |
+| 2.4994 | 10500 | 0.0 |
+| 2.6184 | 11000 | 0.0 |
+| 2.7374 | 11500 | 0.0 |
+| 2.8565 | 12000 | 0.0 |
+| 2.9755 | 12500 | 0.0 |
+| 3.0945 | 13000 | 0.0 |
+| 3.2135 | 13500 | 0.0 |
+| 3.3325 | 14000 | 0.0 |
+| 3.4516 | 14500 | 0.0 |
+| 3.5706 | 15000 | 0.0 |
+| 3.6896 | 15500 | 0.0 |
+| 3.8086 | 16000 | 0.0 |
+| 3.9276 | 16500 | 0.0 |
 
 
 ### Framework Versions
 - Python: 3.11.13
 - Sentence Transformers: 4.1.0
-- Transformers: 4.52.
+- Transformers: 4.52.4
 - PyTorch: 2.6.0+cu124
 - Accelerate: 1.7.0
 - Datasets: 3.6.0

config.json CHANGED
@@ -23,7 +23,7 @@
   "sliding_window": null,
   "tie_word_embeddings": true,
   "torch_dtype": "float32",
-  "transformers_version": "4.52.
+  "transformers_version": "4.52.4",
   "use_cache": true,
   "use_sliding_window": false,
   "vocab_size": 151936

config_sentence_transformers.json CHANGED
@@ -1,7 +1,7 @@
 {
   "__version__": {
     "sentence_transformers": "4.1.0",
-    "transformers": "4.52.
+    "transformers": "4.52.4",
     "pytorch": "2.6.0+cu124"
   },
   "prompts": {},