End of training
README.md              +6 -10
generation_config.json +0 -2
README.md CHANGED
@@ -1,21 +1,20 @@
 ---
 library_name: transformers
 license: apache-2.0
-base_model:
+base_model: Qwen/Qwen3-0.6B-Base
 tags:
-- unsloth
 - generated_from_trainer
 model-index:
-- name:
+- name: qwen3-mcqa-reasoning-2
   results: []
 ---

 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->

-#
+# qwen3-mcqa-reasoning-2

-This model is a fine-tuned version of [
+This model is a fine-tuned version of [Qwen/Qwen3-0.6B-Base](https://huggingface.co/Qwen/Qwen3-0.6B-Base) on an unknown dataset.

 ## Model description

@@ -34,15 +33,12 @@ More information needed
 ### Training hyperparameters

 The following hyperparameters were used during training:
-- learning_rate:
-- train_batch_size:
+- learning_rate: 2e-06
+- train_batch_size: 1
 - eval_batch_size: 8
 - seed: 42
-- gradient_accumulation_steps: 2
-- total_train_batch_size: 4
 - optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: linear
-- lr_scheduler_warmup_steps: 5
 - num_epochs: 3

 ### Training results
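For context, a minimal sketch of how the hyperparameters now listed in the card would map onto `transformers.TrainingArguments`. The output directory is a placeholder, and because the updated card no longer lists `gradient_accumulation_steps` or `lr_scheduler_warmup_steps`, library defaults are assumed for those:

```python
from transformers import TrainingArguments

# Sketch reconstructing the updated card's hyperparameters; not part of this commit.
args = TrainingArguments(
    output_dir="qwen3-mcqa-reasoning-2",  # placeholder, not recorded in the card
    learning_rate=2e-06,
    per_device_train_batch_size=1,        # train_batch_size: 1
    per_device_eval_batch_size=8,         # eval_batch_size: 8
    seed=42,
    optim="adamw_torch",                  # OptimizerNames.ADAMW_TORCH
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="linear",
    num_train_epochs=3,
    # gradient_accumulation_steps and warmup_steps were dropped from the card,
    # so the library defaults (1 and 0) are assumed here.
)
```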
generation_config.json CHANGED
@@ -1,8 +1,6 @@
 {
   "bos_token_id": 151643,
   "eos_token_id": 151643,
-  "max_length": 32768,
   "max_new_tokens": 2048,
-  "pad_token_id": 151654,
   "transformers_version": "4.52.2"
 }
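Finally, a small usage sketch against the trimmed generation config. The repo id is a placeholder (the commit does not record the final hub path). Removing `max_length` plausibly silences the warning transformers emits when both `max_length` and `max_new_tokens` are set, and with `pad_token_id` gone, `generate()` typically falls back to `eos_token_id` for open-ended generation:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

repo = "qwen3-mcqa-reasoning-2"  # placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo)

# Loads the trimmed generation_config.json: bos/eos 151643, max_new_tokens=2048;
# max_length and pad_token_id now resolve to library defaults.
gen_config = GenerationConfig.from_pretrained(repo)

inputs = tokenizer("Question: ...", return_tensors="pt")
output = model.generate(**inputs, generation_config=gen_config)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```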