---
license: cc-by-sa-4.0
base_model: internlm/internlm2_5-7b
tags:
- llama-factory
- full
- generated_from_trainer
model-index:
- name: CollectiveSFT
  results: []
language:
- zh
- en
---

# CollectiveSFT

This model is a fine-tuned version of [internlm/internlm2_5-7b](https://huggingface.co/internlm/internlm2_5-7b) on a collection of Chinese and English medical instruction datasets.

## Model description

[CollectiveSFT](https://arxiv.org/abs/2407.19705): Scaling Large Language Models for Chinese Medical Benchmark with Collective Instructions in Healthcare.

Official code repo: [https://github.com/CAS-SIAT-XinHai/CollectiveSFT](https://github.com/CAS-SIAT-XinHai/CollectiveSFT)

## Intended uses & limitations

The model is intended for medical question answering in Chinese and English. It may have limitations in general chat functionality.

## Training and evaluation data

**Language: English**

| Dataset Name | Style | Size |
|:------------:|:-----:|:-------:|
| PubMedQA | QA | 273,518 |
| MedMCQA | MCQA | 182,822 |
| HeadQA | QA | 2,657 |
| **Total** | | 458,997 |

**Language: Chinese**

| Dataset Name | Style | Size |
|:------------:|:--------:|:---------:|
| cMedQA2 | QA | 100,000 |
| cMedDialogue | Dialogue | 792,099 |
| webMedQA | QA | 252,850 |
| MedicalDialog | Dialogue | 2,725,989 |
| CMID | NER | 12,254 |
| NLPEC | MCQA | 18,703 |
| CMB | MCQA | 269,359 |
| MLEC-QA | MCQA | 108,988 |
| DISCMed | Dialogue | 464,898 |
| **Total** | | 4,745,140 |

For detailed dataset specifications and access instructions, please refer to our paper.

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 2
- total_train_batch_size: 256
- total_eval_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 3.0

### Framework versions

- Transformers 4.42.4
- Pytorch 2.3.1+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
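
For reference, the hyperparameters listed above map onto a standard `transformers` `TrainingArguments` object roughly as sketched below. This is illustrative only, not the exact LLaMA-Factory configuration that was used; `output_dir` and `bf16` are assumptions not stated in this card.

```python
# Illustrative sketch: maps the hyperparameters above onto transformers
# TrainingArguments. Training was actually launched through LLaMA-Factory,
# which constructs an equivalent object internally.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="saves/collectivesft-internlm2_5-7b",  # placeholder path (assumption)
    learning_rate=2e-5,
    per_device_train_batch_size=16,  # 16 per device x 8 GPUs x 2 accumulation = 256 total
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,
    num_train_epochs=3.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    adam_beta1=0.9,                  # Adam with betas=(0.9, 0.999) and epsilon=1e-08
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
    bf16=True,                       # assumption: precision is not stated in the card
)
```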
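
## How to use

A minimal inference sketch with `transformers` is shown below. The repository id `CAS-SIAT-XinHai/CollectiveSFT` is a placeholder (replace it with the actual model id), and the plain-text prompt format is an assumption; the InternLM2.5 modeling code requires `trust_remote_code=True`.

```python
# Minimal inference sketch (assumptions: placeholder repo id, plain-text prompt,
# bf16 weights on a single GPU). InternLM2.5 checkpoints need trust_remote_code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CAS-SIAT-XinHai/CollectiveSFT"  # placeholder; use the actual model id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

prompt = "What are the common symptoms of type 2 diabetes?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```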