# deepseek-llm-7b-telecom-finetuned
This model is a fine-tuned version of deepseek-ai/deepseek-llm-7b-base, adapted to the telecom domain with the dataset described under Training and Evaluation Data below. It achieves the following result on the evaluation set:

- Loss: nan (the run diverged late in training; see the Training results table below)
## Model Description
This model is a fine-tuned version of the DeepSeek-LLM 7B base model, specifically optimized for telecom domain tasks. It has been trained on a comprehensive dataset of telecom-specific content covering technical support, network infrastructure, telecommunications regulations, customer service, and product information. The model uses LoRA (Low-Rank Adaptation) fine-tuning to efficiently adapt the base model to the telecom domain while maintaining the general capabilities of the original model.
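A minimal loading sketch is shown below, assuming the repository hosts a PEFT (LoRA) adapter under the id chendren/deepseek-llm-7b-telecom-finetuned and that `transformers`, `peft`, and `accelerate` are installed; the dtype and `device_map` choices mirror the float16 training setup described later but are illustrative rather than prescribed.

```python
# Sketch: load the base model in float16 and attach the LoRA adapter with PEFT.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "deepseek-ai/deepseek-llm-7b-base"
adapter_id = "chendren/deepseek-llm-7b-telecom-finetuned"  # this repository

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.float16,  # matches the float16 precision used for fine-tuning
    device_map="auto",          # assumption: place weights on whatever accelerator is available
)
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()
```

Because the adapter stores only the LoRA weights (roughly 37M parameters), the base model is downloaded and loaded separately and the adapter is applied on top of it.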
## Intended Uses & Limitations

### Intended Uses
- Providing technical support for telecom products and services (a usage sketch follows this list)
- Answering questions about network infrastructure and protocols
- Explaining telecommunications regulations and compliance requirements
- Assisting with customer service inquiries in the telecom sector
- Generating documentation and explanations for telecom products
- Supporting telecom professionals with domain-specific knowledge
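As a concrete example of the support-style uses above, the snippet below continues from the loading sketch in the Model Description. The prompt wording and generation settings are assumptions; the card does not document a specific prompt or chat template, so plain text completion is used.

```python
# Hypothetical telecom-support prompt; adjust to the format used in the fine-tuning data.
prompt = (
    "Customer question: My VoIP calls keep dropping over a fiber connection. "
    "What should I check first?\nAnswer:"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=256,  # illustrative generation settings
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
    )

# Decode only the newly generated tokens.
new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```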
### Limitations
- The model is specialized for the telecom domain and may not perform optimally on unrelated topics
- While fine-tuned on telecom data, it inherits any limitations of the base DeepSeek-LLM 7B model
- The model should not be used for generating harmful, misleading, or factually incorrect information
- Performance may vary on telecom topics not well-represented in the training data
- As with all language models, outputs should be verified by domain experts for critical applications
## Training and Evaluation Data
The model was fine-tuned on a carefully curated telecom dataset consisting of:

- 15,756 training examples across 6 telecom categories
- 1,753 validation examples for model evaluation
- Technical support conversations, network documentation, regulatory information, and customer service interactions
- A training set augmented from 3,000 diverse telecom examples
## Training Procedure

### Training Hyperparameters
- Fine-tuning method: LoRA (Low-Rank Adaptation); a configuration sketch follows this list
- LoRA rank: 16
- LoRA alpha: 32
- Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Training epochs: 3
- Batch size: 1-2 (with gradient accumulation steps of 4-8)
- Learning rate: 2e-4 with cosine decay
- Optimizer: AdamW
- Weight decay: 0.01
- Sequence length: 512
- Training precision: float16
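A sketch of how this configuration could be expressed with `peft` and `transformers` is shown below. Values come from the list above; anything not stated on the card (LoRA dropout, evaluation cadence, output directory, and the exact mixed-precision setup on MPS) is an assumption.

```python
# Sketch of the LoRA and training configuration implied by the hyperparameters above.
from peft import LoraConfig, get_peft_model
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=16,                # LoRA rank
    lora_alpha=32,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_dropout=0.05,   # assumption: dropout is not stated on the card
    bias="none",
    task_type="CAUSAL_LM",
)
# peft_model = get_peft_model(base_model, lora_config)

training_args = TrainingArguments(
    output_dir="deepseek-llm-7b-telecom-finetuned",  # assumed output path
    num_train_epochs=3,
    per_device_train_batch_size=1,   # card reports batch size 1-2
    gradient_accumulation_steps=8,   # with gradient accumulation steps of 4-8
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    optim="adamw_torch",
    weight_decay=0.01,
    eval_strategy="steps",           # assumption; evaluation every 100 steps matches the results table
    eval_steps=100,
    logging_steps=100,
)
# The 512-token sequence length is applied when tokenizing the dataset (not shown here),
# and the model itself was loaded in float16 per the precision listed above.
```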
### Training Environment
- Trained on a local machine with 48GB RAM
- Used Metal Performance Shaders (MPS) for GPU acceleration (a device-selection sketch follows this list)
- Training time: approximately 68 hours
- Trainable parameters: 37,478,400 (0.54% of total model parameters)
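The exact training script is not part of this card; the short sketch below only illustrates the MPS device selection this environment implies, with the trainable-parameter check noted as a comment.

```python
# Sketch: prefer PyTorch's MPS backend on Apple hardware, falling back to CPU.
import torch

device = torch.device("mps") if torch.backends.mps.is_available() else torch.device("cpu")
print(f"Training device: {device}")

# With the LoRA config from the previous sketch applied via get_peft_model,
# peft_model.print_trainable_parameters() should report roughly
# 37,478,400 trainable parameters (about 0.54% of the model).
```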
## Results
The model reached a best evaluation loss of approximately 0.071 during the first epoch (0.0707 at step 1800 in the table below), showing strong performance on telecom domain tasks and good convergence early in training. Validation loss rose during the second epoch and became NaN from step 4000 onward, which explains the nan loss reported at the top of this card; the first-epoch checkpoints therefore represent the model's best performance.
### Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
0.1172 | 0.0508 | 100 | 0.1089 |
0.0824 | 0.1015 | 200 | 0.0889 |
0.0765 | 0.1523 | 300 | 0.0831 |
0.0662 | 0.2031 | 400 | 0.0798 |
0.0731 | 0.2539 | 500 | 0.0783 |
0.0785 | 0.3046 | 600 | 0.0753 |
0.073 | 0.3554 | 700 | 0.0779 |
0.0665 | 0.4062 | 800 | 0.0722 |
0.0695 | 0.4570 | 900 | 0.0736 |
0.0656 | 0.5077 | 1000 | 0.0727 |
0.0766 | 0.5585 | 1100 | 0.0724 |
0.062 | 0.6093 | 1200 | 0.0714 |
0.0625 | 0.6601 | 1300 | 0.0731 |
0.0726 | 0.7108 | 1400 | 0.0744 |
0.0633 | 0.7616 | 1500 | 0.0713 |
0.0643 | 0.8124 | 1600 | 0.0719 |
0.0698 | 0.8632 | 1700 | 0.0764 |
0.0706 | 0.9139 | 1800 | 0.0707 |
0.0765 | 0.9647 | 1900 | 0.0743 |
0.071 | 1.0152 | 2000 | 0.0782 |
0.0665 | 1.0660 | 2100 | 0.0977 |
0.0685 | 1.1168 | 2200 | 0.1068 |
0.0617 | 1.1676 | 2300 | 0.0862 |
0.0679 | 1.2183 | 2400 | 0.0974 |
0.0643 | 1.2691 | 2500 | 0.1419 |
0.0644 | 1.3199 | 2600 | 0.1259 |
0.0759 | 1.3707 | 2700 | 0.0990 |
0.0786 | 1.4214 | 2800 | 0.1181 |
0.0695 | 1.4722 | 2900 | 0.1083 |
0.0733 | 1.5230 | 3000 | 0.1150 |
0.0811 | 1.5737 | 3100 | 0.1055 |
0.2612 | 1.6245 | 3200 | 0.2676 |
0.2466 | 1.6753 | 3300 | 0.2725 |
0.342 | 1.7261 | 3400 | 0.3552 |
0.3612 | 1.7768 | 3500 | 0.3656 |
0.3883 | 1.8276 | 3600 | 0.3825 |
0.3796 | 1.8784 | 3700 | 0.3845 |
0.3744 | 1.9292 | 3800 | 0.3900 |
0.383 | 1.9799 | 3900 | 0.3991 |
0.0 | 2.0305 | 4000 | nan |
0.0 | 2.0812 | 4100 | nan |
0.0 | 2.1320 | 4200 | nan |
0.0 | 2.1828 | 4300 | nan |
0.0 | 2.2336 | 4400 | nan |
0.0 | 2.2843 | 4500 | nan |
0.0 | 2.3351 | 4600 | nan |
0.0 | 2.3859 | 4700 | nan |
0.0 | 2.4367 | 4800 | nan |
0.0 | 2.4874 | 4900 | nan |
0.0 | 2.5382 | 5000 | nan |
0.0 | 2.5890 | 5100 | nan |
0.0 | 2.6398 | 5200 | nan |
0.0 | 2.6905 | 5300 | nan |
0.0 | 2.7413 | 5400 | nan |
0.0 | 2.7921 | 5500 | nan |
0.0 | 2.8429 | 5600 | nan |
0.0 | 2.8936 | 5700 | nan |
0.0 | 2.9444 | 5800 | nan |
0.0 | 2.9952 | 5900 | nan |
## Framework versions
- PEFT 0.15.2
- Transformers 4.51.3
- Pytorch 2.7.0
- Datasets 3.6.0
- Tokenizers 0.21.1
## Model tree

- This model: chendren/deepseek-llm-7b-telecom-finetuned (LoRA adapter)
- Base model: deepseek-ai/deepseek-llm-7b-base