# deepseek-llm-7b-telecom-finetuned
This model is a fine-tuned version of deepseek-ai/deepseek-llm-7b-base, adapted to the telecom domain with the dataset described under Training and Evaluation Data below. It achieves the following result on the evaluation set:

- Loss: nan (the run diverged late in training; see the Training results table below)
## Model Description
This model is a fine-tuned version of the DeepSeek-LLM 7B base model, specifically optimized for telecom domain tasks. It has been trained on a comprehensive dataset of telecom-specific content covering technical support, network infrastructure, telecommunications regulations, customer service, and product information. The model uses LoRA (Low-Rank Adaptation) fine-tuning to efficiently adapt the base model to the telecom domain while maintaining the general capabilities of the original model.
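A minimal loading sketch is shown below, assuming the repository hosts a PEFT (LoRA) adapter under the id chendren/deepseek-llm-7b-telecom-finetuned and that `transformers`, `peft`, and `accelerate` are installed; the dtype and `device_map` choices mirror the float16 training setup described later but are illustrative rather than prescribed.

```python
# Sketch: load the base model in float16 and attach the LoRA adapter with PEFT.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "deepseek-ai/deepseek-llm-7b-base"
adapter_id = "chendren/deepseek-llm-7b-telecom-finetuned"  # this repository

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.float16,  # matches the float16 precision used for fine-tuning
    device_map="auto",          # assumption: place weights on whatever accelerator is available
)
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()
```

Because the adapter stores only the LoRA weights (roughly 37M parameters), the base model is downloaded and loaded separately and the adapter is applied on top of it.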
## Intended Uses & Limitations

### Intended Uses
- Providing technical support for telecom products and services (a usage sketch follows this list)
- Answering questions about network infrastructure and protocols
- Explaining telecommunications regulations and compliance requirements
- Assisting with customer service inquiries in the telecom sector
- Generating documentation and explanations for telecom products
- Supporting telecom professionals with domain-specific knowledge
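As a concrete example of the support-style uses above, the snippet below continues from the loading sketch in the Model Description. The prompt wording and generation settings are assumptions; the card does not document a specific prompt or chat template, so plain text completion is used.

```python
# Hypothetical telecom-support prompt; adjust to the format used in the fine-tuning data.
prompt = (
    "Customer question: My VoIP calls keep dropping over a fiber connection. "
    "What should I check first?\nAnswer:"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=256,  # illustrative generation settings
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
    )

# Decode only the newly generated tokens.
new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```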
### Limitations
- The model is specialized for the telecom domain and may not perform optimally on unrelated topics
- While fine-tuned on telecom data, it inherits any limitations of the base DeepSeek-LLM 7B model
- The model should not be used for generating harmful, misleading, or factually incorrect information
- Performance may vary on telecom topics not well-represented in the training data
- As with all language models, outputs should be verified by domain experts for critical applications
## Training and Evaluation Data
The model was fine-tuned on a carefully curated telecom dataset consisting of:

- 15,756 training examples across 6 telecom categories
- 1,753 validation examples for model evaluation
- Technical support conversations, network documentation, regulatory information, and customer service interactions
- A training set augmented from 3,000 diverse telecom examples
## Training Procedure

### Training Hyperparameters
- Fine-tuning method: LoRA (Low-Rank Adaptation); a configuration sketch follows this list
- LoRA rank: 16
- LoRA alpha: 32
- Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Training epochs: 3
- Batch size: 1-2 (with gradient accumulation steps of 4-8)
- Learning rate: 2e-4 with cosine decay
- Optimizer: AdamW
- Weight decay: 0.01
- Sequence length: 512
- Training precision: float16
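A sketch of how this configuration could be expressed with `peft` and `transformers` is shown below. Values come from the list above; anything not stated on the card (LoRA dropout, evaluation cadence, output directory, and the exact mixed-precision setup on MPS) is an assumption.

```python
# Sketch of the LoRA and training configuration implied by the hyperparameters above.
from peft import LoraConfig, get_peft_model
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=16,                # LoRA rank
    lora_alpha=32,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_dropout=0.05,   # assumption: dropout is not stated on the card
    bias="none",
    task_type="CAUSAL_LM",
)
# peft_model = get_peft_model(base_model, lora_config)

training_args = TrainingArguments(
    output_dir="deepseek-llm-7b-telecom-finetuned",  # assumed output path
    num_train_epochs=3,
    per_device_train_batch_size=1,   # card reports batch size 1-2
    gradient_accumulation_steps=8,   # with gradient accumulation steps of 4-8
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    optim="adamw_torch",
    weight_decay=0.01,
    eval_strategy="steps",           # assumption; evaluation every 100 steps matches the results table
    eval_steps=100,
    logging_steps=100,
)
# The 512-token sequence length is applied when tokenizing the dataset (not shown here),
# and the model itself was loaded in float16 per the precision listed above.
```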
### Training Environment
- Trained on a local machine with 48GB RAM
- Used Metal Performance Shaders (MPS) for GPU acceleration (a device-selection sketch follows this list)
- Training time: approximately 68 hours
- Trainable parameters: 37,478,400 (0.54% of total model parameters)
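The exact training script is not part of this card; the short sketch below only illustrates the MPS device selection this environment implies, with the trainable-parameter check noted as a comment.

```python
# Sketch: prefer PyTorch's MPS backend on Apple hardware, falling back to CPU.
import torch

device = torch.device("mps") if torch.backends.mps.is_available() else torch.device("cpu")
print(f"Training device: {device}")

# With the LoRA config from the previous sketch applied via get_peft_model,
# peft_model.print_trainable_parameters() should report roughly
# 37,478,400 trainable parameters (about 0.54% of the model).
```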
## Results
The model reached a best evaluation loss of approximately 0.071 during the first epoch (0.0707 at step 1800 in the table below), showing strong performance on telecom domain tasks and good convergence early in training. Validation loss rose during the second epoch and became NaN from step 4000 onward, which explains the nan loss reported at the top of this card; the first-epoch checkpoints therefore represent the model's best performance.
### Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
0.1172 | 0.0508 | 100 | 0.1089 |
0.0824 | 0.1015 | 200 | 0.0889 |
0.0765 | 0.1523 | 300 | 0.0831 |
0.0662 | 0.2031 | 400 | 0.0798 |
0.0731 | 0.2539 | 500 | 0.0783 |
0.0785 | 0.3046 | 600 | 0.0753 |
0.073 | 0.3554 | 700 | 0.0779 |
0.0665 | 0.4062 | 800 | 0.0722 |
0.0695 | 0.4570 | 900 | 0.0736 |
0.0656 | 0.5077 | 1000 | 0.0727 |
0.0766 | 0.5585 | 1100 | 0.0724 |
0.062 | 0.6093 | 1200 | 0.0714 |
0.0625 | 0.6601 | 1300 | 0.0731 |
0.0726 | 0.7108 | 1400 | 0.0744 |
0.0633 | 0.7616 | 1500 | 0.0713 |
0.0643 | 0.8124 | 1600 | 0.0719 |
0.0698 | 0.8632 | 1700 | 0.0764 |
0.0706 | 0.9139 | 1800 | 0.0707 |
0.0765 | 0.9647 | 1900 | 0.0743 |
0.071 | 1.0152 | 2000 | 0.0782 |
0.0665 | 1.0660 | 2100 | 0.0977 |
0.0685 | 1.1168 | 2200 | 0.1068 |
0.0617 | 1.1676 | 2300 | 0.0862 |
0.0679 | 1.2183 | 2400 | 0.0974 |
0.0643 | 1.2691 | 2500 | 0.1419 |
0.0644 | 1.3199 | 2600 | 0.1259 |
0.0759 | 1.3707 | 2700 | 0.0990 |
0.0786 | 1.4214 | 2800 | 0.1181 |
0.0695 | 1.4722 | 2900 | 0.1083 |
0.0733 | 1.5230 | 3000 | 0.1150 |
0.0811 | 1.5737 | 3100 | 0.1055 |
0.2612 | 1.6245 | 3200 | 0.2676 |
0.2466 | 1.6753 | 3300 | 0.2725 |
0.342 | 1.7261 | 3400 | 0.3552 |
0.3612 | 1.7768 | 3500 | 0.3656 |
0.3883 | 1.8276 | 3600 | 0.3825 |
0.3796 | 1.8784 | 3700 | 0.3845 |
0.3744 | 1.9292 | 3800 | 0.3900 |
0.383 | 1.9799 | 3900 | 0.3991 |
0.0 | 2.0305 | 4000 | nan |
0.0 | 2.0812 | 4100 | nan |
0.0 | 2.1320 | 4200 | nan |
0.0 | 2.1828 | 4300 | nan |
0.0 | 2.2336 | 4400 | nan |
0.0 | 2.2843 | 4500 | nan |
0.0 | 2.3351 | 4600 | nan |
0.0 | 2.3859 | 4700 | nan |
0.0 | 2.4367 | 4800 | nan |
0.0 | 2.4874 | 4900 | nan |
0.0 | 2.5382 | 5000 | nan |
0.0 | 2.5890 | 5100 | nan |
0.0 | 2.6398 | 5200 | nan |
0.0 | 2.6905 | 5300 | nan |
0.0 | 2.7413 | 5400 | nan |
0.0 | 2.7921 | 5500 | nan |
0.0 | 2.8429 | 5600 | nan |
0.0 | 2.8936 | 5700 | nan |
0.0 | 2.9444 | 5800 | nan |
0.0 | 2.9952 | 5900 | nan |
## Framework versions
- PEFT 0.15.2
- Transformers 4.51.3
- Pytorch 2.7.0
- Datasets 3.6.0
- Tokenizers 0.21.1
## Model tree

- This model: chendren/deepseek-llm-7b-telecom-finetuned (LoRA adapter)
- Base model: deepseek-ai/deepseek-llm-7b-base