
deepseek-llm-7b-telecom-finetuned

This model is a fine-tuned version of deepseek-ai/deepseek-llm-7b-base on a curated telecom-domain dataset (described under Training and Evaluation Data below). It achieves the following result on the evaluation set:

  • Loss: nan (the evaluation loss diverged late in training; see the Results section for the best checkpoints)

Model Description

This model is a fine-tuned version of the DeepSeek-LLM 7B base model, specifically optimized for telecom domain tasks. It has been trained on a comprehensive dataset of telecom-specific content covering technical support, network infrastructure, telecommunications regulations, customer service, and product information. The model uses LoRA (Low-Rank Adaptation) fine-tuning to efficiently adapt the base model to the telecom domain while maintaining the general capabilities of the original model.
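For reference, a minimal loading sketch assuming the LoRA adapter weights in this repository are applied on top of the base model with PEFT. The repository IDs are taken from this card; `device_map="auto"` additionally requires the accelerate package.

```python
# Minimal loading sketch: base model + this LoRA adapter via PEFT.
# Assumes the adapter in this repo targets deepseek-ai/deepseek-llm-7b-base.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_ID = "deepseek-ai/deepseek-llm-7b-base"
ADAPTER_ID = "chendren/deepseek-llm-7b-telecom-finetuned"

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE_ID,
    torch_dtype=torch.float16,   # matches the F16 weights used for fine-tuning
    device_map="auto",           # requires `accelerate`; places layers on CPU/MPS/GPU as available
)
model = PeftModel.from_pretrained(base_model, ADAPTER_ID)
model.eval()
```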

Intended Uses & Limitations

Intended Uses

  • Providing technical support for telecom products and services
  • Answering questions about network infrastructure and protocols
  • Explaining telecommunications regulations and compliance requirements
  • Assisting with customer service inquiries in the telecom sector
  • Generating documentation and explanations for telecom products
  • Supporting telecom professionals with domain-specific knowledge
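Continuing from the loading sketch above, a generation example for the support-style use cases listed here. The prompt text and format are illustrative only, not a template prescribed by this card.

```python
# Illustrative telecom-support prompt; the wording and format are hypothetical.
prompt = (
    "Question: A customer's fiber ONT shows a red LOS light. "
    "What troubleshooting steps should a support agent walk through?\n"
    "Answer:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=200, do_sample=False)
answer = tokenizer.decode(
    output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(answer)
```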

Limitations

  • The model is specialized for the telecom domain and may not perform optimally on unrelated topics
  • While fine-tuned on telecom data, it inherits any limitations of the base DeepSeek-LLM 7B model
  • The model should not be used for generating harmful, misleading, or factually incorrect information
  • Performance may vary on telecom topics not well-represented in the training data
  • As with all language models, outputs should be verified by domain experts for critical applications

Training and Evaluation Data

The model was fine-tuned on a carefully curated telecom dataset consisting of:

  • 15,756 training examples across 6 telecom categories
  • 1,753 validation examples for model evaluation
  • Data including technical support conversations, network documentation, regulatory information, and customer service interactions
  • Augmentation from 3,000 diverse telecom examples to create the final training set
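The card does not publish the dataset or its record format. Purely as a hypothetical illustration, one instruction-style record and a small `datasets` load might look like the following; the field names, category label, and text are invented, not taken from the actual data.

```python
# Hypothetical record shape; the real dataset, its fields, and its 6 category
# names are not published with this card.
from datasets import Dataset

example = {
    "category": "technical_support",  # invented label
    "instruction": "Explain the difference between GPON and XGS-PON downstream rates.",
    "response": "GPON provides 2.488 Gbit/s downstream, while XGS-PON provides about 10 Gbit/s ...",
}

train_ds = Dataset.from_list([example])  # stand-in for the 15,756 training examples
print(train_ds)
```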

Training Procedure

Training Hyperparameters

  • Fine-tuning method: LoRA (Low-Rank Adaptation)
  • LoRA rank: 16
  • LoRA alpha: 32
  • Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
  • Training epochs: 3
  • Batch size: 1-2 (with gradient accumulation steps of 4-8)
  • Learning rate: 2e-4 with cosine decay
  • Optimizer: AdamW
  • Weight decay: 0.01
  • Sequence length: 512
  • Training precision: float16
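Put together, these settings correspond roughly to the PEFT/Transformers configuration sketched below. This is a reconstruction for illustration, not the authors' training script; the output path is hypothetical, and the 512-token sequence length and float16 precision would be applied at tokenization and model-loading time.

```python
# Reconstructed configuration sketch matching the hyperparameters listed above.
from peft import LoraConfig, get_peft_model
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
# peft_model = get_peft_model(base_model, lora_config)  # wraps the fp16 base model

training_args = TrainingArguments(
    output_dir="deepseek-llm-7b-telecom-finetuned",  # hypothetical output path
    num_train_epochs=3,
    per_device_train_batch_size=2,   # the card reports 1-2
    gradient_accumulation_steps=4,   # the card reports 4-8
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    optim="adamw_torch",
    weight_decay=0.01,
    logging_steps=100,
    eval_strategy="steps",
    eval_steps=100,                  # matches the 100-step evaluation cadence in the results table
)
```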

Training Environment

  • Trained on a local machine with 48GB RAM
  • Used Metal Performance Shaders (MPS) for GPU acceleration
  • Training time: approximately 68 hours
  • Trainable parameters: 37,478,400 (0.54% of total model parameters)
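A short sketch of the device selection implied by the MPS note above; the trainable-parameter figure can be reproduced from a PEFT-wrapped model.

```python
# Select the MPS backend on Apple silicon when available, otherwise fall back to CPU.
import torch

device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
print(f"Training/inference device: {device}")

# With a PEFT-wrapped model (see the LoRA sketch above), the trainable-parameter
# count reported here (37,478,400, ~0.54% of all parameters) can be printed with:
# peft_model.print_trainable_parameters()
```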

Results

The model achieved its best evaluation loss of approximately 0.071 during the first epoch (around steps 1500-1800), converging steadily over that period. Validation loss drifted upward during the second epoch and became NaN from step 4000 onward, so the final reported loss is nan and the first-epoch checkpoints represent the model's best performance on the telecom evaluation set.

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.1172        | 0.0508 | 100  | 0.1089          |
| 0.0824        | 0.1015 | 200  | 0.0889          |
| 0.0765        | 0.1523 | 300  | 0.0831          |
| 0.0662        | 0.2031 | 400  | 0.0798          |
| 0.0731        | 0.2539 | 500  | 0.0783          |
| 0.0785        | 0.3046 | 600  | 0.0753          |
| 0.073         | 0.3554 | 700  | 0.0779          |
| 0.0665        | 0.4062 | 800  | 0.0722          |
| 0.0695        | 0.4570 | 900  | 0.0736          |
| 0.0656        | 0.5077 | 1000 | 0.0727          |
| 0.0766        | 0.5585 | 1100 | 0.0724          |
| 0.062         | 0.6093 | 1200 | 0.0714          |
| 0.0625        | 0.6601 | 1300 | 0.0731          |
| 0.0726        | 0.7108 | 1400 | 0.0744          |
| 0.0633        | 0.7616 | 1500 | 0.0713          |
| 0.0643        | 0.8124 | 1600 | 0.0719          |
| 0.0698        | 0.8632 | 1700 | 0.0764          |
| 0.0706        | 0.9139 | 1800 | 0.0707          |
| 0.0765        | 0.9647 | 1900 | 0.0743          |
| 0.071         | 1.0152 | 2000 | 0.0782          |
| 0.0665        | 1.0660 | 2100 | 0.0977          |
| 0.0685        | 1.1168 | 2200 | 0.1068          |
| 0.0617        | 1.1676 | 2300 | 0.0862          |
| 0.0679        | 1.2183 | 2400 | 0.0974          |
| 0.0643        | 1.2691 | 2500 | 0.1419          |
| 0.0644        | 1.3199 | 2600 | 0.1259          |
| 0.0759        | 1.3707 | 2700 | 0.0990          |
| 0.0786        | 1.4214 | 2800 | 0.1181          |
| 0.0695        | 1.4722 | 2900 | 0.1083          |
| 0.0733        | 1.5230 | 3000 | 0.1150          |
| 0.0811        | 1.5737 | 3100 | 0.1055          |
| 0.2612        | 1.6245 | 3200 | 0.2676          |
| 0.2466        | 1.6753 | 3300 | 0.2725          |
| 0.342         | 1.7261 | 3400 | 0.3552          |
| 0.3612        | 1.7768 | 3500 | 0.3656          |
| 0.3883        | 1.8276 | 3600 | 0.3825          |
| 0.3796        | 1.8784 | 3700 | 0.3845          |
| 0.3744        | 1.9292 | 3800 | 0.3900          |
| 0.383         | 1.9799 | 3900 | 0.3991          |
| 0.0           | 2.0305 | 4000 | nan             |
| 0.0           | 2.0812 | 4100 | nan             |
| 0.0           | 2.1320 | 4200 | nan             |
| 0.0           | 2.1828 | 4300 | nan             |
| 0.0           | 2.2336 | 4400 | nan             |
| 0.0           | 2.2843 | 4500 | nan             |
| 0.0           | 2.3351 | 4600 | nan             |
| 0.0           | 2.3859 | 4700 | nan             |
| 0.0           | 2.4367 | 4800 | nan             |
| 0.0           | 2.4874 | 4900 | nan             |
| 0.0           | 2.5382 | 5000 | nan             |
| 0.0           | 2.5890 | 5100 | nan             |
| 0.0           | 2.6398 | 5200 | nan             |
| 0.0           | 2.6905 | 5300 | nan             |
| 0.0           | 2.7413 | 5400 | nan             |
| 0.0           | 2.7921 | 5500 | nan             |
| 0.0           | 2.8429 | 5600 | nan             |
| 0.0           | 2.8936 | 5700 | nan             |
| 0.0           | 2.9444 | 5800 | nan             |
| 0.0           | 2.9952 | 5900 | nan             |

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • PyTorch 2.7.0
  • Datasets 3.6.0
  • Tokenizers 0.21.1