Update README.md
README.md
CHANGED
@@ -122,20 +122,6 @@ This synergistic combination creates a model that excels not only at providing a
 
 ### Training Results
 
-#### Performance Metrics
-- **Final Training Loss**: 0.003759
-- **Training Runtime**: 8,446.67 seconds (~2.35 hours)
-- **Training Samples per Second**: 156.929
-- **Training Steps per Second**: 4.904
-- **Total Training Steps**: 41,400
-- **Completed Epochs**: 4.999924559047633
-
-#### Resource Utilization
-- **Total Input Tokens Seen**: 2,531,530,240 tokens
-- **Total FLOPs**: 3.96 × 10²⁰
-- **DDP Timeout**: 180,000,000 seconds
-- **Plot Loss**: Enabled (training loss visualization available)
-
 ### Training Loss Curve
 The model training included comprehensive loss tracking and visualization. The training loss curve below shows the convergence pattern over the 41,400 training steps across 5 epochs:
 
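As a quick, non-authoritative sanity check, the sketch below recomputes a few quantities implied by the metrics in the deleted hunk (effective batch size, implied runtime, samples seen, average tokens per sample, FLOPs per token). The inputs are copied from the removed bullets; every derived value is an inference from those numbers and is not stated in the README itself.

```python
# Consistency check on the training metrics removed in this commit.
# Inputs below are copied from the deleted "Performance Metrics" and
# "Resource Utilization" bullets; the derived quantities are inferences,
# not figures reported in the README.

runtime_s          = 8_446.67            # reported training runtime (seconds)
samples_per_second = 156.929
steps_per_second   = 4.904
total_steps        = 41_400
epochs             = 4.999924559047633
total_tokens       = 2_531_530_240
total_flops        = 3.96e20

# Effective (global) batch size: samples processed per optimizer step.
effective_batch = samples_per_second / steps_per_second
print(f"effective batch size ≈ {effective_batch:.1f}")          # ≈ 32

# Runtime implied by the step throughput; should roughly match runtime_s.
implied_runtime = total_steps / steps_per_second
print(f"implied runtime ≈ {implied_runtime:,.0f} s "
      f"(reported {runtime_s:,.2f} s)")                         # ≈ 8,442 s

# Total samples seen and the implied dataset size per epoch.
total_samples = samples_per_second * runtime_s
print(f"samples seen ≈ {total_samples:,.0f}, "
      f"≈ {total_samples / epochs:,.0f} per epoch")

# Average tokens per sample (includes any padding counted by the trainer).
print(f"avg tokens per sample ≈ {total_tokens / total_samples:,.0f}")

# Average compute cost per token over the whole run.
print(f"FLOPs per token ≈ {total_flops / total_tokens:.2e}")
```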