gbyuvd committed cb0f7cb (verified) · Parent: 5a4b149

Update README.md

Files changed (1): README.md (+7 -3)
````diff
@@ -244,7 +244,7 @@ Data Preprocessing
 
 
 - Batch size = 128
-- Num of Epoch = 36
+- Num of Epoch = 36 (as separate runs of 10, 12, and 14 epochs; based on the training dynamics, one long run seems to work better than splitting it up like this)
 
 I am using Ranger21 optimizer with these settings:
 
@@ -252,10 +252,11 @@ I am using Ranger21 optimizer with these settings:
 Core optimizer = madgrad
 Learning rate of 1.5e-05
 
-num_epochs of training = ** 1 epochs **
+Important - num_epochs of training = ** _(10, 12, 14; separate run)_ epochs **
+please confirm this is correct or warmup and warmdown will be off
 
 using AdaBelief for variance computation
-Warm-up: linear warmup, over 964 iterations (0.22)
+Warm-up: linear warmup, over 2000 iterations
 
 Lookahead active, merging every 5 steps, with blend factor of 0.5
 Norm Loss active, factor = 0.0001
@@ -265,6 +266,9 @@ Gradient Centralization = On
 Adaptive Gradient Clipping = True
 clipping value of 0.01
 steps for clipping = 0.001
+params size saved
+total param groups = 1
+total params in groups = 137
 ```
 
 I turned off the warm down, since in prior experiments it led to instability of losses in my case.
````
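For reference, the settings printed in the log above can be collected in one place. This is only a sketch: the keyword names below are my assumptions based on the `ranger21` package and may not match the installed version's constructor signature exactly, so verify them before use.

```python
# Ranger21 settings from the log above, gathered as keyword arguments.
# NOTE: these argument names are assumptions from the ranger21 package
# and may differ between versions -- check your version's constructor.
ranger21_config = {
    "lr": 1.5e-05,                    # learning rate
    "use_madgrad": True,              # core optimizer = madgrad
    "use_adabelief": True,            # AdaBelief for variance computation
    "num_warmup_iterations": 2000,    # linear warmup
    "warmdown_active": False,         # warm down off (caused loss instability)
    "lookahead_active": True,
    "lookahead_mergetime": 5,         # merge every 5 steps
    "lookahead_blending_alpha": 0.5,  # blend factor
    "normloss_active": True,
    "normloss_factor": 0.0001,
    "using_gc": True,                 # gradient centralization
    "use_adaptive_gradient_clipping": True,
    "agc_clipping_value": 0.01,
    "agc_eps": 0.001,
}

# Usage sketch (requires torch and ranger21 to be installed):
# from ranger21 import Ranger21
# optimizer = Ranger21(model.parameters(), num_epochs=36,
#                      num_batches_per_epoch=batches_per_epoch,
#                      **ranger21_config)
```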
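The commit raises the linear warmup from 964 to 2000 iterations. Linear warmup just ramps the learning rate from zero to the base value over that many steps; a minimal sketch (the function name is mine, not Ranger21's):

```python
def warmup_lr(step, base_lr=1.5e-05, warmup_iters=2000):
    """Linear warmup: ramp the LR from 0 to base_lr over warmup_iters steps."""
    if step >= warmup_iters:
        return base_lr  # warmup finished, use the base learning rate
    return base_lr * (step / warmup_iters)
```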
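The `Adaptive Gradient Clipping = True` lines in the printout refer to unit-wise adaptive gradient clipping (Brock et al., 2021), where a gradient is rescaled whenever its norm exceeds a fixed fraction of the corresponding parameter norm; the `0.001` is presumably the eps floor on the parameter norm. A simplified scalar sketch, not Ranger21's actual tensor implementation:

```python
import math

def agc_clip(param, grad, clipping=0.01, eps=0.001):
    """Adaptive gradient clipping, simplified: rescale grad whenever
    ||grad|| exceeds clipping * max(||param||, eps)."""
    p_norm = max(math.sqrt(sum(x * x for x in param)), eps)
    g_norm = math.sqrt(sum(x * x for x in grad))
    limit = clipping * p_norm
    if g_norm > limit:
        # scale the gradient down onto the allowed norm
        return [x * limit / g_norm for x in grad]
    return grad
```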