lusxvr (HF Staff) committed
Commit: 36d62ce
Parent(s): 8a84412
Files changed (1):
  1. app/src/content/article.mdx +3 -3

app/src/content/article.mdx CHANGED
@@ -493,21 +493,21 @@ The standard training procedure of a VLM usually follows at least two stages. Fi
 
 To investigate this on small models, we experiment with single-, dual-, and triple-stage training.
 
+---
 #### 1 Stage vs 2 Stages
 
 <HtmlEmbed src="ss-vs-s1.html" desc="Average Rank of a model trained for 20k steps in a single stage, and a model trained for the same 20k steps on top of pretraining the Modality Projection and Vision Encoder for 10k steps." />
----
+
 
 We observe that at this model size, with this amount of available data, training in only a single stage actually outperforms a multi-stage approach.
 
+---
 #### 2 Stages vs 2.5 Stages
 We also test whether splitting the second stage yields any performance improvement.
 
 We take the baseline and continue training for another 20k steps, both with the unfiltered (>= 1) and the filtered subsets of **FineVision** according to our ratings.
 
----
 <HtmlEmbed src="s25-ratings.html" desc="Average Rank of a model trained for an additional 20k steps on top of unfiltered training for 20k steps." />
----
 
 Like in the previous results, we observe that the best outcome is achieved simply by training on as much data as possible.
 
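The staged recipe compared in the article above, pretraining only the Modality Projection and Vision Encoder before unfreezing everything, is typically implemented by toggling which parameter groups receive gradients. Below is a minimal PyTorch sketch of that idea; the module names (`vision_encoder`, `modality_projection`, `language_model`) and dimensions are illustrative assumptions, not the article's actual code.

```python
import torch.nn as nn


class TinyVLM(nn.Module):
    """Toy stand-in for a VLM: three named parameter groups,
    mirroring the components discussed in the article."""

    def __init__(self, img_dim=8, vis_dim=16, txt_dim=32):
        super().__init__()
        self.vision_encoder = nn.Linear(img_dim, vis_dim)
        self.modality_projection = nn.Linear(vis_dim, txt_dim)
        self.language_model = nn.Linear(txt_dim, txt_dim)

    def forward(self, x):
        return self.language_model(self.modality_projection(self.vision_encoder(x)))


def set_stage(model: TinyVLM, stage: int) -> None:
    """Stage 1: train only the modality projection and vision encoder
    (language model frozen). Stage 2+: unfreeze all parameters."""
    for p in model.parameters():
        p.requires_grad = True
    if stage == 1:
        for p in model.language_model.parameters():
            p.requires_grad = False


model = TinyVLM()
set_stage(model, 1)  # stage-1 pretraining: LM frozen
trainable = sorted({n.split(".")[0] for n, p in model.named_parameters() if p.requires_grad})
# trainable -> ['modality_projection', 'vision_encoder']
```

A single-stage run, which the article finds preferable at this model size and data budget, simply skips the freezing and trains all parameters from the start.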