To investigate this on small models, we experiment with single-, dual-, and triple-stage training.
---

#### 1 Stage vs 2 Stages
<HtmlEmbed src="ss-vs-s1.html" desc="Average Rank of a model trained for 20K steps in a single stage, and a model trained for the same 20K steps on top of pretraining the Modality Projection and Vision Encoder for 10K steps." />
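The schedules being compared can be sketched as follows. This is a framework-free, hypothetical simplification: the component names follow the article's description, and in a real PyTorch setup "frozen" would mean setting `requires_grad = False` on the excluded parameters.

```python
# Illustrative sketch of single- vs two-stage training schedules.
# Component names mirror the article; the stage representation is hypothetical.
components = ["vision_encoder", "modality_projection", "language_model"]

def stage(trainable, steps):
    """A stage is just 'which components get gradients' plus a step budget."""
    return {"trainable": set(trainable), "steps": steps}

# Two-stage recipe: warm up the modality projection and vision encoder,
# then train everything end to end.
two_stage = [
    stage(["modality_projection", "vision_encoder"], 10_000),
    stage(components, 20_000),
]

# Single-stage baseline: train all components from the start for the same 20k steps.
single_stage = [stage(components, 20_000)]

total_two_stage = sum(s["steps"] for s in two_stage)
total_single = sum(s["steps"] for s in single_stage)
```

Note that the two-stage recipe spends 30K total steps against the baseline's 20K, since the 10K warm-up comes on top of the shared second stage.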
We observe that at this model size, with this amount of available data, training in a single stage actually outperforms the multi-stage approach.
---

#### 2 Stages vs 2.5 Stages
We also test whether splitting the second stage yields any performance improvement.
We take the baseline and continue training for another 20k steps, both on the unfiltered (rating >= 1) subset and on subsets of **FineVision** filtered according to our ratings.
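The filtering step can be sketched as a simple threshold over per-sample ratings. The sample data and threshold values below are hypothetical, chosen only to illustrate how the unfiltered (rating >= 1) subset relates to the stricter filtered ones:

```python
# Hypothetical samples with quality ratings; not FineVision's actual data.
samples = [
    {"id": 0, "rating": 1},
    {"id": 1, "rating": 3},
    {"id": 2, "rating": 4},
    {"id": 3, "rating": 2},
]

def subset(min_rating):
    """Keep every sample whose rating meets the threshold."""
    return [s for s in samples if s["rating"] >= min_rating]

unfiltered = subset(1)  # rating >= 1 keeps everything
filtered = subset(3)    # a stricter subset keeps only highly rated samples
```

The same thresholding could be expressed with `datasets.Dataset.filter` when working with Hugging Face datasets.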
<HtmlEmbed src="s25-ratings.html" desc="Average Rank of a model trained for an additional 20K steps on top of unfiltered training for 20K steps." />
As in the previous results, we observe that the best outcome is achieved simply by training on as much data as possible.