Update README.md
README.md
We build [Step-Video-TI2V-Eval](https://github.com/stepfun-ai/Step-Video-T2V/blob/main/benchmark/Step-Video-T2V-Eval), a new benchmark designed for the text-driven image-to-video generation task. The dataset comprises 178 real-world and 120 anime-style prompt-image pairs, ensuring broad coverage of diverse user scenarios. To achieve comprehensive representation, we developed a fine-grained schema for data collection in both categories.
<table border="0" style="width: 100%; text-align: center; margin-top: 10px; border-collapse: collapse; border-radius: 8px; overflow: hidden;">
  <thead>
    <tr>
      <th style="width: 25%; padding: 10px;">vs. OSTopA</th>
      <th style="width: 25%; padding: 10px;">vs. OSTopB</th>
      <th style="width: 25%; padding: 10px;">vs. CSTopC</th>
      <th style="width: 25%; padding: 10px;">vs. CSTopD</th>
    </tr>
  </thead>
  <tbody>
    <tr><td>37-63-79</td><td>101-48-29</td><td>41-46-73</td><td>92-51-18</td></tr>
    <tr><td>40-35-44</td><td>94-16-10</td><td>52-35-47</td><td>87-18-17</td></tr>
    <tr><td>46-92-39</td><td>43-71-64</td><td>45-65-50</td><td>36-77-47</td></tr>
    <tr><td>42-61-18</td><td>50-35-35</td><td>29-62-43</td><td>37-63-23</td></tr>
    <tr><td>52-57-49</td><td>71-40-66</td><td>58-33-69</td><td>67-33-60</td></tr>
    <tr><td>75-17-28</td><td>67-30-24</td><td>78-17-39</td><td>68-41-14</td></tr>
    <tr>
      <td colspan="4" style="padding: 10px; font-weight: bold;">Total Score</td>
    </tr>
    <tr>
      <td>292-325-277</td>
      <td>426-240-228</td>
      <td>303-258-321</td>
      <td>387-283-179</td>
    </tr>
  </tbody>
</table>
<p style="text-align: center;"><strong>Table 1: Comparison with baseline TI2V models using Step-Video-TI2V-Eval.</strong></p>
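Each cell in the table reads as a win-tie-loss triple (e.g. `37-63-79`), assuming the counts record how often Step-Video-TI2V wins, ties, or loses against the named baseline across the evaluation prompts. As a rough sketch of how the Total Score row is derived, a small hypothetical helper (not part of the repo) can aggregate the per-dimension rows column-wise:

```python
def total_scores(rows):
    """Sum win/tie/loss triples column-wise.

    rows: list of rows, each a list of 'W-T-L' strings (one per baseline).
    Returns one 'W-T-L' string per column.
    """
    n_cols = len(rows[0])
    totals = [[0, 0, 0] for _ in range(n_cols)]
    for row in rows:
        for c, cell in enumerate(row):
            for i, v in enumerate(map(int, cell.split("-"))):
                totals[c][i] += v
    return ["-".join(map(str, t)) for t in totals]

# The six per-dimension rows from Table 1:
rows = [
    ["37-63-79", "101-48-29", "41-46-73", "92-51-18"],
    ["40-35-44", "94-16-10", "52-35-47", "87-18-17"],
    ["46-92-39", "43-71-64", "45-65-50", "36-77-47"],
    ["42-61-18", "50-35-35", "29-62-43", "37-63-23"],
    ["52-57-49", "71-40-66", "58-33-69", "67-33-60"],
    ["75-17-28", "67-30-24", "78-17-39", "68-41-14"],
]
print(total_scores(rows))
```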
[VBench](https://arxiv.org/html/2411.13503v1) is a comprehensive benchmark suite that deconstructs “video generation quality” into specific, hierarchical, and disentangled dimensions, each with tailored prompts and evaluation methods. We utilize the VBench-I2V benchmark to assess the performance of Step-Video-TI2V alongside other TI2V models.