Commit 594ab4e (verified) by bwang3579 · 1 parent: e3ae15c

Update README.md

Files changed (1): README.md +27 -54
README.md CHANGED
@@ -120,61 +120,34 @@ The default motion_score = 5 is suitable for general use. If you need more stabi
  We build [Step-Video-TI2V-Eval](https://github.com/stepfun-ai/Step-Video-T2V/blob/main/benchmark/Step-Video-T2V-Eval), a new benchmark designed for the text-driven image-to-video generation task. The dataset comprises 178 real-world and 120 anime-style prompt-image pairs, ensuring broad coverage of diverse user scenarios. To achieve comprehensive representation, we developed a fine-grained schema for data collection in both categories.
- <table border="0" style="width: 100%; text-align: center; margin-top: 1px;">
- <tr>
- <th style="width: 20%;">vs. OSTopA</th>
- <th style="width: 20%;">vs. OSTopB</th>
- <th style="width: 20%;">vs. CSTopC</th>
- <th style="width: 20%;">vs. CSTopD</th>
- </tr>
- <tr>
- <td>37-63-79</td>
- <td>101-48-29</td>
- <td>41-46-73</td>
- <td>92-51-18</td>
- </tr>
- <tr>
- <td>40-35-44</td>
- <td>94-16-10</td>
- <td>52-35-47</td>
- <td>87-18-17</td>
- </tr>
- <tr>
- <td>46-92-39</td>
- <td>43-71-64</td>
- <td>45-65-50</td>
- <td>36-77-47</td>
- </tr>
- <tr>
- <td>42-61-18</td>
- <td>50-35-35</td>
- <td>29-62-43</td>
- <td>37-63-23</td>
- </tr>
- <tr>
- <td>52-57-49</td>
- <td>71-40-66</td>
- <td>58-33-69</td>
- <td>67-33-60</td>
- </tr>
- <tr>
- <td>75-17-28</td>
- <td>67-30-24</td>
- <td>78-17-39</td>
- <td>68-41-14</td>
- </tr>
- <tr>
- <th colspan="4">Total Score</th>
- </tr>
- <tr>
- <td>292-325-277</td>
- <td>426-240-228</td>
- <td>303-258-321</td>
- <td>387-283-179</td>
- </tr>
- </table>
- <p style="text-align: center;"><strong>Table 1: Comparison with baseline TI2V models using Step-Video-TI2V-Eval.</strong></p>
+ <table border="0" style="width: 100%; text-align: center; margin-top: 10px; border-collapse: collapse; border-radius: 8px; overflow: hidden;">
+ <thead>
+ <tr style="">
+ <th style="width: 25%; padding: 10px;">vs. OSTopA</th>
+ <th style="width: 25%; padding: 10px;">vs. OSTopB</th>
+ <th style="width: 25%; padding: 10px;">vs. CSTopC</th>
+ <th style="width: 25%; padding: 10px;">vs. CSTopD</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr><td>37-63-79</td><td>101-48-29</td><td>41-46-73</td><td>92-51-18</td></tr>
+ <tr><td>40-35-44</td><td>94-16-10</td><td>52-35-47</td><td>87-18-17</td></tr>
+ <tr><td>46-92-39</td><td>43-71-64</td><td>45-65-50</td><td>36-77-47</td></tr>
+ <tr><td>42-61-18</td><td>50-35-35</td><td>29-62-43</td><td>37-63-23</td></tr>
+ <tr><td>52-57-49</td><td>71-40-66</td><td>58-33-69</td><td>67-33-60</td></tr>
+ <tr><td>75-17-28</td><td>67-30-24</td><td>78-17-39</td><td>68-41-14</td></tr>
+ <tr style="">
+ <td colspan="4" style="padding: 10px; font-weight: bold;">Total Score</td>
+ </tr>
+ <tr>
+ <td>292-325-277</td>
+ <td>426-240-228</td>
+ <td>303-258-321</td>
+ <td>387-283-179</td>
+ </tr>
+ </tbody>
+ </table>
  [VBench](https://arxiv.org/html/2411.13503v1) is a comprehensive benchmark suite that deconstructs “video generation quality” into specific, hierarchical, and disentangled dimensions, each with tailored prompts and evaluation methods. We utilize the VBench-I2V benchmark to assess the performance of Step-Video-TI2V alongside other TI2V models.
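The Total Score row in both versions of the table appears to be the column-wise sum of the six unlabeled rows above it. The script below is a minimal, hypothetical checker and is not part of the Step-Video repository. It assumes each cell holds three integer counts joined by hyphens (possibly win/tie/loss counts, though the table does not say). It parses every cell, sums each column element-wise, and compares the result with the reported totals. The names `parse_cell`, `column_totals`, `fmt`, `ROWS`, and `REPORTED_TOTALS` are illustrative only.

```python
# Hypothetical checker (not from the Step-Video repository): verifies that the
# "Total Score" row equals the element-wise column sum of the six rows above it.
# Assumption: each cell is three integer counts joined by "-"; their meaning
# (e.g. win/tie/loss) is not stated in the table.
from typing import List, Tuple

Triple = Tuple[int, int, int]

COLUMNS = ["vs. OSTopA", "vs. OSTopB", "vs. CSTopC", "vs. CSTopD"]

# Per-dimension rows copied from the table (the table gives no row labels).
ROWS = [
    ["37-63-79", "101-48-29", "41-46-73", "92-51-18"],
    ["40-35-44", "94-16-10", "52-35-47", "87-18-17"],
    ["46-92-39", "43-71-64", "45-65-50", "36-77-47"],
    ["42-61-18", "50-35-35", "29-62-43", "37-63-23"],
    ["52-57-49", "71-40-66", "58-33-69", "67-33-60"],
    ["75-17-28", "67-30-24", "78-17-39", "68-41-14"],
]

# "Total Score" row as reported in the table.
REPORTED_TOTALS = ["292-325-277", "426-240-228", "303-258-321", "387-283-179"]


def parse_cell(cell: str) -> Triple:
    """Split an 'a-b-c' cell into a tuple of three ints."""
    a, b, c = (int(part) for part in cell.split("-"))
    return (a, b, c)


def column_totals(rows: List[List[str]]) -> List[Triple]:
    """Sum the parsed triples element-wise down each column."""
    totals: List[Triple] = []
    for col in range(len(rows[0])):
        sums = [0, 0, 0]
        for row in rows:
            for i, value in enumerate(parse_cell(row[col])):
                sums[i] += value
        totals.append((sums[0], sums[1], sums[2]))
    return totals


def fmt(t: Triple) -> str:
    """Format a triple back into 'a-b-c' form."""
    return "-".join(str(v) for v in t)


if __name__ == "__main__":
    for name, got, reported in zip(COLUMNS, column_totals(ROWS), REPORTED_TOTALS):
        status = "ok" if got == parse_cell(reported) else "MISMATCH"
        print(f"{name}: computed {fmt(got)}, reported {reported} ({status})")
```

With the numbers in this diff, the vs. OSTopB, vs. CSTopC, and vs. CSTopD columns match their reported totals exactly. For vs. OSTopA, the third component sums to 257, not the reported 277, so either one of that column's cells or its total may contain a typo. The diff alone does not show which.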