Commit c25d616 (verified) by xuandin
1 parent: 5e9bee2

Update README.md

Files changed (1)
  1. README.md +391 -0
README.md CHANGED
@@ -85,6 +85,397 @@ print(evidence)
  ## **Evaluation Results**

+ <table>
+ <thead>
+ <tr>
+ <th colspan="2">Method</th>
+ <th colspan="4">ViWikiFC</th>
+ <th colspan="4">ISE-DSC01</th>
+ </tr>
+ <tr>
+ <th>ER</th>
+ <th>VC</th>
+ <th>Strict Acc</th>
+ <th>VC Acc</th>
+ <th>ER Acc</th>
+ <th>Time (s)</th>
+ <th>Strict Acc</th>
+ <th>VC Acc</th>
+ <th>ER Acc</th>
+ <th>Time (s)</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td rowspan="3">TF-IDF</td>
+ <td>InfoXLM<sub>large</sub></td>
+ <td>75.56</td>
+ <td>82.21</td>
+ <td>90.15</td>
+ <td>131</td>
+ <td>73.59</td>
+ <td>78.08</td>
+ <td>76.61</td>
+ <td>378</td>
+ </tr>
+ <tr>
+ <td>XLM-R<sub>large</sub></td>
+ <td>76.47</td>
+ <td>82.78</td>
+ <td>90.15</td>
+ <td>134</td>
+ <td>75.61</td>
+ <td>80.50</td>
+ <td>78.58</td>
+ <td>366</td>
+ </tr>
+ <tr>
+ <td>Ernie-M<sub>large</sub></td>
+ <td>75.56</td>
+ <td>81.83</td>
+ <td>90.15</td>
+ <td>144</td>
+ <td>78.19</td>
+ <td>81.69</td>
+ <td>80.65</td>
+ <td>403</td>
+ </tr>
+ <tr>
+ <td rowspan="3">BM25</td>
+ <td>InfoXLM<sub>large</sub></td>
+ <td>70.44</td>
+ <td>79.01</td>
+ <td>83.50</td>
+ <td>130</td>
+ <td>72.09</td>
+ <td>77.37</td>
+ <td>75.04</td>
+ <td>320</td>
+ </tr>
+ <tr>
+ <td>XLM-R<sub>large</sub></td>
+ <td>70.97</td>
+ <td>78.91</td>
+ <td>83.50</td>
+ <td>132</td>
+ <td>73.94</td>
+ <td>79.37</td>
+ <td>76.95</td>
+ <td>333</td>
+ </tr>
+ <tr>
+ <td>Ernie-M<sub>large</sub></td>
+ <td>70.21</td>
+ <td>78.29</td>
+ <td>83.50</td>
+ <td>141</td>
+ <td>76.58</td>
+ <td>80.76</td>
+ <td>79.02</td>
+ <td>381</td>
+ </tr>
+ <tr>
+ <td rowspan="3">SBert</td>
+ <td>InfoXLM<sub>large</sub></td>
+ <td>74.99</td>
+ <td>81.59</td>
+ <td>89.72</td>
+ <td>195</td>
+ <td>71.20</td>
+ <td>76.59</td>
+ <td>74.15</td>
+ <td>915</td>
+ </tr>
+ <tr>
+ <td>XLM-R<sub>large</sub></td>
+ <td>75.80</td>
+ <td>82.35</td>
+ <td>89.72</td>
+ <td>194</td>
+ <td>72.85</td>
+ <td>78.78</td>
+ <td>75.89</td>
+ <td>835</td>
+ </tr>
+ <tr>
+ <td>Ernie-M<sub>large</sub></td>
+ <td>75.13</td>
+ <td>81.44</td>
+ <td>89.72</td>
+ <td>203</td>
+ <td>75.46</td>
+ <td>79.89</td>
+ <td>77.91</td>
+ <td>920</td>
+ </tr>
+ <tr>
+ <td colspan="10"><strong>QA-based approaches</strong> | <strong>VC</strong></td>
+ </tr>
+ <tr>
+ <td rowspan="3">ViMRC<sub>large</sub></td>
+ <td>InfoXLM<sub>large</sub></td>
+ <td>77.28</td>
+ <td>81.97</td>
+ <td>92.49</td>
+ <td>3778</td>
+ <td>54.36</td>
+ <td>64.14</td>
+ <td>56.84</td>
+ <td>9798</td>
+ </tr>
+ <tr>
+ <td>XLM-R<sub>large</sub></td>
+ <td>78.29</td>
+ <td>82.83</td>
+ <td>92.49</td>
+ <td>3824</td>
+ <td>53.98</td>
+ <td>66.70</td>
+ <td>57.77</td>
+ <td>9809</td>
+ </tr>
+ <tr>
+ <td>Ernie-M<sub>large</sub></td>
+ <td>77.38</td>
+ <td>81.92</td>
+ <td>92.49</td>
+ <td>3785</td>
+ <td>56.62</td>
+ <td>62.19</td>
+ <td>58.91</td>
+ <td>9833</td>
+ </tr>
+ <tr>
+ <td rowspan="3">InfoXLM<sub>large</sub></td>
+ <td>InfoXLM<sub>large</sub></td>
+ <td>78.14</td>
+ <td>82.07</td>
+ <td>93.45</td>
+ <td>4092</td>
+ <td>53.50</td>
+ <td>63.83</td>
+ <td>56.17</td>
+ <td>10057</td>
+ </tr>
+ <tr>
+ <td>XLM-R<sub>large</sub></td>
+ <td>79.20</td>
+ <td>83.07</td>
+ <td>93.45</td>
+ <td>4096</td>
+ <td>53.32</td>
+ <td>66.70</td>
+ <td>57.25</td>
+ <td>10066</td>
+ </tr>
+ <tr>
+ <td>Ernie-M<sub>large</sub></td>
+ <td>78.24</td>
+ <td>82.21</td>
+ <td>93.45</td>
+ <td>4102</td>
+ <td>56.34</td>
+ <td>62.36</td>
+ <td>58.69</td>
+ <td>10078</td>
+ </tr>
+ <tr>
+ <td colspan="10"><strong>LLM</strong></td>
+ </tr>
+ <tr>
+ <td colspan="2">Qwen2.5-1.5B-Instruct</td>
+ <td>51.03</td>
+ <td>65.18</td>
+ <td>78.96</td>
+ <td>7665</td>
+ <td>59.23</td>
+ <td>66.68</td>
+ <td>65.51</td>
+ <td>19780</td>
+ </tr>
+ <tr>
+ <td colspan="2">Qwen2.5-3B-Instruct</td>
+ <td>44.38</td>
+ <td>62.31</td>
+ <td>71.35</td>
+ <td>12123</td>
+ <td>60.87</td>
+ <td>66.92</td>
+ <td>66.10</td>
+ <td>31284</td>
+ </tr>
+ <tr>
+ <td colspan="10"><strong>LLM</strong> | <strong>VC</strong></td>
+ </tr>
+ <tr>
+ <td rowspan="3">Qwen2.5-1.5B-Instruct</td>
+ <td>InfoXLM<sub>large</sub></td>
+ <td>66.14</td>
+ <td>76.47</td>
+ <td>78.96</td>
+ <td>7788</td>
+ <td>64.40</td>
+ <td>68.37</td>
+ <td>66.49</td>
+ <td>19970</td>
+ </tr>
+ <tr>
+ <td>XLM-R<sub>large</sub></td>
+ <td>67.67</td>
+ <td>78.10</td>
+ <td>78.96</td>
+ <td>7789</td>
+ <td>64.66</td>
+ <td>69.63</td>
+ <td>66.72</td>
+ <td>19976</td>
+ </tr>
+ <tr>
+ <td>Ernie-M<sub>large</sub></td>
+ <td>66.52</td>
+ <td>76.52</td>
+ <td>78.96</td>
+ <td>7794</td>
+ <td>65.70</td>
+ <td>68.37</td>
+ <td>67.33</td>
+ <td>20003</td>
+ </tr>
+ <tr>
+ <td rowspan="3">Qwen2.5-3B-Instruct</td>
+ <td>InfoXLM<sub>large</sub></td>
+ <td>59.88</td>
+ <td>72.50</td>
+ <td>71.35</td>
+ <td>12246</td>
+ <td>65.72</td>
+ <td>69.66</td>
+ <td>67.51</td>
+ <td>31477</td>
+ </tr>
+ <tr>
+ <td>XLM-R<sub>large</sub></td>
+ <td>60.74</td>
+ <td>73.08</td>
+ <td>71.35</td>
+ <td>12246</td>
+ <td>66.12</td>
+ <td>70.44</td>
+ <td>67.83</td>
+ <td>31483</td>
+ </tr>
+ <tr>
+ <td>Ernie-M<sub>large</sub></td>
+ <td>60.02</td>
+ <td>72.21</td>
+ <td>71.35</td>
+ <td>12251</td>
+ <td>67.48</td>
+ <td>70.77</td>
+ <td>68.75</td>
+ <td>31512</td>
+ </tr>
+ <tr>
+ <td colspan="10"><strong>SER Faster (ours)</strong> | <strong>TVC (ours)</strong></td>
+ </tr>
+ <tr>
+ <td>TF-IDF + ViMRC<sub>large</sub></td>
+ <td>Ernie-M<sub>large</sub></td>
+ <td>79.44</td>
+ <td>82.93</td>
+ <td>94.60</td>
+ <td>410</td>
+ <td>78.32</td>
+ <td>81.91</td>
+ <td>80.26</td>
+ <td>995</td>
+ </tr>
+ <tr>
+ <td>TF-IDF + InfoXLM<sub>large</sub></td>
+ <td>Ernie-M<sub>large</sub></td>
+ <td>79.77</td>
+ <td>83.07</td>
+ <td>95.03</td>
+ <td>487</td>
+ <td>78.37</td>
+ <td>81.91</td>
+ <td>80.32</td>
+ <td>925</td>
+ </tr>
+ <tr>
+ <td colspan="10"><strong>SER (ours)</strong> | <strong>TVC (ours)</strong></td>
+ </tr>
+ <tr>
+ <td rowspan="3">TF-IDF + ViMRC<sub>large</sub></td>
+ <td>InfoXLM<sub>large</sub></td>
+ <td>80.25</td>
+ <td>83.84</td>
+ <td>94.69</td>
+ <td>2731</td>
+ <td>75.13</td>
+ <td>79.54</td>
+ <td>76.87</td>
+ <td>5191</td>
+ </tr>
+ <tr>
+ <td>XLM-R<sub>large</sub></td>
+ <td>80.34</td>
+ <td>83.64</td>
+ <td>94.69</td>
+ <td>2733</td>
+ <td>76.71</td>
+ <td>81.65</td>
+ <td>78.91</td>
+ <td>5219</td>
+ </tr>
+ <tr>
+ <td>Ernie-M<sub>large</sub></td>
+ <td>79.53</td>
+ <td>82.97</td>
+ <td>94.69</td>
+ <td>2733</td>
+ <td>78.97</td>
+ <td>82.54</td>
+ <td>80.91</td>
+ <td>5225</td>
+ </tr>
+ <tr>
+ <td rowspan="3">TF-IDF + InfoXLM<sub>large</sub></td>
+ <td>InfoXLM<sub>large</sub></td>
+ <td>80.68</td>
+ <td>83.98</td>
+ <td>95.31</td>
+ <td>3860</td>
+ <td>75.13</td>
+ <td>79.60</td>
+ <td>76.87</td>
+ <td>5175</td>
+ </tr>
+ <tr>
+ <td>XLM-R<sub>large</sub></td>
+ <td>80.82</td>
+ <td>83.88</td>
+ <td>95.31</td>
+ <td>3843</td>
+ <td>76.74</td>
+ <td>81.71</td>
+ <td>78.95</td>
+ <td>5200</td>
+ </tr>
+ <tr>
+ <td>Ernie-M<sub>large</sub></td>
+ <td>80.06</td>
+ <td>83.17</td>
+ <td>95.31</td>
+ <td>3891</td>
+ <td>78.97</td>
+ <td>82.49</td>
+ <td>80.91</td>
+ <td>5297</td>
+ </tr>
+ </tbody>
+ </table>
+
  **SemViQA-QATC** plays a crucial role in the **SemViQA** system by enhancing accuracy in evidence extraction. When integrated into a pipeline, this model helps determine whether a claim is supported or refuted based on retrieved evidence.
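
The table reports three accuracy columns per dataset, but this README does not define them. As a rough illustration only, the sketch below assumes that VC Acc counts correct verdicts, ER Acc counts exact matches on the extracted evidence (after whitespace and case normalization), and Strict Acc counts examples where both are correct. The `Prediction` class, the `score` and `normalize` helpers, and the label names are hypothetical and not part of the SemViQA code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    verdict: str   # label names such as "SUPPORTED" / "REFUTED" / "NEI" are assumed
    evidence: str  # extracted evidence sentence; "" when there is none


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def score(predictions, references):
    """Return VC, ER, and strict accuracy as percentages (assumed definitions)."""
    if not references or len(predictions) != len(references):
        raise ValueError("predictions and references must be non-empty and aligned")
    vc_hits = er_hits = strict_hits = 0
    for pred, gold in zip(predictions, references):
        vc_ok = pred.verdict == gold.verdict
        er_ok = normalize(pred.evidence) == normalize(gold.evidence)
        vc_hits += vc_ok
        er_hits += er_ok
        strict_hits += vc_ok and er_ok  # strict: verdict AND evidence both correct
    n = len(references)
    return {
        "vc_acc": 100 * vc_hits / n,
        "er_acc": 100 * er_hits / n,
        "strict_acc": 100 * strict_hits / n,
    }


# Toy data, invented purely for illustration.
gold = [
    Prediction("SUPPORTED", "Hanoi is the capital of Vietnam."),
    Prediction("REFUTED", "The river is 500 km long."),
    Prediction("SUPPORTED", "The bridge opened in 1985."),
    Prediction("NEI", ""),
]
preds = [
    Prediction("SUPPORTED", "hanoi is the  capital of vietnam."),  # both correct
    Prediction("REFUTED", "The river flows south."),               # evidence wrong
    Prediction("REFUTED", "The bridge opened in 1985."),           # verdict wrong
    Prediction("NEI", ""),                                         # both correct
]
result = score(preds, gold)
print(result)
```

Under these assumed definitions, Strict Acc can never exceed either VC Acc or ER Acc, because an example only counts when both checks pass.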
 
  ## **Citation**