BioASQ Participants Area
Task 4b: Test Results of Phase B
The test results are presented in separate tables for each type of annotation. Each system is listed under its "System Description".
The evaluation measures that are used in Task B are presented here.
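As a rough illustration of the three factoid columns in the tables below, the following is a minimal sketch that assumes the commonly used definitions: strict accuracy counts a question as correct only if the first returned answer matches a gold answer, lenient accuracy counts it if any returned answer matches, and MRR averages the reciprocal rank of the first match. The function name `factoid_scores`, the exact string matching, and the input layout are assumptions made for this example; the linked measures page remains the authoritative definition.

```python
from typing import Dict, Sequence, Set


def factoid_scores(
    ranked: Sequence[Sequence[str]], gold: Sequence[Set[str]]
) -> Dict[str, float]:
    """Strict accuracy, lenient accuracy and MRR over factoid questions.

    ranked[i] is the system's ordered answer list for question i,
    gold[i] is the set of accepted answers (hypothetical input layout).
    """
    if len(ranked) != len(gold) or not gold:
        raise ValueError("need one non-empty, equally long list per argument")

    strict = lenient = rr_sum = 0.0
    for answers, accepted in zip(ranked, gold):
        # 1-based rank of the first returned answer that is accepted, if any
        rank = next(
            (i for i, a in enumerate(answers, start=1) if a in accepted), None
        )
        if rank is None:
            continue
        lenient += 1            # a correct answer appears somewhere in the list
        rr_sum += 1.0 / rank    # reciprocal rank contribution for MRR
        if rank == 1:
            strict += 1         # the top-ranked answer is correct

    n = len(gold)
    return {
        "strict_acc": strict / n,
        "lenient_acc": lenient / n,
        "mrr": rr_sum / n,
    }
```

Under these definitions strict accuracy can never exceed MRR, and MRR can never exceed lenient accuracy, which matches the ordering of the three factoid values in every row reported below.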
Test batch 1
Exact Answers
| System | Yes/No Accuracy | Factoid Strict Acc. | Factoid Lenient Acc. | Factoid MRR | List Mean precision | List Recall | List F-Measure |
| --- | --- | --- | --- | --- | --- | --- | --- |
| auth-qa-1 | 0.9643 | 0.1026 | 0.1538 | 0.1218 | 0.1364 | 0.2636 | 0.1677 |
| WS4A | 0.1071 | 0.0256 | 0.0256 | 0.0256 | 0.0844 | 0.1591 | 0.1090 |
| HPI-S2 | 0.2500 | - | - | - | - | - | - |
Ideal Answers
| System | Rouge-2 (automatic) | Rouge-SU4 (automatic) | Readability (manual) | Recall (manual) | Precision (manual) | Repetition (manual) |
| --- | --- | --- | --- | --- | --- | --- |
| auth-qa-1 | - | - | - | - | - | - |
| WS4A | 0.0535 | 0.0526 | 3.12 | 1.89 | 2.26 | 4.19 |
| HPI-S2 | 0.2354 | 0.2544 | 3.71 | 3.05 | 2.59 | 4.10 |
Test batch 2
Exact Answers
| System | Yes/No Accuracy | Factoid Strict Acc. | Factoid Lenient Acc. | Factoid MRR | List Mean precision | List Recall | List F-Measure |
| --- | --- | --- | --- | --- | --- | --- | --- |
| WS4A | 0.0938 | - | - | - | 0.0985 | 0.1425 | 0.1023 |
| auth-qa-1 | 0.9063 | 0.0645 | 0.1613 | 0.1075 | 0.1429 | 0.1155 | 0.1213 |
| Lab Zhu,Fudan Univer | 0.9063 | 0.1935 | 0.2258 | 0.2097 | 0.1106 | 0.3492 | 0.1550 |
| LabZhu,FDU | 0.9063 | 0.1935 | 0.2258 | 0.2097 | 0.1031 | 0.3492 | 0.1444 |
| HPI-S2 | - | - | - | - | - | - | - |
| LabZhu_FDU | 0.0938 | 0.1935 | 0.2258 | 0.2097 | 0.1031 | 0.3492 | 0.1444 |
| Lab Zhu ,Fdan Univer | 0.9063 | 0.1935 | 0.2258 | 0.2097 | 0.1052 | 0.3492 | 0.1474 |
| LabZhu-FDU | 0.9063 | 0.1935 | 0.2581 | 0.2258 | 0.1038 | 0.3492 | 0.1454 |
Ideal Answers
| System | Rouge-2 (automatic) | Rouge-SU4 (automatic) | Readability (manual) | Recall (manual) | Precision (manual) | Repetition (manual) |
| --- | --- | --- | --- | --- | --- | --- |
| WS4A | 0.0322 | 0.0300 | 2.73 | 1.68 | 2.44 | 3.94 |
| auth-qa-1 | - | - | - | - | - | - |
| Lab Zhu,Fudan Univer | 0.3354 | 0.3384 | 4.40 | 3.87 | 4.10 | 4.89 |
| LabZhu,FDU | 0.3354 | 0.3384 | 4.40 | 3.87 | 4.10 | 4.89 |
| HPI-S2 | 0.2583 | 0.2910 | 3.83 | 3.43 | 2.84 | 4.11 |
| LabZhu_FDU | 0.3354 | 0.3384 | 4.40 | 3.87 | 4.10 | 4.89 |
| Lab Zhu ,Fdan Univer | 0.3354 | 0.3384 | 4.40 | 3.87 | 4.10 | 4.89 |
| LabZhu-FDU | 0.3354 | 0.3384 | 4.40 | 3.87 | 4.10 | 4.89 |
Test batch 3
Exact Answers
| System | Yes/No Accuracy | Factoid Strict Acc. | Factoid Lenient Acc. | Factoid MRR | List Mean precision | List Recall | List F-Measure |
| --- | --- | --- | --- | --- | --- | --- | --- |
| auth-qa-1 | 0.9600 | 0.1154 | 0.1923 | 0.1442 | 0.2500 | 0.2873 | 0.2580 |
| Lab Zhu ,Fdan Univer | 0.9600 | 0.1923 | 0.2692 | 0.2192 | 0.1450 | 0.5929 | 0.2181 |
| LabZhu,FDU | 0.9600 | 0.1923 | 0.2692 | 0.2192 | 0.1444 | 0.6214 | 0.2176 |
| oaqa-3b-3 | 0.5200 | 0.2308 | 0.2692 | 0.2436 | 0.5396 | 0.5008 | 0.4828 |
| LabZhu_FDU | 0.9600 | 0.1923 | 0.2692 | 0.2192 | 0.1420 | 0.5929 | 0.2132 |
| LabZhu-FDU | 0.0400 | 0.1923 | 0.2692 | 0.2192 | 0.1420 | 0.5929 | 0.2132 |
| Lab Zhu,Fudan Univer | 0.9600 | 0.1923 | 0.2692 | 0.2192 | 0.1455 | 0.5770 | 0.2185 |
| HPI-S2 | - | - | - | - | - | - | - |
| WS4A | 0.2400 | 0.0385 | 0.0385 | 0.0385 | 0.1172 | 0.2817 | 0.1609 |
Ideal Answers
| System | Rouge-2 (automatic) | Rouge-SU4 (automatic) | Readability (manual) | Recall (manual) | Precision (manual) | Repetition (manual) |
| --- | --- | --- | --- | --- | --- | --- |
| auth-qa-1 | - | - | - | - | - | - |
| Lab Zhu ,Fdan Univer | 0.3312 | 0.3420 | 4.19 | 3.90 | 3.97 | 4.75 |
| LabZhu,FDU | 0.3312 | 0.3420 | 4.19 | 3.90 | 3.97 | 4.75 |
| oaqa-3b-3 | - | - | - | - | - | - |
| LabZhu_FDU | 0.3312 | 0.3420 | 4.19 | 3.90 | 3.97 | 4.75 |
| LabZhu-FDU | 0.3312 | 0.3420 | 4.19 | 3.90 | 3.97 | 4.75 |
| Lab Zhu,Fudan Univer | 0.3312 | 0.3420 | 4.19 | 3.90 | 3.97 | 4.75 |
| HPI-S2 | 0.2903 | 0.3146 | 3.49 | 3.39 | 2.89 | 3.82 |
| WS4A | 0.0874 | 0.0857 | 2.70 | 2.25 | 2.69 | 3.78 |
Test batch 4
Exact Answers
| System | Yes/No Accuracy | Factoid Strict Acc. | Factoid Lenient Acc. | Factoid MRR | List Mean precision | List Recall | List F-Measure |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Lab Zhu ,Fdan Univer | 0.9524 | 0.0968 | 0.1935 | 0.1371 | 0.1970 | 0.2258 | 0.1711 |
| Lab Zhu,Fudan Univer | 0.9524 | 0.0968 | 0.1935 | 0.1371 | 0.2169 | 0.2925 | 0.1986 |
| HPI-S2 | - | - | - | - | - | - | - |
| oaqa-3b-4 | 0.6667 | 0.2903 | 0.3871 | 0.3253 | 0.2478 | 0.5494 | 0.3115 |
| LabZhu,FDU | 0.9524 | 0.0968 | 0.1935 | 0.1371 | 0.2095 | 0.2925 | 0.1908 |
| auth-qa-1 | 0.9524 | 0.0323 | 0.1935 | 0.0806 | 0.0333 | 0.0444 | 0.0381 |
Ideal Answers
| System | Rouge-2 (automatic) | Rouge-SU4 (automatic) | Readability (manual) | Recall (manual) | Precision (manual) | Repetition (manual) |
| --- | --- | --- | --- | --- | --- | --- |
| Lab Zhu ,Fdan Univer | 0.3363 | 0.3308 | 4.12 | 3.83 | 4.22 | 4.94 |
| Lab Zhu,Fudan Univer | 0.3363 | 0.3308 | 4.12 | 3.83 | 4.22 | 4.94 |
| HPI-S2 | 0.2525 | 0.2701 | 3.84 | 3.18 | 2.86 | 4.09 |
| oaqa-3b-4 | - | - | - | - | - | - |
| LabZhu,FDU | 0.3363 | 0.3308 | 4.12 | 3.83 | 4.22 | 4.94 |
| auth-qa-1 | - | - | - | - | - | - |
Test batch 5
Exact Answers
| System | Yes/No Accuracy | Factoid Strict Acc. | Factoid Lenient Acc. | Factoid MRR | List Mean precision | List Recall | List F-Measure |
| --- | --- | --- | --- | --- | --- | --- | --- |
| auth-qa-1 | 1.0000 | 0.0606 | 0.1212 | 0.0833 | 0.0875 | 0.0830 | 0.0835 |
| Lab Zhu ,Fdan Univer | 1.0000 | 0.1818 | 0.3333 | 0.2475 | 0.0938 | 0.3479 | 0.1417 |
| Lab Zhu,Fudan Univer | 1.0000 | 0.2121 | 0.3030 | 0.2475 | 0.1596 | 0.2188 | 0.1493 |
| HPI-S2 | - | - | - | - | - | - | - |
| oaqa-3b-5 | 0.7407 | 0.2121 | 0.3939 | 0.2854 | 0.2662 | 0.4170 | 0.2897 |
| oaqa-3b-5-e | 1.0000 | 0.2121 | 0.3939 | 0.2854 | 0.2165 | 0.3754 | 0.2597 |
| LabZhu,FDU | 1.0000 | 0.2121 | 0.3030 | 0.2475 | 0.1636 | 0.2111 | 0.1507 |
| LabZhu_FDU | 1.0000 | 0.1818 | 0.3333 | 0.2475 | 0.1036 | 0.3302 | 0.1502 |
| LabZhu-FDU | 1.0000 | 0.1818 | 0.3333 | 0.2475 | 0.0854 | 0.3579 | 0.1328 |
| WS4A | 0.2593 | - | - | - | 0.0589 | 0.0698 | 0.0560 |
Ideal Answers
| System | Rouge-2 (automatic) | Rouge-SU4 (automatic) | Readability (manual) | Recall (manual) | Precision (manual) | Repetition (manual) |
| --- | --- | --- | --- | --- | --- | --- |
| auth-qa-1 | - | - | - | - | - | - |
| Lab Zhu ,Fdan Univer | 0.4762 | 0.4711 | 4.45 | 4.34 | 4.27 | 4.71 |
| Lab Zhu,Fudan Univer | 0.4762 | 0.4711 | 4.45 | 4.34 | 4.27 | 4.71 |
| HPI-S2 | 0.3541 | 0.3799 | 3.72 | 3.63 | 3.27 | 4.05 |
| oaqa-3b-5 | - | - | - | - | - | - |
| oaqa-3b-5-e | - | - | - | - | - | - |
| LabZhu,FDU | 0.4762 | 0.4711 | 4.45 | 4.34 | 4.27 | 4.71 |
| LabZhu_FDU | 0.4762 | 0.4711 | 4.45 | 4.34 | 4.27 | 4.71 |
| LabZhu-FDU | 0.4762 | 0.4711 | 4.45 | 4.34 | 4.27 | 4.71 |
| WS4A | 0.0625 | 0.0583 | 2.95 | 1.76 | 2.27 | 3.79 |