BioASQ Participants Area
Task 3b: Test Results of Phase B
The test results are presented in separate tables for each type of annotation. Each system is identified by its "System Description".
The evaluation measures used in Task B are presented here.
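As a rough illustration of the column headers used in the Exact Answers tables, the following is a minimal sketch of how the factoid and list measures are commonly computed. It assumes that factoid answers are ranked lists, that list scores are computed per question and then averaged, and that answers are matched by exact string equality; the function names `factoid_scores` and `list_scores` are hypothetical and are not part of any official BioASQ tool.

```python
from typing import List, Sequence, Set, Tuple


def factoid_scores(
    ranked: Sequence[List[str]], gold: Sequence[Set[str]]
) -> Tuple[float, float, float]:
    """Return (strict accuracy, lenient accuracy, MRR) over factoid questions.

    ranked[i] is the system's ranked answer list for question i;
    gold[i] is the set of accepted answers (assumed exact-match).
    """
    n = len(gold)
    strict = lenient = rr_sum = 0.0
    for answers, accepted in zip(ranked, gold):
        # 1-based ranks at which an accepted answer appears
        ranks = [i for i, a in enumerate(answers, start=1) if a in accepted]
        if ranks:
            lenient += 1              # correct answer anywhere in the list
            rr_sum += 1.0 / ranks[0]  # reciprocal rank of the first hit
            if ranks[0] == 1:
                strict += 1           # correct answer is the top-ranked one
    return strict / n, lenient / n, rr_sum / n


def list_scores(
    returned: Sequence[Set[str]], gold: Sequence[Set[str]]
) -> Tuple[float, float, float]:
    """Return (mean precision, mean recall, mean F-measure) over list questions."""
    ps, rs, fs = [], [], []
    for got, want in zip(returned, gold):
        tp = len(got & want)
        p = tp / len(got) if got else 0.0
        r = tp / len(want) if want else 0.0
        f = 2 * p * r / (p + r) if (p + r) > 0 else 0.0
        ps.append(p)
        rs.append(r)
        fs.append(f)
    n = len(gold)
    return sum(ps) / n, sum(rs) / n, sum(fs) / n


if __name__ == "__main__":
    print(factoid_scores([["a", "b"], ["x", "y"], ["q"]], [{"a"}, {"y"}, {"z"}]))
    print(list_scores([{"a", "b"}, {"c"}], [{"a"}, {"d"}]))
```

Averaging the F-measure per question, as assumed here, would be consistent with the tables below, where the F-Measure column is generally not the harmonic mean of the Mean precision and Recall columns.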
Test batch 1
Exact Answers
| System Name | Yes/No Accuracy | Factoid Strict Acc. | Factoid Lenient Acc. | Factoid MRR | List Mean precision | List Recall | List F-Measure |
|---|---|---|---|---|---|---|---|
| HPI-S2 | 0.6667 | - | - | - | 0.0327 | 0.0830 | 0.0424 |
| auth-qa-1 | 0.8485 | 0.1154 | 0.1154 | 0.1154 | - | - | - |
| SNUMedinfo1 | - | - | - | - | - | - | - |
| SNUMedinfo2 | - | - | - | - | - | - | - |
| SNUMedinfo3 | - | - | - | - | - | - | - |
| SNUMedinfo4 | - | - | - | - | - | - | - |
| SNUMedinfo5 | - | - | - | - | - | - | - |
| ilsp.aueb.1 | - | - | - | - | - | - | - |
| ilsp.aueb.2 | - | - | - | - | - | - | - |
| fdu | 0.8485 | 0.1154 | 0.2692 | 0.1744 | 0.0958 | 0.4312 | 0.1520 |
| fdu2 | 0.8485 | 0.1154 | 0.2692 | 0.1744 | 0.0958 | 0.4312 | 0.1520 |
| fdu3 | 0.8485 | 0.1154 | 0.2692 | 0.1744 | 0.0931 | 0.4085 | 0.1472 |
| fdu4 | 0.8485 | 0.1538 | 0.3077 | 0.2128 | 0.0681 | 0.4998 | 0.1166 |
| main system | 0.8485 | 0.1923 | 0.3846 | 0.2641 | 0.1841 | 0.2597 | 0.1987 |
| BioASQ_Baseline | 0.4545 | - | - | - | - | - | - |
| BioASQ Baseline FS | 0.5455 | - | - | - | - | - | - |
Ideal Answers
| System Name | Rouge-2 (automatic) | Rouge-SU4 (automatic) | Readability (manual) | Recall (manual) | Precision (manual) | Repetition (manual) |
|---|---|---|---|---|---|---|
| HPI-S2 | 0.1897 | 0.2015 | 3.87 | 3.46 | 3.38 | 4.15 |
| auth-qa-1 | - | - | - | - | - | - |
| SNUMedinfo1 | 0.3413 | 0.3550 | 3.97 | 4.05 | 3.74 | 4.13 |
| SNUMedinfo2 | 0.3459 | 0.3585 | 3.98 | 4.06 | 3.79 | 4.19 |
| SNUMedinfo3 | 0.3386 | 0.3524 | 4.01 | 4.05 | 3.84 | 4.22 |
| SNUMedinfo4 | 0.2942 | 0.3076 | 4.04 | 3.96 | 3.87 | 4.29 |
| SNUMedinfo5 | 0.3139 | 0.3270 | 4.04 | 4.02 | 3.89 | 4.27 |
| ilsp.aueb.1 | 0.3894 | 0.4144 | 3.10 | 3.99 | 2.68 | 3.14 |
| ilsp.aueb.2 | 0.4183 | 0.4387 | 3.13 | 4.05 | 2.71 | 3.15 |
| fdu | 0.2801 | 0.2811 | 3.76 | 3.68 | 3.64 | 3.98 |
| fdu2 | 0.2932 | 0.3045 | 3.78 | 3.83 | 3.58 | 3.88 |
| fdu3 | 0.2881 | 0.3076 | 3.70 | 3.94 | 3.46 | 3.95 |
| fdu4 | 0.2881 | 0.3076 | 3.69 | 3.95 | 3.47 | 3.95 |
| main system | 0.3076 | 0.3209 | 3.10 | 4.19 | 3.08 | 3.16 |
| BioASQ_Baseline | 0.4053 | 0.4267 | - | - | - | - |
| BioASQ Baseline FS | 0.3789 | 0.3985 | - | - | - | - |
Test batch 2
Exact Answers
| System Name | Yes/No Accuracy | Factoid Strict Acc. | Factoid Lenient Acc. | Factoid MRR | List Mean precision | List Recall | List F-Measure |
|---|---|---|---|---|---|---|---|
| main system | 0.8125 | 0.1250 | 0.1875 | 0.1510 | 0.1149 | 0.1798 | 0.1356 |
| system 2 | 0.8125 | 0.0938 | 0.1875 | 0.1260 | 0.1149 | 0.1798 | 0.1356 |
| system 3 | 0.8125 | 0.0625 | 0.2500 | 0.1391 | 0.1149 | 0.1798 | 0.1356 |
| auth-qa-1 | 0.8125 | 0.0313 | 0.0313 | 0.0313 | 0.0357 | 0.0089 | 0.0143 |
| ilsp.aueb.1 | - | - | - | - | - | - | - |
| ilsp.aueb.2 | - | - | - | - | - | - | - |
| HPI-S2 | 0.5625 | - | - | - | 0.0714 | 0.0161 | 0.0262 |
| SNUMedinfo1 | - | - | - | - | - | - | - |
| SNUMedinfo2 | - | - | - | - | - | - | - |
| SNUMedinfo3 | - | - | - | - | - | - | - |
| SNUMedinfo4 | - | - | - | - | - | - | - |
| SNUMedinfo5 | - | - | - | - | - | - | - |
| fdu4 | 0.8125 | 0.0313 | 0.1563 | 0.0859 | 0.0828 | 0.3772 | 0.1280 |
| fdu2 | 0.8125 | 0.0625 | 0.1875 | 0.1172 | 0.0828 | 0.3772 | 0.1280 |
| fdu | 0.8125 | 0.0625 | 0.1875 | 0.1172 | 0.0828 | 0.3772 | 0.1280 |
| fdu3 | 0.8125 | 0.0625 | 0.1875 | 0.1172 | 0.0829 | 0.3772 | 0.1281 |
| fdu5 | 0.7500 | 0.0313 | 0.0313 | 0.0313 | 0.0007 | 0.0083 | 0.0013 |
| qaiiit system 1 | - | - | - | - | - | - | - |
| BioASQ_Baseline | 0.3125 | - | - | - | 0.0357 | 0.0089 | 0.0143 |
| BioASQ Baseline FS | 0.4375 | - | - | - | 0.0357 | 0.0089 | 0.0143 |
Ideal Answers
| System Name | Rouge-2 (automatic) | Rouge-SU4 (automatic) | Readability (manual) | Recall (manual) | Precision (manual) | Repetition (manual) |
|---|---|---|---|---|---|---|
| main system | 0.2968 | 0.3055 | 3.43 | 4.57 | 3.46 | 3.36 |
| system 2 | 0.2984 | 0.3069 | 3.45 | 4.63 | 3.44 | 3.31 |
| system 3 | 0.2983 | 0.3080 | 3.42 | 4.64 | 3.43 | 3.39 |
| auth-qa-1 | - | - | - | - | - | - |
| ilsp.aueb.1 | 0.3852 | 0.4160 | 3.28 | 4.45 | 2.79 | 3.43 |
| ilsp.aueb.2 | 0.3989 | 0.4368 | 3.30 | 4.55 | 2.81 | 3.35 |
| HPI-S2 | 0.2063 | 0.2243 | 3.73 | 3.90 | 3.75 | 4.51 |
| SNUMedinfo1 | 0.3953 | 0.4114 | 4.05 | 4.60 | 4.06 | 4.15 |
| SNUMedinfo2 | 0.3897 | 0.4073 | 4.07 | 4.64 | 4.08 | 4.17 |
| SNUMedinfo3 | 0.3918 | 0.4087 | 4.09 | 4.63 | 4.09 | 4.20 |
| SNUMedinfo4 | 0.3509 | 0.3708 | 4.13 | 4.58 | 4.13 | 4.37 |
| SNUMedinfo5 | 0.3641 | 0.3848 | 4.13 | 4.59 | 4.11 | 4.33 |
| fdu4 | 0.2604 | 0.2859 | 3.97 | 4.38 | 3.98 | 4.18 |
| fdu2 | 0.2590 | 0.2853 | 3.96 | 4.39 | 3.97 | 4.19 |
| fdu | 0.2590 | 0.2853 | 3.96 | 4.39 | 3.97 | 4.19 |
| fdu3 | 0.2604 | 0.2859 | 3.96 | 4.39 | 3.97 | 4.19 |
| fdu5 | - | - | - | - | - | - |
| qaiiit system 1 | 0.3081 | 0.3353 | - | - | - | - |
| BioASQ_Baseline | 0.4570 | 0.4772 | - | - | - | - |
| BioASQ Baseline FS | 0.4105 | 0.4388 | - | - | - | - |
Test batch 3
Exact Answers
| System Name | Yes/No Accuracy | Factoid Strict Acc. | Factoid Lenient Acc. | Factoid MRR | List Mean precision | List Recall | List F-Measure |
|---|---|---|---|---|---|---|---|
| HPI-S2 | 0.6207 | - | - | - | - | - | - |
| SNUMedinfo1 | - | - | - | - | - | - | - |
| SNUMedinfo2 | - | - | - | - | - | - | - |
| SNUMedinfo3 | - | - | - | - | - | - | - |
| SNUMedinfo4 | - | - | - | - | - | - | - |
| SNUMedinfo5 | - | - | - | - | - | - | - |
| main system | 0.9655 | 0.1923 | 0.2692 | 0.2308 | 0.1529 | 0.1765 | 0.1587 |
| system 2 | 0.9655 | 0.0385 | 0.1538 | 0.0897 | 0.1529 | 0.1765 | 0.1587 |
| system 3 | 0.9655 | 0.1538 | 0.2692 | 0.1987 | 0.1529 | 0.1765 | 0.1587 |
| auth-qa-1 | 0.9655 | 0.0385 | 0.0769 | 0.0577 | - | - | - |
| ilsp.aueb.1 | - | - | - | - | - | - | - |
| ilsp.aueb.2 | - | - | - | - | - | - | - |
| ilsp.aueb.3 | - | - | - | - | - | - | - |
| fdu4 | 0.9655 | 0.0769 | 0.1154 | 0.0962 | 0.0175 | 0.2735 | 0.0324 |
| oaqa-3b-3 | 0.0345 | 0.1538 | 0.3462 | 0.2321 | 0.0520 | 0.7383 | 0.0940 |
| oaqa-3b-3-e | 0.0345 | 0.1538 | 0.3462 | 0.2321 | 0.0594 | 0.8130 | 0.1072 |
| fdu2 | 0.6207 | 0.1154 | 0.1923 | 0.1359 | 0.0851 | 0.3964 | 0.1353 |
| fdu3 | 0.6207 | 0.1154 | 0.1923 | 0.1359 | 0.0851 | 0.3964 | 0.1353 |
| fdu5 | 0.9655 | 0.0385 | 0.0385 | 0.0385 | 0.0722 | 0.1775 | 0.0390 |
| fdu | 0.6207 | 0.1154 | 0.1923 | 0.1359 | 0.1008 | 0.4993 | 0.1608 |
| BioASQ_Baseline | 0.3448 | - | - | - | - | - | - |
| BioASQ Baseline FS | 0.4828 | - | - | - | - | - | - |
Ideal Answers
| System Name | Rouge-2 (automatic) | Rouge-SU4 (automatic) | Readability (manual) | Recall (manual) | Precision (manual) | Repetition (manual) |
|---|---|---|---|---|---|---|
| HPI-S2 | 0.2131 | 0.2360 | 3.84 | 3.89 | 3.95 | 4.54 |
| SNUMedinfo1 | 0.4379 | 0.4468 | 4.16 | 4.55 | 4.25 | 4.44 |
| SNUMedinfo2 | 0.4363 | 0.4448 | 4.14 | 4.54 | 4.23 | 4.44 |
| SNUMedinfo3 | 0.4320 | 0.4407 | 4.15 | 4.53 | 4.24 | 4.46 |
| SNUMedinfo4 | 0.4016 | 0.4128 | 4.17 | 4.50 | 4.27 | 4.54 |
| SNUMedinfo5 | 0.4121 | 0.4238 | 4.17 | 4.52 | 4.25 | 4.51 |
| main system | 0.3158 | 0.3389 | 3.43 | 4.60 | 3.57 | 3.54 |
| system 2 | 0.3154 | 0.3396 | 3.42 | 4.60 | 3.57 | 3.51 |
| system 3 | 0.3100 | 0.3341 | 3.43 | 4.60 | 3.55 | 3.54 |
| auth-qa-1 | - | - | - | - | - | - |
| ilsp.aueb.1 | 0.4573 | 0.4863 | 3.40 | 4.57 | 3.15 | 3.53 |
| ilsp.aueb.2 | 0.5165 | 0.5459 | 3.39 | 4.55 | 3.19 | 3.46 |
| ilsp.aueb.3 | 0.4755 | 0.5072 | 3.38 | 4.50 | 3.17 | 3.45 |
| fdu4 | - | - | - | - | - | - |
| oaqa-3b-3 | - | - | - | - | - | - |
| oaqa-3b-3-e | - | - | - | - | - | - |
| fdu2 | 0.3276 | 0.3467 | 4.00 | 4.43 | 4.20 | 4.32 |
| fdu3 | 0.3276 | 0.3467 | 4.00 | 4.43 | 4.20 | 4.33 |
| fdu5 | - | - | - | - | - | - |
| fdu | 0.3679 | 0.3933 | 3.98 | 4.43 | 3.95 | 4.28 |
| BioASQ_Baseline | 0.4518 | 0.4775 | - | - | - | - |
| BioASQ Baseline FS | 0.4772 | 0.5005 | - | - | - | - |
Test batch 4
Exact Answers
| System Name | Yes/No Accuracy | Factoid Strict Acc. | Factoid Lenient Acc. | Factoid MRR | List Mean precision | List Recall | List F-Measure |
|---|---|---|---|---|---|---|---|
| SNUMedinfo1 | - | - | - | - | - | - | - |
| SNUMedinfo2 | - | - | - | - | - | - | - |
| SNUMedinfo3 | - | - | - | - | - | - | - |
| SNUMedinfo4 | - | - | - | - | - | - | - |
| SNUMedinfo5 | - | - | - | - | - | - | - |
| HPI-S2 | 0.5600 | 0.0345 | 0.0345 | 0.0345 | 0.1522 | 0.0473 | 0.0689 |
| main system | 0.9600 | 0.2414 | 0.4828 | 0.3374 | 0.1391 | 0.1453 | 0.1335 |
| system 2 | 0.9600 | 0.2069 | 0.4483 | 0.3029 | 0.1391 | 0.1453 | 0.1335 |
| system 3 | 0.9600 | 0.2414 | 0.3793 | 0.2897 | 0.1391 | 0.1453 | 0.1335 |
| ilsp.aueb.1 | - | - | - | - | - | - | - |
| ilsp.aueb.2 | - | - | - | - | - | - | - |
| ilsp.aueb.3 | - | - | - | - | - | - | - |
| oaqa-3b-4 | 0.9600 | 0.4138 | 0.6552 | 0.5098 | 0.3836 | 0.3450 | 0.3155 |
| fdu4 | 0.8800 | 0.1724 | 0.3448 | 0.2471 | 0.1446 | 0.5475 | 0.2189 |
| fdu | 0.8800 | 0.1724 | 0.3448 | 0.2471 | 0.1446 | 0.5475 | 0.2189 |
| fdu2 | 0.9600 | 0.1724 | 0.3448 | 0.2471 | 0.1446 | 0.5475 | 0.2189 |
| fdu3 | 0.9600 | 0.1724 | 0.3448 | 0.2471 | 0.1446 | 0.5475 | 0.2189 |
| auth-qa-1 | 0.9600 | 0.0690 | 0.2069 | 0.1236 | - | - | - |
| BioASQ_Baseline | 0.3600 | - | - | - | - | - | - |
| BioASQ Baseline FS | 0.4000 | - | - | - | - | - | - |
Ideal Answers
| System Name | Rouge-2 (automatic) | Rouge-SU4 (automatic) | Readability (manual) | Recall (manual) | Precision (manual) | Repetition (manual) |
|---|---|---|---|---|---|---|
| SNUMedinfo1 | 0.4018 | 0.4107 | 4.19 | 4.47 | 4.26 | 4.51 |
| SNUMedinfo2 | 0.4250 | 0.4323 | 4.18 | 4.51 | 4.27 | 4.43 |
| SNUMedinfo3 | 0.3969 | 0.4036 | 4.16 | 4.49 | 4.29 | 4.47 |
| SNUMedinfo4 | 0.3747 | 0.3816 | 4.18 | 4.45 | 4.30 | 4.53 |
| SNUMedinfo5 | 0.3879 | 0.3948 | 4.17 | 4.44 | 4.29 | 4.52 |
| HPI-S2 | 0.2672 | 0.2872 | 3.95 | 3.96 | 3.95 | 4.44 |
| main system | 0.3430 | 0.3545 | 3.55 | 4.60 | 3.63 | 3.52 |
| system 2 | 0.3387 | 0.3509 | 3.52 | 4.60 | 3.60 | 3.46 |
| system 3 | 0.3380 | 0.3490 | 3.52 | 4.60 | 3.58 | 3.39 |
| ilsp.aueb.1 | 0.4416 | 0.4631 | 3.44 | 4.54 | 3.24 | 3.37 |
| ilsp.aueb.2 | 0.4672 | 0.4927 | 3.45 | 4.58 | 3.21 | 3.39 |
| ilsp.aueb.3 | 0.4441 | 0.4699 | 3.44 | 4.56 | 3.17 | 3.36 |
| oaqa-3b-4 | - | - | - | - | - | - |
| fdu4 | 0.3260 | 0.3338 | 4.05 | 4.47 | 4.19 | 4.26 |
| fdu | 0.2490 | 0.2588 | 4.05 | 4.38 | 4.43 | 4.22 |
| fdu2 | 0.2490 | 0.2588 | 4.05 | 4.38 | 4.43 | 4.22 |
| fdu3 | 0.3260 | 0.3338 | 4.05 | 4.47 | 4.19 | 4.26 |
| auth-qa-1 | - | - | - | - | - | - |
| BioASQ_Baseline | 0.4672 | 0.4891 | - | - | - | - |
| BioASQ Baseline FS | 0.4322 | 0.4495 | - | - | - | - |
Test batch 5
Exact Answers
| System Name | Yes/No Accuracy | Factoid Strict Acc. | Factoid Lenient Acc. | Factoid MRR | List Mean precision | List Recall | List F-Measure |
|---|---|---|---|---|---|---|---|
| auth-qa-1 | 0.6786 | 0.1364 | 0.1818 | 0.1455 | - | - | - |
| main system | 0.6786 | 0.0455 | 0.1364 | 0.0720 | 0.1139 | 0.1757 | 0.1293 |
| system 2 | 0.6786 | 0.0455 | 0.2727 | 0.1303 | 0.1139 | 0.1757 | 0.1293 |
| system 3 | 0.6786 | 0.0909 | 0.1364 | 0.1061 | 0.1139 | 0.1757 | 0.1293 |
| ilsp.aueb.1 | - | - | - | - | - | - | - |
| ilsp.aueb.2 | - | - | - | - | - | - | - |
| ilsp.aueb.3 | - | - | - | - | - | - | - |
| SNUMedinfo1 | - | - | - | - | - | - | - |
| SNUMedinfo2 | - | - | - | - | - | - | - |
| SNUMedinfo3 | - | - | - | - | - | - | - |
| SNUMedinfo4 | - | - | - | - | - | - | - |
| SNUMedinfo5 | - | - | - | - | - | - | - |
| fdu | 0.7143 | 0.2273 | 0.2727 | 0.2500 | 0.0914 | 0.4146 | 0.1376 |
| fdu2 | 0.6786 | 0.2273 | 0.2727 | 0.2500 | 0.0935 | 0.4146 | 0.1404 |
| fdu3 | 0.6786 | 0.2273 | 0.2727 | 0.2500 | 0.0914 | 0.4146 | 0.1376 |
| fdu4 | 0.7143 | 0.2273 | 0.2727 | 0.2500 | 0.0914 | 0.4146 | 0.1376 |
| HPI-S2 | 0.3571 | 0.0909 | 0.0909 | 0.0909 | 0.0625 | 0.0292 | 0.0397 |
| oaqa-3b-4 | 0.6786 | 0.2273 | 0.3182 | 0.2727 | 0.1704 | 0.1139 | 0.1296 |
| oaqa-3b-5 | 0.6786 | 0.2273 | 0.3182 | 0.2727 | 0.1643 | 0.2538 | 0.1860 |
| YodaQA_base | 0.6786 | 0.1818 | 0.2273 | 0.2045 | 0.1514 | 0.2132 | 0.1580 |
| BioASQ_Baseline | 0.7143 | 0.0455 | 0.0455 | 0.0455 | - | - | - |
| BioASQ Baseline FS | 0.6429 | 0.0455 | 0.0455 | 0.0455 | - | - | - |
Ideal Answers
| System Name | Rouge-2 (automatic) | Rouge-SU4 (automatic) | Readability (manual) | Recall (manual) | Precision (manual) | Repetition (manual) |
|---|---|---|---|---|---|---|
| auth-qa-1 | - | - | - | - | - | - |
| main system | 0.3880 | 0.4000 | 3.30 | 4.27 | 3.55 | 3.59 |
| system 2 | 0.3914 | 0.4032 | 3.28 | 4.30 | 3.55 | 3.56 |
| system 3 | 0.3901 | 0.3980 | 3.23 | 4.19 | 3.46 | 3.53 |
| ilsp.aueb.1 | 0.3959 | 0.4256 | 3.17 | 3.89 | 2.35 | 3.45 |
| ilsp.aueb.2 | 0.3959 | 0.4256 | 3.17 | 3.87 | 2.40 | 3.44 |
| ilsp.aueb.3 | 0.3705 | 0.4032 | 3.10 | 3.87 | 2.30 | 3.54 |
| SNUMedinfo1 | 0.3825 | 0.3899 | 4.18 | 4.12 | 4.17 | 4.61 |
| SNUMedinfo2 | 0.3971 | 0.4033 | 4.19 | 4.17 | 4.16 | 4.52 |
| SNUMedinfo3 | 0.3780 | 0.3841 | 4.22 | 4.10 | 4.17 | 4.61 |
| SNUMedinfo4 | 0.3488 | 0.3559 | 4.23 | 3.95 | 4.21 | 4.69 |
| SNUMedinfo5 | 0.3567 | 0.3639 | 4.26 | 4.03 | 4.19 | 4.67 |
| fdu | 0.2261 | 0.2309 | 3.40 | 3.49 | 4.12 | 4.31 |
| fdu2 | 0.2261 | 0.2309 | 3.39 | 3.49 | 4.13 | 4.31 |
| fdu3 | 0.3172 | 0.3316 | 3.61 | 3.91 | 4.02 | 4.35 |
| fdu4 | 0.3172 | 0.3316 | 3.61 | 3.90 | 4.03 | 4.36 |
| HPI-S2 | 0.1629 | 0.1711 | 4.22 | 2.82 | 3.77 | 4.67 |
| oaqa-3b-4 | - | - | - | - | - | - |
| oaqa-3b-5 | - | - | - | - | - | - |
| YodaQA_base | - | - | - | - | - | - |
| BioASQ_Baseline | 0.4038 | 0.4313 | - | - | - | - |
| BioASQ Baseline FS | 0.3540 | 0.3862 | - | - | - | - |