BioASQ Participants Area
Task 12b: Test Results of Phase A+
The test results are presented in separate tables for each type of annotation. Each system is listed under its "System Description".
The evaluation measures used in Phase A+ are presented here.
Warning: For ideal answers, good ROUGE results do not always imply good manual scores.
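As a rough illustration of how the four Yes/No columns in the tables below relate to each other, the following sketch computes accuracy, per-class F1 and their unweighted mean (Macro F1) from gold and predicted answers. It is not the official BioASQ scoring code. The function names and the sample data are hypothetical, and treating an undefined class F1 as 0 is an assumption, although it appears consistent with rows where "F1 No" is shown as "-" and Macro F1 is half of "F1 Yes" (for example 0.7179 and 0.3590).

```python
from typing import Dict, List


def f1_for_label(gold: List[str], predicted: List[str], label: str) -> float:
    """F1 for one answer class; 0.0 if the class is never predicted correctly."""
    tp = sum(1 for g, p in zip(gold, predicted) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, predicted) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, predicted) if g == label and p != label)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def yes_no_scores(gold: List[str], predicted: List[str]) -> Dict[str, float]:
    """Accuracy, F1 per class and macro F1 for yes/no questions."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and equally long")
    accuracy = sum(g == p for g, p in zip(gold, predicted)) / len(gold)
    f1_yes = f1_for_label(gold, predicted, "yes")
    f1_no = f1_for_label(gold, predicted, "no")
    return {
        "accuracy": accuracy,
        "f1_yes": f1_yes,
        "f1_no": f1_no,
        "macro_f1": (f1_yes + f1_no) / 2,  # unweighted mean of the two classes
    }


# Hypothetical data: 25 questions, 14 with gold answer "yes",
# and a system that always answers "yes".
gold_answers = ["yes"] * 14 + ["no"] * 11
system_answers = ["yes"] * 25
scores = yes_no_scores(gold_answers, system_answers)
print({name: round(value, 4) for name, value in scores.items()})
# {'accuracy': 0.56, 'f1_yes': 0.7179, 'f1_no': 0.0, 'macro_f1': 0.359}
```

Under these assumptions, an always-"yes" system reaches a fairly high F1 Yes but a low Macro F1, which is why Macro F1 is the more informative of the Yes/No columns when the classes are unbalanced.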
Test batch 1
Exact Answers
| System | Yes/No Accuracy | Yes/No F1 Yes | Yes/No F1 No | Yes/No Macro F1 | Factoid Strict Acc. | Factoid Lenient Acc. | Factoid MRR | List Mean Prec. | List Recall | List F-Measure |
|---|---|---|---|---|---|---|---|---|---|---|
| mibi_rag_snippet | 0.7600 | 0.8000 | 0.7000 | 0.7500 | 0.0476 | 0.0476 | 0.0476 | 0.2635 | 0.2106 | 0.2180 |
| mibi_rag_abstract | 0.7200 | 0.7742 | 0.6316 | 0.7029 | 0.0476 | 0.0476 | 0.0476 | 0.3032 | 0.2673 | 0.2706 |
| UR-IW-5 | 0.8000 | 0.8148 | 0.7826 | 0.7987 | 0.0952 | 0.0952 | 0.0952 | 0.4119 | 0.4182 | 0.3976 |
| Fleming-1 | 0.8000 | 0.8387 | 0.7368 | 0.7878 | - | - | - | 0.2186 | 0.2103 | 0.2079 |
| GTBioASQsys2 | 0.8000 | 0.8148 | 0.7826 | 0.7987 | 0.0952 | 0.0952 | 0.0952 | 0.2722 | 0.2356 | 0.2350 |
| Gatech competition | 0.8400 | 0.8333 | 0.8462 | 0.8397 | 0.1429 | 0.1429 | 0.1429 | 0.4452 | 0.3415 | 0.3661 |
| GTBioASQsys3 | 0.8400 | 0.8462 | 0.8333 | 0.8397 | 0.1429 | 0.1429 | 0.1429 | 0.2421 | 0.1765 | 0.1866 |
| UR-IW-4 | 0.8400 | 0.8462 | 0.8333 | 0.8397 | 0.0476 | 0.0952 | 0.0714 | 0.3948 | 0.4063 | 0.3798 |
| UR-IW-2 | 0.8400 | 0.8462 | 0.8333 | 0.8397 | 0.0952 | 0.0952 | 0.0952 | 0.5250 | 0.4914 | 0.4808 |
| bioinfo-0 | 0.5600 | 0.7179 | - | 0.3590 | - | - | - | - | - | - |
| UR-IW-3 | 0.9200 | 0.9333 | 0.9000 | 0.9167 | 0.0952 | 0.0952 | 0.0952 | 0.4016 | 0.4778 | 0.4089 |
| UR-IW-1 | 0.8000 | 0.8276 | 0.7619 | 0.7947 | 0.1905 | 0.2381 | 0.2143 | 0.3224 | 0.4273 | 0.3418 |
| bioinfo-1 | 0.5600 | 0.7179 | - | 0.3590 | - | - | - | - | - | - |
| bioinfo-2 | 0.5600 | 0.7179 | - | 0.3590 | - | - | - | - | - | - |
| bioinfo-3 | 0.5600 | 0.7179 | - | 0.3590 | - | - | - | - | - | - |
| bioinfo-4 | 0.5600 | 0.7179 | - | 0.3590 | - | - | - | - | - | - |
| dmiip2024 | 0.7600 | 0.8235 | 0.6250 | 0.7243 | 0.1905 | 0.1905 | 0.1905 | 0.4960 | 0.4269 | 0.4471 |
| dmiip2024_1 | 0.7600 | 0.8235 | 0.6250 | 0.7243 | 0.2381 | 0.5238 | 0.3349 | 0.3317 | 0.3591 | 0.3109 |
| dmiip2024_3 | 0.7600 | 0.8235 | 0.6250 | 0.7243 | 0.2381 | 0.5238 | 0.3611 | 0.3341 | 0.3746 | 0.3315 |
| dmiip2024_2 | 0.7600 | 0.8235 | 0.6250 | 0.7243 | 0.2381 | 0.3810 | 0.2730 | 0.3270 | 0.3591 | 0.2993 |
| dmiip2024_4 | 0.7600 | 0.8235 | 0.6250 | 0.7243 | 0.0952 | 0.2857 | 0.1683 | 0.2651 | 0.2810 | 0.2540 |
| simple truncation | 0.8000 | 0.8485 | 0.7059 | 0.7772 | 0.0952 | 0.1429 | 0.1190 | 0.1733 | 0.2046 | 0.1760 |
Ideal Answers
| System | ROUGE R-2 (Rec) | ROUGE R-2 (F1) | ROUGE R-SU4 (Rec) | ROUGE R-SU4 (F1) | Manual Readability | Manual Recall | Manual Precision | Manual Repetition |
|---|---|---|---|---|---|---|---|---|
| mibi_rag_snippet | 0.2539 | 0.1447 | 0.2784 | 0.1429 | - | - | - | - |
| mibi_rag_abstract | 0.2584 | 0.1586 | 0.2807 | 0.1572 | - | - | - | - |
| UR-IW-5 | 0.2280 | 0.1065 | 0.2557 | 0.1073 | - | - | - | - |
| Fleming-1 | 0.2158 | 0.0704 | 0.2552 | 0.0747 | - | - | - | - |
| GTBioASQsys2 | 0.1655 | 0.1078 | 0.1861 | 0.1091 | - | - | - | - |
| Gatech competition | 0.1474 | 0.1080 | 0.1624 | 0.1112 | - | - | - | - |
| GTBioASQsys3 | 0.1464 | 0.1020 | 0.1778 | 0.1042 | - | - | - | - |
| UR-IW-4 | 0.2516 | 0.1201 | 0.2752 | 0.1193 | - | - | - | - |
| UR-IW-2 | 0.2356 | 0.2345 | 0.2428 | 0.2308 | - | - | - | - |
| bioinfo-0 | 0.2962 | 0.0671 | 0.3321 | 0.0679 | - | - | - | - |
| UR-IW-3 | 0.2338 | 0.2098 | 0.2505 | 0.2044 | - | - | - | - |
| UR-IW-1 | 0.2393 | 0.1254 | 0.2642 | 0.1228 | - | - | - | - |
| bioinfo-1 | 0.2939 | 0.0735 | 0.3343 | 0.0746 | - | - | - | - |
| bioinfo-2 | 0.2756 | 0.0736 | 0.3139 | 0.0774 | - | - | - | - |
| bioinfo-3 | 0.3007 | 0.0721 | 0.3336 | 0.0730 | - | - | - | - |
| bioinfo-4 | 0.2861 | 0.0657 | 0.3324 | 0.0693 | - | - | - | - |
| dmiip2024 | 0.1674 | 0.1485 | 0.1818 | 0.1498 | - | - | - | - |
| dmiip2024_1 | 0.1674 | 0.1485 | 0.1818 | 0.1498 | - | - | - | - |
| dmiip2024_3 | 0.1674 | 0.1485 | 0.1818 | 0.1498 | - | - | - | - |
| dmiip2024_2 | 0.1674 | 0.1485 | 0.1818 | 0.1498 | - | - | - | - |
| dmiip2024_4 | 0.1674 | 0.1485 | 0.1818 | 0.1498 | - | - | - | - |
| simple truncation | 0.0938 | 0.0723 | 0.1064 | 0.0726 | - | - | - | - |
Test batch 2
Exact Answers
| System | Yes/No Accuracy | Yes/No F1 Yes | Yes/No F1 No | Yes/No Macro F1 | Factoid Strict Acc. | Factoid Lenient Acc. | Factoid MRR | List Mean Prec. | List Recall | List F-Measure |
|---|---|---|---|---|---|---|---|---|---|---|
| mibi_rag_snippet | 0.7692 | 0.8125 | 0.7000 | 0.7563 | 0.1579 | 0.1579 | 0.1579 | 0.3306 | 0.2830 | 0.3005 |
| mibi_rag_abstract | 0.6923 | 0.7500 | 0.6000 | 0.6750 | 0.1053 | 0.1053 | 0.1053 | 0.2769 | 0.1967 | 0.2140 |
| Gatech competition | 0.8077 | 0.8276 | 0.7826 | 0.8051 | 0.2105 | 0.2105 | 0.2105 | 0.2290 | 0.2310 | 0.2133 |
| GTBioASQsys2 | 0.6923 | 0.7143 | 0.6667 | 0.6905 | 0.2105 | 0.2105 | 0.2105 | 0.1549 | 0.1260 | 0.1268 |
| GTBioASQsys3 | 0.8077 | 0.8387 | 0.7619 | 0.8003 | 0.2105 | 0.2105 | 0.2105 | 0.1376 | 0.1481 | 0.1364 |
| UR-IW-5 | 0.8846 | 0.8966 | 0.8696 | 0.8831 | 0.3158 | 0.3158 | 0.3158 | 0.1589 | 0.1725 | 0.1497 |
| UR-IW-4 | 0.8462 | 0.8571 | 0.8333 | 0.8452 | 0.1579 | 0.2105 | 0.1842 | 0.2628 | 0.2299 | 0.2179 |
| UR-IW-3 | 0.8846 | 0.8966 | 0.8696 | 0.8831 | 0.3158 | 0.3158 | 0.3158 | 0.2625 | 0.2400 | 0.2411 |
| UR-IW-2 | 0.8462 | 0.8571 | 0.8333 | 0.8452 | 0.2632 | 0.3158 | 0.2895 | 0.2045 | 0.2569 | 0.2182 |
| simple truncation | 0.7692 | 0.8235 | 0.6667 | 0.7451 | 0.1579 | 0.1579 | 0.1579 | 0.0773 | 0.0662 | 0.0675 |
| kmeans | 0.7692 | 0.8125 | 0.7000 | 0.7563 | 0.1579 | 0.1579 | 0.1579 | 0.0930 | 0.0935 | 0.0894 |
| similarity measures | 0.7692 | 0.8235 | 0.6667 | 0.7451 | 0.1579 | 0.1579 | 0.1579 | 0.0773 | 0.0662 | 0.0675 |
| UR-IW-1 | 0.7692 | 0.8000 | 0.7273 | 0.7636 | 0.2632 | 0.2632 | 0.2632 | 0.1953 | 0.1906 | 0.1766 |
| Fleming-3 | 0.8077 | 0.8485 | 0.7368 | 0.7927 | 0.2632 | 0.3684 | 0.3070 | 0.2335 | 0.1478 | 0.1708 |
| dmiip2024 | 0.9615 | 0.9677 | 0.9524 | 0.9601 | 0.2632 | 0.4737 | 0.3596 | 0.4299 | 0.4543 | 0.4074 |
| dmiip2024_1 | 0.8077 | 0.8387 | 0.7619 | 0.8003 | 0.2632 | 0.3684 | 0.3158 | 0.4470 | 0.4451 | 0.4088 |
| dmiip2024_2 | 0.8846 | 0.9143 | 0.8235 | 0.8689 | 0.1579 | 0.3158 | 0.2237 | 0.3520 | 0.3935 | 0.3230 |
| dmiip2024_4 | 0.3846 | - | 0.5556 | 0.2778 | 0.1579 | 0.4211 | 0.2807 | 0.2685 | 0.3199 | 0.2606 |
| dmiip2024_3 | 0.8846 | 0.9143 | 0.8235 | 0.8689 | 0.3684 | 0.4211 | 0.3947 | 0.3769 | 0.3793 | 0.3542 |
| bioinfo-0 | 0.6154 | 0.7619 | - | 0.3810 | - | - | - | - | - | - |
| bioinfo-1 | 0.6154 | 0.7619 | - | 0.3810 | - | - | - | - | - | - |
| bioinfo-2 | 0.6154 | 0.7619 | - | 0.3810 | - | - | - | - | - | - |
| bioinfo-3 | 0.6154 | 0.7619 | - | 0.3810 | - | - | - | - | - | - |
| bioinfo-4 | 0.6154 | 0.7619 | - | 0.3810 | - | - | - | - | - | - |
| CPS | 0.6923 | 0.7895 | 0.4286 | 0.6090 | - | - | - | - | - | - |
| CPS2 | 0.6923 | 0.7778 | 0.5000 | 0.6389 | - | - | - | - | - | - |
Ideal Answers
| System | ROUGE R-2 (Rec) | ROUGE R-2 (F1) | ROUGE R-SU4 (Rec) | ROUGE R-SU4 (F1) | Manual Readability | Manual Recall | Manual Precision | Manual Repetition |
|---|---|---|---|---|---|---|---|---|
| mibi_rag_snippet | 0.2181 | 0.1199 | 0.2345 | 0.1200 | - | - | - | - |
| mibi_rag_abstract | 0.2098 | 0.1095 | 0.2394 | 0.1127 | - | - | - | - |
| Gatech competition | 0.1225 | 0.0879 | 0.1339 | 0.0925 | - | - | - | - |
| GTBioASQsys2 | 0.1459 | 0.1018 | 0.1703 | 0.1101 | - | - | - | - |
| GTBioASQsys3 | 0.1148 | 0.0946 | 0.1342 | 0.1002 | - | - | - | - |
| UR-IW-5 | 0.2234 | 0.1002 | 0.2450 | 0.1024 | - | - | - | - |
| UR-IW-4 | 0.2181 | 0.1066 | 0.2514 | 0.1108 | - | - | - | - |
| UR-IW-3 | 0.1980 | 0.1785 | 0.2132 | 0.1815 | - | - | - | - |
| UR-IW-2 | 0.1890 | 0.1747 | 0.1989 | 0.1783 | - | - | - | - |
| simple truncation | 0.0525 | 0.0166 | 0.0591 | 0.0172 | - | - | - | - |
| kmeans | 0.0850 | 0.0352 | 0.0986 | 0.0382 | - | - | - | - |
| similarity measures | 0.0525 | 0.0166 | 0.0591 | 0.0172 | - | - | - | - |
| UR-IW-1 | 0.1892 | 0.1033 | 0.2295 | 0.1076 | - | - | - | - |
| Fleming-3 | 0.2033 | 0.0682 | 0.2411 | 0.0737 | - | - | - | - |
| dmiip2024 | 0.1741 | 0.1604 | 0.1831 | 0.1641 | - | - | - | - |
| dmiip2024_1 | 0.1724 | 0.1548 | 0.1818 | 0.1596 | - | - | - | - |
| dmiip2024_2 | 0.1809 | 0.1661 | 0.1969 | 0.1680 | - | - | - | - |
| dmiip2024_4 | 0.1594 | 0.1485 | 0.1745 | 0.1531 | - | - | - | - |
| dmiip2024_3 | 0.1497 | 0.1265 | 0.1710 | 0.1374 | - | - | - | - |
| bioinfo-0 | 0.2459 | 0.0762 | 0.2885 | 0.0815 | - | - | - | - |
| bioinfo-1 | 0.2500 | 0.0771 | 0.2898 | 0.0806 | - | - | - | - |
| bioinfo-2 | 0.2498 | 0.0721 | 0.2927 | 0.0778 | - | - | - | - |
| bioinfo-3 | 0.2187 | 0.1898 | 0.2292 | 0.1898 | - | - | - | - |
| bioinfo-4 | 0.1608 | 0.1373 | 0.1706 | 0.1381 | - | - | - | - |
| CPS | 0.1729 | 0.1235 | 0.1949 | 0.1272 | - | - | - | - |
| CPS2 | 0.1569 | 0.0861 | 0.1737 | 0.0904 | - | - | - | - |
Test batch 3
Exact Answers
| System | Yes/No Accuracy | Yes/No F1 Yes | Yes/No F1 No | Yes/No Macro F1 | Factoid Strict Acc. | Factoid Lenient Acc. | Factoid MRR | List Mean Prec. | List Recall | List F-Measure |
|---|---|---|---|---|---|---|---|---|---|---|
Ideal Answers
| System | ROUGE R-2 (Rec) | ROUGE R-2 (F1) | ROUGE R-SU4 (Rec) | ROUGE R-SU4 (F1) | Manual Readability | Manual Recall | Manual Precision | Manual Repetition |
|---|---|---|---|---|---|---|---|---|
Test batch 4
Exact Answers
| System | Yes/No Accuracy | Yes/No F1 Yes | Yes/No F1 No | Yes/No Macro F1 | Factoid Strict Acc. | Factoid Lenient Acc. | Factoid MRR | List Mean Prec. | List Recall | List F-Measure |
|---|---|---|---|---|---|---|---|---|---|---|
Ideal Answers
| System | ROUGE R-2 (Rec) | ROUGE R-2 (F1) | ROUGE R-SU4 (Rec) | ROUGE R-SU4 (F1) | Manual Readability | Manual Recall | Manual Precision | Manual Repetition |
|---|---|---|---|---|---|---|---|---|