BioASQ Participants Area
Task 11b: Test Results of Phase B
The test results are presented in separate tables for each type of annotation. Each system is listed under its "System Description".
The evaluation measures used in Task B are presented here.
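As a rough illustration of how the exact-answer columns relate to each other, the sketch below computes the yes/no measures (accuracy, F1 per class, macro F1) and the factoid measures (strict accuracy, lenient accuracy, MRR). It is not the official evaluation code: the function names are hypothetical, answers are compared by exact string match, and macro F1 is assumed to be the plain average of F1 Yes and F1 No, with an unavailable class F1 counted as 0 (consistent with rows such as 0.7692, "-", 0.3846).

```python
from typing import List, Sequence, Tuple


def yes_no_scores(gold: Sequence[str], pred: Sequence[str]) -> Tuple[float, float, float, float]:
    """Return (accuracy, F1 yes, F1 no, macro F1) for yes/no answers."""

    def f1(label: str) -> float:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        # Assumption: a class with no true positives contributes 0.
        return 2 * tp / denom if denom else 0.0

    accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    f1_yes, f1_no = f1("yes"), f1("no")
    return accuracy, f1_yes, f1_no, (f1_yes + f1_no) / 2


def factoid_scores(gold: Sequence[str], ranked: Sequence[List[str]]) -> Tuple[float, float, float]:
    """Return (strict accuracy, lenient accuracy, MRR) for ranked answer lists."""
    strict = lenient = rr_sum = 0.0
    for answer, candidates in zip(gold, ranked):
        top = candidates[:5]  # only the first five returned answers are considered
        if answer in top:
            rank = top.index(answer) + 1
            lenient += 1          # correct answer appears somewhere in the list
            rr_sum += 1 / rank    # reciprocal rank of the correct answer
            if rank == 1:
                strict += 1       # correct answer is ranked first
    n = len(gold)
    return strict / n, lenient / n, rr_sum / n


# Hypothetical batch: 15 gold "yes" and 9 gold "no"; the system always says "yes".
gold = ["yes"] * 15 + ["no"] * 9
pred = ["yes"] * 24
print([round(x, 4) for x in yes_no_scores(gold, pred)])  # [0.625, 0.7692, 0.0, 0.3846]

# Hypothetical factoid questions with ranked candidate answers.
print(factoid_scores(["a", "b", "c", "d"], [["a"], ["x", "b"], ["x", "y"], ["d", "z"]]))
# (0.5, 0.75, 0.625)
```

The 15/9 split is chosen only because it reproduces the 0.6250 / 0.7692 / 0.3846 pattern that recurs in the tables. In the factoid example, strict accuracy, lenient accuracy and MRR coincide whenever a system returns a single answer per question.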
Warning: For ideal answers, good ROUGE results do not always imply good manual scores.
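To make the warning concrete, here is a simplified, unofficial ROUGE-2 sketch: bigram overlap against a single reference, whitespace tokenization, no stemming or stop-word handling. The function names and the toy sentences are hypothetical. It shows one way a long answer can reach high recall while its F1 stays low, which is why the (Rec) and (F1) columns can diverge and why neither is a substitute for the manual scores.

```python
from collections import Counter
from typing import Tuple


def bigrams(text: str) -> Counter:
    """Count the word bigrams of a lower-cased, whitespace-tokenized text."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


def rouge2(candidate: str, reference: str) -> Tuple[float, float, float]:
    """Return (recall, precision, F1) of bigram overlap with one reference."""
    cand, ref = bigrams(candidate), bigrams(reference)
    overlap = sum((cand & ref).values())  # clipped bigram matches
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return recall, precision, f1


reference = "aspirin inhibits platelet aggregation"
long_answer = (
    "aspirin inhibits platelet aggregation and is also widely used "
    "to reduce fever and pain"
)
recall, precision, f1 = rouge2(long_answer, reference)
print(recall, round(precision, 4), round(f1, 4))  # 1.0 0.2308 0.375
```

The long answer covers every reference bigram (recall 1.0), but only 3 of its 13 bigrams match, so F1 drops to 0.375.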
Test batch 1
Exact Answers
| System | Yes/No Accuracy | Yes/No F1 Yes | Yes/No F1 No | Yes/No Macro F1 | Factoid Strict Acc. | Factoid Lenient Acc. | Factoid MRR | List Mean Prec. | List Recall | List F-Measure |
|---|---|---|---|---|---|---|---|---|---|---|
| OWLMan-phaseB-TaskV1 | 0.6250 | 0.7273 | 0.4000 | 0.5636 | 0.3684 | 0.5789 | 0.4649 | 0.2333 | 0.4152 | 0.2789 |
| IISR-1 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.4211 | 0.4211 | 0.4211 | 0.7602 | 0.6773 | 0.7043 |
| IISR-3 | 0.6250 | 0.7692 | - | 0.3846 | - | - | - | - | - | - |
| Fleming-4 | 0.7500 | 0.8333 | 0.5000 | 0.6667 | - | - | - | - | - | - |
| AsqAway_1 | 0.6250 | 0.7692 | - | 0.3846 | 0.3684 | 0.3684 | 0.3684 | 0.4873 | 0.5083 | 0.4723 |
| AsqAway_2 | 0.6250 | 0.7692 | - | 0.3846 | 0.3684 | 0.5263 | 0.4474 | 0.4535 | 0.6332 | 0.4983 |
| AsqAway_3 | 0.6250 | 0.7692 | - | 0.3846 | 0.3684 | 0.5263 | 0.4474 | 0.4480 | 0.7131 | 0.5225 |
| AsqAway_4 | 0.6250 | 0.7692 | - | 0.3846 | 0.4211 | 0.5263 | 0.4737 | 0.4480 | 0.7131 | 0.5225 |
| MQ-1 | 0.6250 | 0.7692 | - | 0.3846 | - | - | - | - | - | - |
| MQ-2 | 0.6250 | 0.7692 | - | 0.3846 | - | - | - | - | - | - |
| MQ-3 | 0.6250 | 0.7692 | - | 0.3846 | - | - | - | - | - | - |
| Deep ML methods for | 0.6250 | 0.7273 | 0.4000 | 0.5636 | 0.2105 | 0.2632 | 0.2281 | 0.4444 | 0.2680 | 0.2880 |
| MQ-4 | 0.6250 | 0.7692 | - | 0.3846 | - | - | - | - | - | - |
| MQ-5 | 0.6250 | 0.7692 | - | 0.3846 | - | - | - | - | - | - |
| simple truncation | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.2632 | 0.2632 | 0.2632 | 0.2806 | 0.2521 | 0.2601 |
| dmiip1 | 0.8333 | 0.8750 | 0.7500 | 0.8125 | 0.4211 | 0.5789 | 0.5000 | 0.5016 | 0.4554 | 0.4347 |
| dmiip2 | 0.8333 | 0.8750 | 0.7500 | 0.8125 | 0.5789 | 0.6842 | 0.6096 | 0.5532 | 0.4422 | 0.4462 |
| dmiip4 | 0.8750 | 0.9032 | 0.8235 | 0.8634 | 0.4211 | 0.6842 | 0.5263 | 0.5549 | 0.5429 | 0.4932 |
| dmiip5 | 0.8333 | 0.8750 | 0.7500 | 0.8125 | 0.4211 | 0.6316 | 0.5000 | 0.5232 | 0.5214 | 0.4809 |
| dmiip3 | 0.8750 | 0.9032 | 0.8235 | 0.8634 | 0.4211 | 0.6316 | 0.5000 | 0.4590 | 0.5324 | 0.4701 |
| IISR-2 | 0.9583 | 0.9677 | 0.9412 | 0.9545 | 0.5789 | 0.5789 | 0.5789 | 0.6833 | 0.5864 | 0.6210 |
| UR-IW-2 | 0.9167 | 0.9375 | 0.8750 | 0.9063 | 0.6316 | 0.6316 | 0.6316 | 0.6750 | 0.6697 | 0.6703 |
| UR-IW-3 | 0.9583 | 0.9677 | 0.9412 | 0.9545 | 0.5263 | 0.6316 | 0.5789 | 0.7159 | 0.7356 | 0.7130 |
| UR-IW-4 | 0.8333 | 0.8750 | 0.7500 | 0.8125 | 0.2105 | 0.3158 | 0.2632 | 0.4278 | 0.4164 | 0.4143 |
| extractive | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.2632 | 0.2632 | 0.2632 | 0.2083 | 0.1833 | 0.1905 |
| abstractive | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.2105 | 0.2105 | 0.2105 | 0.2083 | 0.2000 | 0.2037 |
| bioinfo-0 | 0.6250 | 0.7692 | - | 0.3846 | - | - | - | - | - | - |
| BART | 0.3333 | 0.3333 | 0.3333 | 0.3333 | - | - | - | - | - | - |
| DMIS-KU-2 | 0.9167 | 0.9375 | 0.8750 | 0.9063 | 0.5263 | 0.6842 | 0.6053 | 0.7782 | 0.7032 | 0.7158 |
| DMIS-KU-3 | 0.9167 | 0.9375 | 0.8750 | 0.9063 | 0.4737 | 0.6842 | 0.5614 | 0.8278 | 0.6790 | 0.7219 |
| DMIS-KU-5 | 0.9583 | 0.9677 | 0.9412 | 0.9545 | 0.5263 | 0.6842 | 0.6053 | 0.8278 | 0.6790 | 0.7219 |
| DMIS-KU-1 | 0.9167 | 0.9375 | 0.8750 | 0.9063 | 0.5263 | 0.6842 | 0.6053 | 0.8611 | 0.6714 | 0.7169 |
| DMIS-KU-4 | 0.9583 | 0.9677 | 0.9412 | 0.9545 | 0.5263 | 0.6842 | 0.5965 | 0.8819 | 0.6714 | 0.7224 |
| BioASQ Baseline ZS | 0.6250 | 0.7692 | - | 0.3846 | - | - | - | - | - | - |
| BioASQ Baseline FS | 0.6250 | 0.7692 | - | 0.3846 | - | - | - | - | - | - |
| BioASQ_Baseline | 0.3750 | 0.2857 | 0.4444 | 0.3651 | 0.0526 | 0.3684 | 0.1465 | 0.2855 | 0.4948 | 0.3201 |
Ideal Answers
| System | ROUGE R-2 (Rec) | ROUGE R-2 (F1) | ROUGE R-SU4 (Rec) | ROUGE R-SU4 (F1) | Manual Readability | Manual Recall | Manual Precision | Manual Repetition |
|---|---|---|---|---|---|---|---|---|
| OWLMan-phaseB-TaskV1 | - | - | - | - | - | - | - | - |
| IISR-1 | 0.4042 | 0.3691 | 0.3837 | 0.3439 | 4.72 | 4.73 | 4.55 | 4.79 |
| IISR-3 | 0.4249 | 0.4037 | 0.4138 | 0.3930 | 4.57 | 4.33 | 4.32 | 4.85 |
| Fleming-4 | - | - | - | - | 1.21 | 0.92 | 1.05 | 1.28 |
| AsqAway_1 | - | - | - | - | - | - | - | - |
| AsqAway_2 | - | - | - | - | - | - | - | - |
| AsqAway_3 | - | - | - | - | - | - | - | - |
| AsqAway_4 | - | - | - | - | - | - | - | - |
| MQ-1 | 0.2592 | 0.1951 | 0.2680 | 0.1896 | 4.51 | 3.49 | 3.39 | 4.68 |
| MQ-2 | 0.2724 | 0.2470 | 0.2747 | 0.2398 | 4.63 | 3.61 | 3.55 | 4.83 |
| MQ-3 | 0.5345 | 0.3980 | 0.5182 | 0.3778 | 4.69 | 4.81 | 4.31 | 4.73 |
| Deep ML methods for | 0.0930 | 0.1385 | 0.0686 | 0.1036 | 3.41 | 3.01 | 3.55 | 4.55 |
| MQ-4 | 0.5451 | 0.3310 | 0.5316 | 0.3151 | 4.33 | 4.52 | 3.77 | 4.20 |
| MQ-5 | 0.5537 | 0.3350 | 0.5387 | 0.3180 | 4.33 | 4.52 | 3.76 | 4.13 |
| simple truncation | 0.1504 | 0.0670 | 0.1462 | 0.0631 | 1.07 | 1.24 | 0.88 | 1.07 |
| dmiip1 | 0.5168 | 0.3032 | 0.5115 | 0.2917 | 4.37 | 4.56 | 3.85 | 4.25 |
| dmiip2 | 0.5228 | 0.3137 | 0.5133 | 0.3009 | 4.35 | 4.52 | 3.76 | 4.27 |
| dmiip4 | 0.5359 | 0.3193 | 0.5263 | 0.3060 | 4.33 | 4.52 | 3.79 | 4.25 |
| dmiip5 | 0.5395 | 0.3309 | 0.5275 | 0.3162 | 4.36 | 4.49 | 3.84 | 4.28 |
| dmiip3 | 0.5228 | 0.3137 | 0.5133 | 0.3009 | 4.35 | 4.52 | 3.76 | 4.27 |
| IISR-2 | 0.4148 | 0.3653 | 0.3995 | 0.3450 | 4.76 | 4.80 | 4.65 | 4.80 |
| UR-IW-2 | 0.5630 | 0.2136 | 0.5521 | 0.1990 | 4.51 | 4.77 | 3.37 | 4.29 |
| UR-IW-3 | 0.5245 | 0.1762 | 0.5209 | 0.1663 | 4.51 | 4.79 | 3.28 | 4.25 |
| UR-IW-4 | 0.3531 | 0.0999 | 0.3796 | 0.1015 | 4.35 | 4.35 | 2.97 | 4.20 |
| extractive | 0.1606 | 0.0700 | 0.1570 | 0.0667 | 1.09 | 1.24 | 0.87 | 1.16 |
| abstractive | 0.1592 | 0.0702 | 0.1544 | 0.0665 | 1.12 | 1.20 | 0.87 | 1.16 |
| bioinfo-0 | 0.3147 | 0.2979 | 0.3036 | 0.2788 | 4.69 | 4.53 | 4.39 | 4.71 |
| BART | 0.1671 | 0.2001 | 0.1553 | 0.1881 | 4.32 | 2.52 | 2.93 | 4.76 |
| DMIS-KU-2 | - | - | - | - | - | - | - | - |
| DMIS-KU-3 | - | - | - | - | - | - | - | - |
| DMIS-KU-5 | - | - | - | - | - | - | - | - |
| DMIS-KU-1 | - | - | - | - | - | - | - | - |
| DMIS-KU-4 | - | - | - | - | - | - | - | - |
| BioASQ Baseline ZS | 0.1727 | 0.0977 | 0.1936 | 0.1004 | 3.19 | 2.64 | 2.41 | 3.88 |
| BioASQ Baseline FS | 0.3048 | 0.2493 | 0.3026 | 0.2443 | 3.85 | 3.99 | 3.75 | 4.55 |
| BioASQ_Baseline | - | - | - | - | - | - | - | - |
Test batch 2
Exact Answers
| System | Yes/No Accuracy | Yes/No F1 Yes | Yes/No F1 No | Yes/No Macro F1 | Factoid Strict Acc. | Factoid Lenient Acc. | Factoid MRR | List Mean Prec. | List Recall | List F-Measure |
|---|---|---|---|---|---|---|---|---|---|---|
| IISR-2 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.6364 | 0.7273 | 0.6818 | 0.5308 | 0.3545 | 0.4022 |
| IISR-3 | 0.5833 | 0.7368 | - | 0.3684 | 0.6364 | 0.7273 | 0.6818 | 0.5308 | 0.3545 | 0.4022 |
| IISR-1 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.5455 | 0.6818 | 0.6136 | 0.5139 | 0.3319 | 0.3757 |
| OWLMan-phaseB-TaskV1 | 0.4583 | 0.5806 | 0.2353 | 0.4080 | 0.4545 | 0.5455 | 0.4788 | 0.1833 | 0.1520 | 0.1522 |
| UR-IW-3 | 0.9167 | 0.9333 | 0.8889 | 0.9111 | 0.5909 | 0.6818 | 0.6364 | 0.4973 | 0.4865 | 0.4611 |
| UR-IW-2 | 0.9583 | 0.9655 | 0.9474 | 0.9564 | 0.6364 | 0.7273 | 0.6818 | 0.3973 | 0.4388 | 0.3967 |
| Fleming-4 | 0.7083 | 0.8000 | 0.4615 | 0.6308 | - | - | - | - | - | - |
| capstone-1 | 0.9167 | 0.9231 | 0.9091 | 0.9161 | 0.4091 | 0.6364 | 0.4939 | 0.2293 | 0.3680 | 0.2725 |
| capstone-3 | 0.8333 | 0.8571 | 0.8000 | 0.8286 | 0.2727 | 0.3636 | 0.2955 | 0.2728 | 0.3680 | 0.2999 |
| capstone-2 | 0.8333 | 0.8571 | 0.8000 | 0.8286 | 0.2727 | 0.3636 | 0.2955 | 0.3458 | 0.3528 | 0.3107 |
| bioinfo-0 | 0.5833 | 0.7368 | - | 0.3684 | - | - | - | - | - | - |
| bioinfo-1 | 0.5833 | 0.7368 | - | 0.3684 | - | - | - | - | - | - |
| bioinfo-2 | 0.5833 | 0.7368 | - | 0.3684 | - | - | - | - | - | - |
| bioinfo-3 | 0.5833 | 0.7368 | - | 0.3684 | - | - | - | - | - | - |
| bioinfo-4 | 0.5833 | 0.7368 | - | 0.3684 | - | - | - | - | - | - |
| AsqAway_1 | 0.8750 | 0.8966 | 0.8421 | 0.8693 | 0.4545 | 0.5000 | 0.4659 | 0.2018 | 0.2228 | 0.1999 |
| AsqAway_2 | 0.8750 | 0.8966 | 0.8421 | 0.8693 | 0.5000 | 0.5000 | 0.5000 | 0.2211 | 0.3171 | 0.2444 |
| AsqAway_3 | 0.8750 | 0.8966 | 0.8421 | 0.8693 | 0.5000 | 0.5455 | 0.5227 | 0.2096 | 0.3448 | 0.2444 |
| AsqAway_4 | 0.8750 | 0.8966 | 0.8421 | 0.8693 | 0.5000 | 0.5455 | 0.5227 | 0.2096 | 0.3448 | 0.2444 |
| Deep ML methods for | 0.8333 | 0.8667 | 0.7778 | 0.8222 | 0.3182 | 0.3182 | 0.3182 | 0.3472 | 0.1860 | 0.1986 |
| MindLab QA Reloaded | 0.7917 | 0.8276 | 0.7368 | 0.7822 | 0.3182 | 0.3182 | 0.3182 | 0.3264 | 0.1193 | 0.1616 |
| MindLab QA System | 0.8333 | 0.8667 | 0.7778 | 0.8222 | 0.3182 | 0.4545 | 0.3455 | 0.1500 | 0.2008 | 0.1568 |
| MindLab QA System ++ | 0.5417 | 0.6857 | 0.1538 | 0.4198 | 0.1818 | 0.2273 | 0.2045 | 0.4306 | 0.1332 | 0.1875 |
| MindLab Red Lions++ | 0.7917 | 0.8276 | 0.7368 | 0.7822 | 0.3182 | 0.3182 | 0.3182 | 0.3264 | 0.1193 | 0.1616 |
| MQ-1 | 0.5833 | 0.7368 | - | 0.3684 | - | - | - | - | - | - |
| MQ-2 | 0.5833 | 0.7368 | - | 0.3684 | - | - | - | - | - | - |
| MQ-3 | 0.5833 | 0.7368 | - | 0.3684 | - | - | - | - | - | - |
| MQ-4 | 0.5833 | 0.7368 | - | 0.3684 | - | - | - | - | - | - |
| MQ-5 | 0.5833 | 0.7368 | - | 0.3684 | - | - | - | - | - | - |
| DMIS-KU-1 | 0.9583 | 0.9630 | 0.9524 | 0.9577 | 0.4091 | 0.7273 | 0.5530 | 0.3468 | 0.3632 | 0.3136 |
| DMIS-KU-2 | 0.9583 | 0.9630 | 0.9524 | 0.9577 | 0.4091 | 0.7727 | 0.5523 | 0.3625 | 0.3466 | 0.3139 |
| DMIS-KU-3 | 0.9583 | 0.9630 | 0.9524 | 0.9577 | 0.4545 | 0.6818 | 0.5568 | 0.3258 | 0.4210 | 0.3485 |
| DMIS-KU-4 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.4545 | 0.6818 | 0.5455 | 0.3088 | 0.3692 | 0.2921 |
| DMIS-KU-5 | 0.9167 | 0.9286 | 0.9000 | 0.9143 | 0.4545 | 0.6818 | 0.5492 | 0.2707 | 0.4496 | 0.3121 |
| dmiip2 | 0.5833 | 0.7368 | - | 0.3684 | 0.4091 | 0.6818 | 0.5189 | 0.3774 | 0.3316 | 0.3188 |
| dmiip3 | 0.8750 | 0.8889 | 0.8571 | 0.8730 | 0.3636 | 0.5909 | 0.4447 | 0.3074 | 0.2566 | 0.2408 |
| dmiip5 | 0.9583 | 0.9655 | 0.9474 | 0.9564 | 0.4091 | 0.4545 | 0.4242 | 0.1579 | 0.2187 | 0.1783 |
| dmiip4 | 0.7917 | 0.8276 | 0.7368 | 0.7822 | 0.4545 | 0.6818 | 0.5568 | 0.2816 | 0.3335 | 0.2740 |
| dmiip1 | 0.9167 | 0.9286 | 0.9000 | 0.9143 | 0.3182 | 0.6818 | 0.4659 | 0.2439 | 0.3820 | 0.2643 |
| simple truncation | 0.5833 | 0.7368 | - | 0.3684 | - | - | - | - | - | - |
| extractive | 0.5833 | 0.7368 | - | 0.3684 | - | - | - | - | - | - |
| abstractive | 0.5833 | 0.7368 | - | 0.3684 | - | - | - | - | - | - |
| BioASQ_Baseline | 0.5000 | 0.3333 | 0.6000 | 0.4667 | 0.0909 | 0.1364 | 0.1136 | 0.1237 | 0.2909 | 0.1674 |
| BioASQ Baseline ZS | 0.5833 | 0.7368 | - | 0.3684 | - | - | - | - | - | - |
| BioASQ Baseline FS | 0.5833 | 0.7368 | - | 0.3684 | - | - | - | - | - | - |
Ideal Answers
| System | ROUGE R-2 (Rec) | ROUGE R-2 (F1) | ROUGE R-SU4 (Rec) | ROUGE R-SU4 (F1) | Manual Readability | Manual Recall | Manual Precision | Manual Repetition |
|---|---|---|---|---|---|---|---|---|
| IISR-2 | 0.3722 | 0.3500 | 0.3599 | 0.3335 | 4.87 | 4.63 | 4.80 | 4.95 |
| IISR-3 | 0.3176 | 0.3098 | 0.3192 | 0.3048 | 4.68 | 4.31 | 4.55 | 4.96 |
| IISR-1 | 0.3662 | 0.3421 | 0.3546 | 0.3269 | 4.81 | 4.59 | 4.69 | 4.91 |
| OWLMan-phaseB-TaskV1 | - | - | - | - | - | - | - | - |
| UR-IW-3 | 0.5181 | 0.2117 | 0.5251 | 0.2024 | 4.49 | 4.81 | 3.31 | 4.21 |
| UR-IW-2 | 0.5254 | 0.2316 | 0.5216 | 0.2180 | 4.51 | 4.89 | 3.47 | 4.23 |
| Fleming-4 | - | - | - | - | 1.08 | 1.03 | 1.33 | 1.55 |
| capstone-1 | 0.4674 | 0.3288 | 0.4680 | 0.3227 | 4.37 | 4.52 | 3.91 | 4.51 |
| capstone-3 | 0.4674 | 0.3288 | 0.4680 | 0.3227 | 4.37 | 4.52 | 3.91 | 4.51 |
| capstone-2 | 0.4674 | 0.3288 | 0.4680 | 0.3227 | 4.37 | 4.52 | 3.91 | 4.51 |
| bioinfo-0 | 0.2128 | 0.2034 | 0.2107 | 0.1959 | 4.23 | 4.01 | 3.97 | 4.65 |
| bioinfo-1 | 0.2648 | 0.2526 | 0.2578 | 0.2419 | 4.43 | 4.16 | 4.19 | 4.64 |
| bioinfo-2 | 0.2805 | 0.2942 | 0.2761 | 0.2875 | 4.59 | 4.24 | 4.61 | 4.91 |
| bioinfo-3 | 0.3267 | 0.3376 | 0.3138 | 0.3230 | 4.61 | 4.40 | 4.79 | 4.93 |
| bioinfo-4 | 0.3186 | 0.3268 | 0.3074 | 0.3137 | 4.57 | 4.36 | 4.75 | 4.92 |
| AsqAway_1 | - | - | - | - | - | - | - | - |
| AsqAway_2 | - | - | - | - | - | - | - | - |
| AsqAway_3 | - | - | - | - | - | - | - | - |
| AsqAway_4 | - | - | - | - | - | - | - | - |
| Deep ML methods for | 0.0491 | 0.0765 | 0.0419 | 0.0626 | 2.87 | 2.52 | 3.31 | 4.31 |
| MindLab QA Reloaded | 0.0524 | 0.0798 | 0.0449 | 0.0659 | 3.00 | 2.67 | 3.45 | 4.36 |
| MindLab QA System | 0.0229 | 0.0360 | 0.0278 | 0.0384 | 3.05 | 2.63 | 3.40 | 4.25 |
| MindLab QA System ++ | 0.0659 | 0.1014 | 0.0498 | 0.0793 | 2.93 | 2.68 | 3.39 | 4.36 |
| MindLab Red Lions++ | 0.0524 | 0.0798 | 0.0449 | 0.0659 | 3.00 | 2.67 | 3.45 | 4.36 |
| MQ-1 | 0.2268 | 0.1891 | 0.2386 | 0.1912 | 4.75 | 3.77 | 3.99 | 4.84 |
| MQ-2 | 0.2710 | 0.2430 | 0.2620 | 0.2328 | 4.93 | 4.05 | 4.29 | 4.92 |
| MQ-3 | 0.4956 | 0.3714 | 0.4832 | 0.3556 | 4.51 | 4.72 | 4.29 | 4.64 |
| MQ-4 | 0.4674 | 0.3288 | 0.4680 | 0.3227 | 4.37 | 4.52 | 3.89 | 4.52 |
| MQ-5 | 0.4674 | 0.3285 | 0.4684 | 0.3228 | 4.37 | 4.52 | 3.89 | 4.52 |
| DMIS-KU-1 | - | - | - | - | - | - | - | - |
| DMIS-KU-2 | - | - | - | - | - | - | - | - |
| DMIS-KU-3 | - | - | - | - | - | - | - | - |
| DMIS-KU-4 | - | - | - | - | - | - | - | - |
| DMIS-KU-5 | - | - | - | - | - | - | - | - |
| dmiip2 | 0.4642 | 0.3294 | 0.4675 | 0.3238 | 4.35 | 4.59 | 3.88 | 4.47 |
| dmiip3 | 0.4642 | 0.3294 | 0.4675 | 0.3238 | 4.35 | 4.59 | 3.88 | 4.47 |
| dmiip5 | 0.4155 | 0.3719 | 0.3966 | 0.3487 | 4.77 | 4.73 | 4.56 | 4.84 |
| dmiip4 | 0.4700 | 0.3271 | 0.4710 | 0.3207 | 4.35 | 4.51 | 3.91 | 4.49 |
| dmiip1 | 0.4098 | 0.2959 | 0.4219 | 0.2963 | 4.32 | 4.33 | 3.81 | 4.44 |
| simple truncation | 0.1358 | 0.0850 | 0.1342 | 0.0828 | 0.97 | 1.05 | 0.88 | 0.99 |
| extractive | 0.1277 | 0.0813 | 0.1265 | 0.0792 | 1.03 | 1.05 | 0.85 | 1.04 |
| abstractive | 0.1306 | 0.0810 | 0.1311 | 0.0790 | 0.97 | 1.00 | 0.79 | 0.99 |
| BioASQ_Baseline | - | - | - | - | - | - | - | - |
| BioASQ Baseline ZS | 0.1728 | 0.1106 | 0.1979 | 0.1193 | 2.87 | 2.44 | 2.13 | 3.36 |
| BioASQ Baseline FS | 0.2645 | 0.2425 | 0.2664 | 0.2395 | 3.33 | 3.53 | 3.56 | 4.25 |
Test batch 3
Exact Answers
| System | Yes/No Accuracy | Yes/No F1 Yes | Yes/No F1 No | Yes/No Macro F1 | Factoid Strict Acc. | Factoid Lenient Acc. | Factoid MRR | List Mean Prec. | List Recall | List F-Measure |
|---|---|---|---|---|---|---|---|---|---|---|
| UR-IW-5 | 0.7917 | 0.8485 | 0.6667 | 0.7576 | 0.4231 | 0.4231 | 0.4231 | 0.3421 | 0.2374 | 0.2565 |
| UR-IW-4 | 0.8750 | 0.9032 | 0.8235 | 0.8634 | 0.3077 | 0.5000 | 0.4038 | 0.3546 | 0.2800 | 0.2939 |
| UR-IW-3 | 0.8750 | 0.9091 | 0.8000 | 0.8545 | 0.5385 | 0.6154 | 0.5705 | 0.5600 | 0.4376 | 0.4693 |
| UR-IW-2 | 0.9167 | 0.9375 | 0.8750 | 0.9063 | 0.5000 | 0.5385 | 0.5192 | 0.5518 | 0.6010 | 0.5441 |
| bioinfo-0 | 0.6250 | 0.7692 | - | 0.3846 | - | - | - | - | - | - |
| bioinfo-2 | 0.6250 | 0.7692 | - | 0.3846 | - | - | - | - | - | - |
| bioinfo-3 | 0.6250 | 0.7692 | - | 0.3846 | - | - | - | - | - | - |
| MQ-1 | 0.6250 | 0.7692 | - | 0.3846 | - | - | - | - | - | - |
| MQ-2 | 0.6250 | 0.7692 | - | 0.3846 | - | - | - | - | - | - |
| MQ-3 | 0.6250 | 0.7692 | - | 0.3846 | - | - | - | - | - | - |
| MQ-4 | 0.6250 | 0.7692 | - | 0.3846 | - | - | - | - | - | - |
| MQ-5 | 0.6250 | 0.7692 | - | 0.3846 | - | - | - | - | - | - |
| Deep ML methods for | 0.7917 | 0.8485 | 0.6667 | 0.7576 | 0.3077 | 0.3077 | 0.3077 | 0.2778 | 0.0922 | 0.1274 |
| capstone-1 | 0.7083 | 0.7742 | 0.5882 | 0.6812 | 0.4231 | 0.6923 | 0.5417 | 0.5243 | 0.3608 | 0.3783 |
| capstone-2 | 0.7500 | 0.8000 | 0.6667 | 0.7333 | 0.4615 | 0.6923 | 0.5590 | 0.4606 | 0.4225 | 0.3984 |
| capstone-3 | 0.9167 | 0.9333 | 0.8889 | 0.9111 | 0.4231 | 0.4615 | 0.4423 | 0.2444 | 0.1848 | 0.1985 |
| capstone-4 | 0.9167 | 0.9333 | 0.8889 | 0.9111 | 0.3077 | 0.4615 | 0.3622 | 0.2222 | 0.2033 | 0.1943 |
| capstone-5 | 0.7083 | 0.7742 | 0.5882 | 0.6812 | - | - | - | 0.5000 | 0.1127 | 0.1724 |
| MindLab QA Reloaded | 0.5417 | 0.5600 | 0.5217 | 0.5409 | 0.1923 | 0.1923 | 0.1923 | 0.0833 | 0.0278 | 0.0417 |
| IISR-1 | 0.9167 | 0.9375 | 0.8750 | 0.9063 | 0.4231 | 0.4231 | 0.4231 | 0.5292 | 0.4141 | 0.4420 |
| IISR-2 | 0.9167 | 0.9333 | 0.8889 | 0.9111 | 0.4231 | 0.4615 | 0.4423 | 0.6796 | 0.5395 | 0.5608 |
| IISR-3 | 0.6250 | 0.7692 | - | 0.3846 | 0.4231 | 0.4615 | 0.4423 | 0.6796 | 0.5395 | 0.5608 |
| AsqAway_1 | 0.8750 | 0.9091 | 0.8000 | 0.8545 | 0.3077 | 0.4231 | 0.3474 | 0.4101 | 0.4395 | 0.4071 |
| MindLab QA System | 0.5833 | 0.5833 | 0.5833 | 0.5833 | 0.0769 | 0.1538 | 0.1090 | 0.3361 | 0.1833 | 0.2242 |
| AsqAway_2 | 0.8750 | 0.9091 | 0.8000 | 0.8545 | 0.4615 | 0.5000 | 0.4808 | 0.4565 | 0.5132 | 0.4560 |
| MindLab Red Lions++ | 0.5417 | 0.5600 | 0.5217 | 0.5409 | 0.1923 | 0.1923 | 0.1923 | 0.0833 | 0.0278 | 0.0417 |
| AsqAway_3 | 0.8750 | 0.9091 | 0.8000 | 0.8545 | 0.4615 | 0.5385 | 0.5000 | 0.3764 | 0.5595 | 0.4226 |
| AsqAway_4 | 0.8750 | 0.9091 | 0.8000 | 0.8545 | 0.4615 | 0.5385 | 0.5000 | 0.3764 | 0.5595 | 0.4226 |
| kmeans | 0.6250 | 0.7692 | - | 0.3846 | - | - | - | - | - | - |
| extractive | 0.6250 | 0.7692 | - | 0.3846 | - | - | - | - | - | - |
| similarity measures | 0.6250 | 0.7692 | - | 0.3846 | - | - | - | - | - | - |
| abstractive | 0.6250 | 0.7692 | - | 0.3846 | - | - | - | - | - | - |
| simple truncation | 0.6250 | 0.7692 | - | 0.3846 | - | - | - | - | - | - |
| dmiip1 | 0.8333 | 0.8824 | 0.7143 | 0.7983 | 0.5000 | 0.7308 | 0.5667 | 0.2932 | 0.4897 | 0.3278 |
| dmiip2 | 0.8333 | 0.8667 | 0.7778 | 0.8222 | 0.4231 | 0.5385 | 0.4647 | 0.3363 | 0.4453 | 0.3396 |
| dmiip3 | 0.7083 | 0.8000 | 0.4615 | 0.6308 | 0.4231 | 0.6923 | 0.5397 | 0.3951 | 0.5472 | 0.4126 |
| dmiip4 | 0.7500 | 0.8235 | 0.5714 | 0.6975 | 0.4615 | 0.7692 | 0.5814 | 0.3840 | 0.4441 | 0.3620 |
| dmiip5 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.4615 | 0.6923 | 0.5481 | 0.3167 | 0.4862 | 0.3307 |
| Fleming-4 | 0.7500 | 0.8333 | 0.5000 | 0.6667 | - | - | - | - | - | - |
| ELErank | 0.9583 | 0.9677 | 0.9412 | 0.9545 | 0.5000 | 0.5769 | 0.5288 | - | - | - |
| DMIS-KU-1 | 0.9583 | 0.9677 | 0.9412 | 0.9545 | 0.5000 | 0.6538 | 0.5538 | 0.5458 | 0.4689 | 0.4916 |
| DMIS-KU-2 | 0.8750 | 0.8966 | 0.8421 | 0.8693 | 0.4615 | 0.6538 | 0.5462 | 0.5258 | 0.4991 | 0.5025 |
| DMIS-KU-3 | 0.8750 | 0.9032 | 0.8235 | 0.8634 | 0.4615 | 0.6538 | 0.5365 | 0.5529 | 0.4760 | 0.5012 |
| DMIS-KU-4 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.4231 | 0.5385 | 0.4596 | 0.5680 | 0.4689 | 0.4975 |
| DMIS-KU-5 | 0.9167 | 0.9333 | 0.8889 | 0.9111 | 0.3846 | 0.6538 | 0.5032 | 0.5322 | 0.5257 | 0.5156 |
| DICE1 | 0.6667 | 0.6923 | 0.6364 | 0.6643 | 0.3846 | 0.3846 | 0.3846 | 0.4148 | 0.3221 | 0.3307 |
| DICE2 | 0.6250 | 0.6897 | 0.5263 | 0.6080 | 0.3077 | 0.3077 | 0.3077 | 0.3339 | 0.3072 | 0.2955 |
| BioASQ_Baseline | 0.4167 | 0.3000 | 0.5000 | 0.4000 | 0.0385 | 0.3077 | 0.1487 | 0.1726 | 0.2450 | 0.1599 |
| BioASQ Baseline ZS | 0.6250 | 0.7692 | - | 0.3846 | - | - | - | - | - | - |
| BioASQ Baseline FS | 0.6250 | 0.7692 | - | 0.3846 | - | - | - | - | - | - |
Ideal Answers
| System | ROUGE R-2 (Rec) | ROUGE R-2 (F1) | ROUGE R-SU4 (Rec) | ROUGE R-SU4 (F1) | Manual Readability | Manual Recall | Manual Precision | Manual Repetition |
|---|---|---|---|---|---|---|---|---|
| UR-IW-5 | 0.3083 | 0.1074 | 0.3363 | 0.1116 | 4.64 | 4.22 | 2.88 | 4.47 |
| UR-IW-4 | 0.3209 | 0.1123 | 0.3541 | 0.1176 | 4.61 | 4.53 | 2.97 | 4.41 |
| UR-IW-3 | 0.4835 | 0.1869 | 0.4842 | 0.1781 | 4.62 | 4.97 | 3.22 | 4.36 |
| UR-IW-2 | 0.5068 | 0.2151 | 0.4921 | 0.2004 | 4.72 | 4.97 | 3.30 | 4.50 |
| bioinfo-0 | 0.1803 | 0.1401 | 0.1911 | 0.1445 | 4.59 | 4.03 | 3.86 | 4.72 |
| bioinfo-2 | 0.1755 | 0.1328 | 0.1880 | 0.1346 | 4.53 | 4.08 | 4.02 | 4.68 |
| bioinfo-3 | 0.1517 | 0.1058 | 0.1684 | 0.1121 | 4.42 | 3.78 | 3.52 | 4.47 |
| MQ-1 | 0.2286 | 0.2088 | 0.2284 | 0.2056 | 4.73 | 3.67 | 3.86 | 4.88 |
| MQ-2 | 0.2444 | 0.2321 | 0.2430 | 0.2264 | 4.84 | 3.82 | 4.02 | 4.92 |
| MQ-3 | 0.4362 | 0.3636 | 0.4327 | 0.3507 | 4.89 | 4.86 | 4.53 | 4.88 |
| MQ-4 | 0.4386 | 0.2807 | 0.4382 | 0.2703 | 4.32 | 4.58 | 3.97 | 4.36 |
| MQ-5 | 0.4336 | 0.2778 | 0.4333 | 0.2675 | 4.34 | 4.57 | 3.98 | 4.31 |
| Deep ML methods for | 0.0601 | 0.0957 | 0.0423 | 0.0693 | 2.86 | 2.82 | 3.12 | 4.07 |
| capstone-1 | 0.4386 | 0.2807 | 0.4382 | 0.2703 | 4.31 | 4.58 | 3.98 | 4.33 |
| capstone-2 | 0.4386 | 0.2807 | 0.4382 | 0.2703 | 4.31 | 4.58 | 3.98 | 4.33 |
| capstone-3 | 0.4386 | 0.2807 | 0.4382 | 0.2703 | 4.31 | 4.58 | 3.98 | 4.33 |
| capstone-4 | 0.4386 | 0.2807 | 0.4382 | 0.2703 | 4.31 | 4.58 | 3.98 | 4.33 |
| capstone-5 | 0.4386 | 0.2807 | 0.4382 | 0.2703 | 4.31 | 4.58 | 3.98 | 4.33 |
| MindLab QA Reloaded | 0.0495 | 0.0767 | 0.0371 | 0.0598 | 2.90 | 2.58 | 3.07 | 4.16 |
| IISR-1 | 0.3391 | 0.3417 | 0.3302 | 0.3281 | 4.83 | 4.59 | 4.79 | 4.97 |
| IISR-2 | 0.3444 | 0.3313 | 0.3328 | 0.3175 | 4.91 | 4.69 | 4.77 | 4.98 |
| IISR-3 | 0.3168 | 0.2922 | 0.3102 | 0.2796 | 4.57 | 4.28 | 4.46 | 4.91 |
| AsqAway_1 | - | - | - | - | - | - | - | - |
| MindLab QA System | 0.0378 | 0.0571 | 0.0303 | 0.0480 | 3.00 | 2.51 | 3.00 | 4.16 |
| AsqAway_2 | - | - | - | - | - | - | - | - |
| MindLab Red Lions++ | 0.0495 | 0.0767 | 0.0371 | 0.0598 | 2.90 | 2.58 | 3.07 | 4.16 |
| AsqAway_3 | - | - | - | - | - | - | - | - |
| AsqAway_4 | - | - | - | - | - | - | - | - |
| kmeans | 0.1113 | 0.1034 | 0.1096 | 0.1002 | 1.16 | 1.07 | 1.07 | 1.21 |
| extractive | 0.1220 | 0.1156 | 0.1188 | 0.1121 | 1.13 | 1.17 | 1.16 | 1.21 |
| similarity measures | 0.1072 | 0.1024 | 0.1048 | 0.0988 | 1.16 | 1.08 | 1.09 | 1.21 |
| abstractive | 0.1179 | 0.1133 | 0.1170 | 0.1106 | 1.16 | 1.19 | 1.16 | 1.20 |
| simple truncation | 0.1235 | 0.1226 | 0.1193 | 0.1175 | 1.16 | 1.18 | 1.17 | 1.21 |
| dmiip1 | 0.3260 | 0.1364 | 0.3423 | 0.1389 | 4.21 | 4.06 | 3.14 | 3.83 |
| dmiip2 | 0.4281 | 0.2145 | 0.4313 | 0.2080 | 4.41 | 4.54 | 3.73 | 4.02 |
| dmiip3 | 0.4364 | 0.2171 | 0.4392 | 0.2103 | 4.41 | 4.56 | 3.71 | 4.01 |
| dmiip4 | 0.4286 | 0.2137 | 0.4307 | 0.2049 | 4.41 | 4.40 | 3.66 | 3.86 |
| dmiip5 | 0.3900 | 0.3525 | 0.3886 | 0.3469 | 4.82 | 4.66 | 4.67 | 4.86 |
| Fleming-4 | - | - | - | - | 0.86 | 0.78 | 0.93 | 1.07 |
| ELErank | 0.2361 | 0.2491 | 0.2301 | 0.2370 | 4.62 | 3.81 | 4.22 | 4.77 |
| DMIS-KU-1 | - | - | - | - | - | - | - | - |
| DMIS-KU-2 | - | - | - | - | - | - | - | - |
| DMIS-KU-3 | - | - | - | - | - | - | - | - |
| DMIS-KU-4 | - | - | - | - | - | - | - | - |
| DMIS-KU-5 | - | - | - | - | - | - | - | - |
| DICE1 | - | - | - | - | - | - | - | - |
| DICE2 | - | - | - | - | - | - | - | - |
| BioASQ_Baseline | - | - | - | - | - | - | - | - |
| BioASQ Baseline ZS | 0.1669 | 0.1206 | 0.1952 | 0.1362 | 3.03 | 2.48 | 2.17 | 3.44 |
| BioASQ Baseline FS | 0.2192 | 0.1923 | 0.2241 | 0.1912 | 3.49 | 3.82 | 3.76 | 4.46 |
Test batch 4
Exact Answers
| System | Yes/No Accuracy | Yes/No F1 Yes | Yes/No F1 No | Yes/No Macro F1 | Factoid Strict Acc. | Factoid Lenient Acc. | Factoid MRR | List Mean Prec. | List Recall | List F-Measure |
|---|---|---|---|---|---|---|---|---|---|---|
| UR-IW-5 | 0.4286 | 0.5000 | 0.3333 | 0.4167 | 0.3226 | 0.3871 | 0.3495 | 0.4421 | 0.3495 | 0.3734 |
| UR-IW-4 | 0.7857 | 0.7273 | 0.8235 | 0.7754 | 0.2258 | 0.2903 | 0.2473 | 0.4881 | 0.4489 | 0.4492 |
| UR-IW-3 | 0.9286 | 0.8571 | 0.9524 | 0.9048 | 0.6452 | 0.6774 | 0.6613 | 0.6167 | 0.6619 | 0.6211 |
| UR-IW-2 | 0.9286 | 0.8889 | 0.9474 | 0.9181 | 0.5484 | 0.6452 | 0.5968 | 0.6939 | 0.7516 | 0.7069 |
| IISR-2 | 0.9286 | 0.8889 | 0.9474 | 0.9181 | 0.4516 | 0.4839 | 0.4677 | 0.6456 | 0.6517 | 0.6316 |
| IISR-1 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.3871 | 0.4194 | 0.4032 | 0.7213 | 0.6305 | 0.6618 |
| bioinfo-0 | 0.2857 | 0.4444 | - | 0.2222 | - | - | - | - | - | - |
| bioinfo-1 | 0.2857 | 0.4444 | - | 0.2222 | - | - | - | - | - | - |
| bioinfo-2 | 0.2857 | 0.4444 | - | 0.2222 | - | - | - | - | - | - |
| bioinfo-3 | 0.2857 | 0.4444 | - | 0.2222 | - | - | - | - | - | - |
| bioinfo-4 | 0.2857 | 0.4444 | - | 0.2222 | - | - | - | - | - | - |
| dmiip1 | 0.2857 | 0.4444 | - | 0.2222 | 0.4839 | 0.8065 | 0.6022 | 0.4871 | 0.4253 | 0.4368 |
| dmiip2 | 0.7857 | 0.6667 | 0.8421 | 0.7544 | 0.4839 | 0.7742 | 0.6032 | 0.5610 | 0.4919 | 0.4927 |
| dmiip3 | 0.7857 | 0.6667 | 0.8421 | 0.7544 | 0.5806 | 0.8387 | 0.6855 | 0.5597 | 0.4490 | 0.4717 |
| dmiip4 | 0.7143 | 0.6667 | 0.7500 | 0.7083 | 0.4194 | 0.6774 | 0.5226 | 0.5027 | 0.4384 | 0.4336 |
| dmiip5 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.5161 | 0.7419 | 0.6129 | 0.4753 | 0.4281 | 0.4293 |
| MQ-5 | 0.2857 | 0.4444 | - | 0.2222 | - | - | - | - | - | - |
| MQ-3 | 0.2857 | 0.4444 | - | 0.2222 | - | - | - | - | - | - |
| MQ-1 | 0.2857 | 0.4444 | - | 0.2222 | - | - | - | - | - | - |
| ELErank | 0.8571 | 0.7500 | 0.9000 | 0.8250 | 0.5161 | 0.7419 | 0.6086 | - | - | - |
| ELErank+ | 0.8571 | 0.7500 | 0.9000 | 0.8250 | 0.5161 | 0.7419 | 0.6086 | - | - | - |
| capstone-1 | 0.7857 | 0.6667 | 0.8421 | 0.7544 | 0.4839 | 0.6774 | 0.5522 | 0.4840 | 0.2924 | 0.3396 |
| MQ-2 | 0.2857 | 0.4444 | - | 0.2222 | - | - | - | - | - | - |
| MQ-4 | 0.2857 | 0.4444 | - | 0.2222 | - | - | - | - | - | - |
| capstone-2 | 0.7857 | 0.6667 | 0.8421 | 0.7544 | 0.4839 | 0.6774 | 0.5522 | 0.4954 | 0.4265 | 0.4130 |
| capstone-3 | 0.7857 | 0.6667 | 0.8421 | 0.7544 | 0.4839 | 0.6774 | 0.5522 | 0.4840 | 0.2924 | 0.3396 |
| capstone-4 | 0.7857 | 0.6667 | 0.8421 | 0.7544 | 0.2581 | 0.5161 | 0.3613 | 0.4954 | 0.4265 | 0.4130 |
| capstone-5 | 0.7857 | 0.6667 | 0.8421 | 0.7544 | 0.4839 | 0.6774 | 0.5522 | 0.4415 | 0.4335 | 0.3989 |
| AUEB-System1 | 0.2857 | 0.4444 | - | 0.2222 | 0.3226 | 0.4516 | 0.3763 | 0.0257 | 0.0212 | 0.0206 |
| Deep ML methods for | 0.3571 | 0.4000 | 0.3077 | 0.3538 | 0.2581 | 0.2581 | 0.2581 | 0.2847 | 0.1129 | 0.1442 |
| MindLab QA System | 0.3571 | 0.4000 | 0.3077 | 0.3538 | 0.2581 | 0.2581 | 0.2581 | 0.2847 | 0.1129 | 0.1442 |
| DICE1 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.5484 | 0.5484 | 0.5484 | 0.5697 | 0.4855 | 0.5068 |
| MindLab QA System ++ | 0.3571 | 0.4000 | 0.3077 | 0.3538 | 0.2581 | 0.2581 | 0.2581 | 0.2847 | 0.1129 | 0.1442 |
| Fleming-4 | 0.5714 | 0.4000 | 0.6667 | 0.5333 | - | - | - | - | - | - |
| OWLMan-phaseB-TaskV1 | 0.4286 | 0.5000 | 0.3333 | 0.4167 | 0.4194 | 0.4839 | 0.4435 | 0.2500 | 0.3044 | 0.2533 |
| DMIS-KU-1 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.6452 | 0.8710 | 0.7323 | 0.7221 | 0.7874 | 0.7399 |
| DMIS-KU-2 | 0.9286 | 0.8889 | 0.9474 | 0.9181 | 0.6452 | 0.8387 | 0.7108 | 0.7396 | 0.6395 | 0.6693 |
| DMIS-KU-3 | 0.9286 | 0.8571 | 0.9524 | 0.9048 | 0.5806 | 0.8387 | 0.6882 | 0.7460 | 0.6256 | 0.6639 |
| DMIS-KU-4 | 0.8571 | 0.7500 | 0.9000 | 0.8250 | 0.5484 | 0.8387 | 0.6570 | 0.7780 | 0.6044 | 0.6578 |
| DMIS-KU-5 | 0.9286 | 0.8889 | 0.9474 | 0.9181 | 0.5484 | 0.8065 | 0.6473 | 0.7294 | 0.6395 | 0.6607 |
| IISR-3 | 0.2857 | 0.4444 | - | 0.2222 | 0.4194 | 0.4194 | 0.4194 | 0.2500 | 0.1644 | 0.1827 |
| AUEB-System2 | 0.2857 | 0.4444 | - | 0.2222 | 0.2903 | 0.4194 | 0.3387 | 0.0396 | 0.0424 | 0.0395 |
| IISR-4 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.4516 | 0.4516 | 0.4516 | 0.6944 | 0.5635 | 0.5895 |
| DICE2 | 0.9286 | 0.8889 | 0.9474 | 0.9181 | 0.2581 | 0.2581 | 0.2581 | - | - | - |
| abstractive | 0.2857 | 0.4444 | - | 0.2222 | - | - | - | - | - | - |
| DICE_Lab | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.4839 | 0.4839 | 0.4839 | 0.5163 | 0.4870 | 0.4839 |
| extractive | 0.2857 | 0.4444 | - | 0.2222 | - | - | - | - | - | - |
| kmeans | 0.2857 | 0.4444 | - | 0.2222 | - | - | - | - | - | - |
| similarity measures | 0.2857 | 0.4444 | - | 0.2222 | - | - | - | - | - | - |
| simple truncation | 0.2857 | 0.4444 | - | 0.2222 | - | - | - | - | - | - |
| DICE_Lab2 | 0.9286 | 0.8889 | 0.9474 | 0.9181 | 0.5484 | 0.5484 | 0.5484 | 0.3917 | 0.3613 | 0.3648 |
| IISR-5 | 0.9286 | 0.8889 | 0.9474 | 0.9181 | 0.4839 | 0.5161 | 0.4946 | 0.7412 | 0.6764 | 0.6821 |
| BioASQ_Baseline | 0.6429 | - | 0.7826 | 0.3913 | 0.1290 | 0.2581 | 0.1720 | 0.1970 | 0.3275 | 0.2256 |
| BioASQ Baseline ZS | 0.2857 | 0.4444 | - | 0.2222 | - | - | - | - | - | - |
| BioASQ Baseline FS | 0.2857 | 0.4444 | - | 0.2222 | - | - | - | - | - | - |
Ideal Answers
| System | ROUGE R-2 (Rec) | ROUGE R-2 (F1) | ROUGE R-SU4 (Rec) | ROUGE R-SU4 (F1) | Manual Readability | Manual Recall | Manual Precision | Manual Repetition |
|---|---|---|---|---|---|---|---|---|
| UR-IW-5 | 0.3240 | 0.1442 | 0.3458 | 0.1519 | 4.71 | 4.38 | 2.96 | 4.46 |
| UR-IW-4 | 0.3440 | 0.1506 | 0.3700 | 0.1571 | 4.68 | 4.54 | 3.08 | 4.41 |
| UR-IW-3 | 0.4811 | 0.2349 | 0.4812 | 0.2266 | 4.66 | 4.91 | 3.22 | 4.37 |
| UR-IW-2 | 0.5006 | 0.2658 | 0.4866 | 0.2548 | 4.72 | 4.91 | 3.36 | 4.53 |
| IISR-2 | 0.2840 | 0.2882 | 0.2792 | 0.2774 | 4.91 | 4.56 | 4.90 | 4.99 |
| IISR-1 | 0.2929 | 0.3096 | 0.2873 | 0.2993 | 4.91 | 4.52 | 4.90 | 4.99 |
| bioinfo-0 | 0.1760 | 0.1667 | 0.1828 | 0.1637 | 4.67 | 3.93 | 4.37 | 4.91 |
| bioinfo-1 | 0.2310 | 0.2247 | 0.2244 | 0.2138 | 4.73 | 4.31 | 4.73 | 4.98 |
| bioinfo-2 | 0.1149 | 0.1060 | 0.1267 | 0.1088 | 3.88 | 3.58 | 3.59 | 4.39 |
| bioinfo-3 | 0.1864 | 0.1654 | 0.1901 | 0.1628 | 4.83 | 4.02 | 4.30 | 4.89 |
| bioinfo-4 | 0.1057 | 0.0963 | 0.1177 | 0.1009 | 4.02 | 3.30 | 3.31 | 4.38 |
| dmiip1 | 0.4294 | 0.3325 | 0.4223 | 0.3207 | 4.30 | 4.42 | 4.13 | 4.30 |
| dmiip2 | 0.4294 | 0.3325 | 0.4223 | 0.3207 | 4.30 | 4.42 | 4.13 | 4.30 |
| dmiip3 | 0.4284 | 0.3310 | 0.4212 | 0.3192 | 4.30 | 4.43 | 4.13 | 4.30 |
| dmiip4 | 0.4387 | 0.3299 | 0.4295 | 0.3171 | 4.36 | 4.46 | 4.14 | 4.22 |
| dmiip5 | 0.2987 | 0.3075 | 0.2938 | 0.2971 | 4.29 | 4.36 | 4.79 | 4.88 |
| MQ-5 | 0.4375 | 0.3331 | 0.4281 | 0.3207 | 4.36 | 4.47 | 4.12 | 4.26 |
| MQ-3 | 0.4406 | 0.3714 | 0.4229 | 0.3523 | 4.76 | 4.72 | 4.58 | 4.80 |
| MQ-1 | 0.3749 | 0.3670 | 0.3559 | 0.3441 | 4.84 | 4.50 | 4.77 | 4.92 |
| ELErank | 0.2347 | 0.2557 | 0.2211 | 0.2400 | 4.64 | 3.53 | 4.20 | 4.63 |
| ELErank+ | 0.2347 | 0.2557 | 0.2211 | 0.2400 | 4.64 | 3.53 | 4.20 | 4.63 |
| capstone-1 | 0.4373 | 0.3339 | 0.4285 | 0.3216 | 4.33 | 4.48 | 4.12 | 4.27 |
| MQ-2 | 0.3370 | 0.3463 | 0.3251 | 0.3292 | 4.50 | 4.16 | 4.48 | 4.69 |
| MQ-4 | 0.4373 | 0.3339 | 0.4285 | 0.3216 | 4.33 | 4.48 | 4.12 | 4.27 |
| capstone-2 | 0.4373 | 0.3339 | 0.4285 | 0.3216 | 4.33 | 4.48 | 4.12 | 4.27 |
| capstone-3 | 0.5012 | 0.3275 | 0.5015 | 0.3170 | 3.86 | 4.46 | 3.86 | 3.53 |
| capstone-4 | 0.5012 | 0.3275 | 0.5015 | 0.3170 | 3.86 | 4.46 | 3.86 | 3.53 |
| capstone-5 | 0.4373 | 0.3339 | 0.4285 | 0.3216 | 4.33 | 4.48 | 4.12 | 4.27 |
| AUEB-System1 | - | - | - | - | - | - | - | - |
| Deep ML methods for | 0.0351 | 0.0546 | 0.0302 | 0.0495 | 2.63 | 2.51 | 3.10 | 4.01 |
| MindLab QA System | 0.0582 | 0.0967 | 0.0401 | 0.0697 | 2.61 | 2.78 | 3.50 | 3.89 |
| DICE1 | - | - | - | - | - | - | - | - |
| MindLab QA System ++ | 0.0412 | 0.0662 | 0.0328 | 0.0549 | 2.63 | 2.50 | 3.16 | 4.00 |
| Fleming-4 | - | - | - | - | 0.40 | 0.36 | 0.56 | 0.64 |
| OWLMan-phaseB-TaskV1 | - | - | - | - | 1.00 | 1.00 | 1.00 | 1.00 |
| DMIS-KU-1 | - | - | - | - | - | - | - | - |
| DMIS-KU-2 | - | - | - | - | - | - | - | - |
| DMIS-KU-3 | - | - | - | - | - | - | - | - |
| DMIS-KU-4 | - | - | - | - | - | - | - | - |
| DMIS-KU-5 | - | - | - | - | - | - | - | - |
| IISR-3 | 0.2721 | 0.2961 | 0.2591 | 0.2815 | 4.62 | 4.01 | 4.51 | 4.91 |
| AUEB-System2 | - | - | - | - | - | - | - | - |
| IISR-4 | 0.2889 | 0.3031 | 0.2792 | 0.2893 | 4.89 | 4.47 | 4.90 | 4.98 |
| DICE2 | - | - | - | - | - | - | - | - |
| abstractive | 0.0940 | 0.1032 | 0.0920 | 0.1007 | 1.07 | 1.02 | 1.13 | 1.16 |
| DICE_Lab | - | - | - | - | - | - | - | - |
| extractive | 0.0946 | 0.1025 | 0.0922 | 0.0997 | 1.08 | 1.03 | 1.13 | 1.14 |
| kmeans | 0.1088 | 0.1106 | 0.1066 | 0.1081 | 1.11 | 1.02 | 1.14 | 1.17 |
| similarity measures | 0.0834 | 0.0944 | 0.0809 | 0.0916 | 1.12 | 1.00 | 1.13 | 1.17 |
| simple truncation | 0.0952 | 0.1051 | 0.0935 | 0.1029 | 1.11 | 0.97 | 1.12 | 1.17 |
| DICE_Lab2 | - | - | - | - | - | - | - | - |
| IISR-5 | 0.2827 | 0.2917 | 0.2747 | 0.2785 | 4.91 | 4.57 | 4.89 | 4.99 |
| BioASQ_Baseline | - | - | - | - | - | - | - | - |
| BioASQ Baseline ZS | 0.1552 | 0.1045 | 0.1745 | 0.1104 | 2.44 | 2.09 | 1.93 | 3.19 |
| BioASQ Baseline FS | 0.2277 | 0.2236 | 0.2226 | 0.2150 | 3.29 | 3.31 | 3.76 | 4.23 |
Test batch 5
Exact Answers
| System | Yes/No: Accuracy | Yes/No: F1 Yes | Yes/No: F1 No | Yes/No: Macro F1 | Factoid: Strict Acc. | Factoid: Lenient Acc. | Factoid: MRR | List: Mean Prec. | List: Recall | List: F-Measure |
|---|---|---|---|---|---|---|---|---|---|---|
| mibi_rag_abstract | 0.8400 | 0.8667 | 0.8000 | 0.8333 | 0.0476 | 0.0476 | 0.0476 | 0.5048 | 0.3804 | 0.4147 |
| mibi_rag_snippet | 0.9200 | 0.9333 | 0.9000 | 0.9167 | - | - | - | 0.5286 | 0.4107 | 0.4441 |
| UR-IW-1 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.2857 | 0.3810 | 0.3254 | 0.4840 | 0.4173 | 0.4266 |
| UR-IW-4 | 0.9600 | 0.9655 | 0.9524 | 0.9589 | 0.2381 | 0.2381 | 0.2381 | 0.4563 | 0.3478 | 0.3778 |
| UR-IW-2 | 0.8800 | 0.8889 | 0.8696 | 0.8792 | 0.2381 | 0.2857 | 0.2619 | 0.5255 | 0.4510 | 0.4764 |
| UR-IW-5 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.2381 | 0.2857 | 0.2540 | 0.6054 | 0.5158 | 0.5404 |
| UR-IW-3 | 0.9600 | 0.9655 | 0.9524 | 0.9589 | 0.2381 | 0.2381 | 0.2381 | 0.6010 | 0.5158 | 0.5337 |
| Gatech competition | 0.9200 | 0.9286 | 0.9091 | 0.9188 | 0.2381 | 0.2381 | 0.2381 | 0.4939 | 0.3433 | 0.3587 |
| Mistral-7B finetune | 0.9600 | 0.9655 | 0.9524 | 0.9589 | 0.2381 | 0.2381 | 0.2381 | 0.5786 | 0.5006 | 0.5265 |
| Synthia with first | 0.9200 | 0.9286 | 0.9091 | 0.9188 | 0.2857 | 0.2857 | 0.2857 | 0.4627 | 0.4107 | 0.4020 |
| LLM4SciLit | 0.4000 | - | 0.5714 | 0.2857 | - | - | - | - | - | - |
| RMC_append_snippets | 0.9600 | 0.9655 | 0.9524 | 0.9589 | 0.1905 | 0.1905 | 0.1905 | 0.4770 | 0.4400 | 0.4365 |
| bioinfo-0 | 0.6000 | 0.7500 | - | 0.3750 | - | - | - | - | - | - |
| bioinfo-1 | 0.6000 | 0.7500 | - | 0.3750 | - | - | - | - | - | - |
| bioinfo-2 | 0.6000 | 0.7500 | - | 0.3750 | - | - | - | - | - | - |
| bioinfo-3 | 0.0400 | 0.0769 | - | 0.0385 | 0.1905 | 0.1905 | 0.1905 | 0.3214 | 0.2797 | 0.2950 |
| bioinfo-4 | - | - | - | - | 0.2857 | 0.2857 | 0.2857 | 0.4169 | 0.3464 | 0.3698 |
| Fleming-1 | 0.8400 | 0.8750 | 0.7778 | 0.8264 | 0.0476 | 0.0952 | 0.0714 | 0.5196 | 0.4190 | 0.4420 |
| dmiip2024_2 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.2381 | 0.2381 | 0.2381 | 0.3943 | 0.5380 | 0.4249 |
| dmiip2024_3 | 0.9600 | 0.9677 | 0.9474 | 0.9576 | 0.3333 | 0.3810 | 0.3571 | 0.5942 | 0.4725 | 0.5068 |
| dmiip2024_4 | 0.4000 | - | 0.5714 | 0.2857 | 0.3333 | 0.5238 | 0.4206 | 0.4481 | 0.4682 | 0.4386 |
| dmiip2024_1 | 0.9200 | 0.9286 | 0.9091 | 0.9188 | 0.3333 | 0.4286 | 0.3810 | 0.6647 | 0.5011 | 0.5453 |
| dmiip2024 | 0.9200 | 0.9286 | 0.9091 | 0.9188 | 0.3333 | 0.4286 | 0.3810 | 0.6603 | 0.4967 | 0.5407 |
| IISR 5th submit | 0.9600 | 0.9655 | 0.9524 | 0.9589 | 0.2857 | 0.2857 | 0.2857 | 0.5466 | 0.4701 | 0.4915 |
| RAG for medicine | 0.7200 | 0.7200 | 0.7200 | 0.7200 | 0.1905 | 0.3333 | 0.2397 | 0.5524 | 0.5209 | 0.5054 |
| IBE-LM ver1 | 0.6000 | 0.7500 | - | 0.3750 | 0.3333 | 0.4762 | 0.3825 | - | - | - |
| IBE-LM ver3 | 0.6000 | 0.7500 | - | 0.3750 | 0.4286 | 0.4762 | 0.4444 | - | - | - |
| IBE-LM ver 5 | 0.6000 | 0.7500 | - | 0.3750 | 0.4286 | 0.4762 | 0.4444 | - | - | - |
| IBE-LM ver2 | 0.6000 | 0.7500 | - | 0.3750 | 0.3333 | 0.4762 | 0.3849 | 0.1143 | 0.1706 | 0.1280 |
| IBE-LM ver4 | 0.6000 | 0.7500 | - | 0.3750 | 0.3333 | 0.4762 | 0.3849 | 0.1143 | 0.1706 | 0.1280 |
| IISR 2nd submit | 0.9600 | 0.9655 | 0.9524 | 0.9589 | 0.2857 | 0.2857 | 0.2857 | 0.5813 | 0.4591 | 0.4931 |
| IISR 3rd submit | 0.9200 | 0.9333 | 0.9000 | 0.9167 | 0.2381 | 0.2381 | 0.2381 | 0.5461 | 0.4721 | 0.4960 |
| IISR 4th submit | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.1905 | 0.1905 | 0.1905 | 0.5449 | 0.4682 | 0.4934 |
| IISR first submit | 0.9200 | 0.9286 | 0.9091 | 0.9188 | 0.2381 | 0.2857 | 0.2540 | 0.6317 | 0.4685 | 0.5161 |
| CPS | 0.7200 | 0.7742 | 0.6316 | 0.7029 | 0.2857 | 0.2857 | 0.2857 | 0.3532 | 0.2286 | 0.2579 |
| lasige-ku | 0.6800 | 0.7895 | 0.3333 | 0.5614 | - | - | - | - | - | - |
| extractive | 0.8400 | 0.8824 | 0.7500 | 0.8162 | 0.1429 | 0.1905 | 0.1667 | 0.1996 | 0.2011 | 0.1908 |
| AUEB-System1 | 0.8000 | 0.8571 | 0.6667 | 0.7619 | 0.3333 | 0.4286 | 0.3690 | 0.4286 | 0.2988 | 0.3211 |
| BioASQ_Baseline | 0.4400 | 0.3000 | 0.5333 | 0.4167 | 0.0476 | 0.1905 | 0.0968 | 0.2366 | 0.2599 | 0.2100 |
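In the Exact Answers table above, the Macro F1 column appears to be the unweighted mean of F1 Yes and F1 No, with a missing score ("-") counted as zero. This is inferred from the reported rows rather than stated on this page, so the sketch below is only a consistency check under that assumption. The helper names `parse_cell` and `macro_f1` are hypothetical and not part of any BioASQ tooling.

```python
from typing import Optional


def parse_cell(cell: str) -> Optional[float]:
    """Convert a table cell to a float; '-' (no score reported) becomes None."""
    cell = cell.strip()
    return None if cell == "-" else float(cell)


def macro_f1(f1_yes: Optional[float], f1_no: Optional[float]) -> float:
    """Unweighted mean of the two class-wise F1 scores.

    Assumption: a missing class score is counted as 0, which matches
    rows such as bioinfo-0 and LLM4SciLit in the table.
    """
    return ((f1_yes or 0.0) + (f1_no or 0.0)) / 2


# (system, F1 Yes, F1 No, reported Macro F1), copied from the table above
rows = [
    ("mibi_rag_abstract", "0.8667", "0.8000", "0.8333"),
    ("CPS", "0.7742", "0.6316", "0.7029"),
    ("bioinfo-0", "0.7500", "-", "0.3750"),
    ("LLM4SciLit", "-", "0.5714", "0.2857"),
]

for system, f1_yes, f1_no, reported in rows:
    computed = macro_f1(parse_cell(f1_yes), parse_cell(f1_no))
    # The published inputs are rounded to four decimals, so allow a small tolerance.
    assert abs(computed - float(reported)) < 1e-4, system
    print(f"{system}: computed {computed:.4f}, reported {reported}")
```

A practical consequence of this reading is that a system predicting only one class is capped at a Macro F1 of 0.5, which is why several rows with Accuracy 0.6000 show a Macro F1 of 0.3750.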
Ideal Answers
| System | Automatic (ROUGE): R-2 (Rec) | Automatic (ROUGE): R-2 (F1) | Automatic (ROUGE): R-SU4 (Rec) | Automatic (ROUGE): R-SU4 (F1) | Manual: Readability | Manual: Recall | Manual: Precision | Manual: Repetition |
|---|---|---|---|---|---|---|---|---|
| mibi_rag_abstract | 0.3732 | 0.2670 | 0.3756 | 0.2594 | 4.44 | 4.79 | 4.27 | 4.42 |
| mibi_rag_snippet | 0.4135 | 0.3016 | 0.4068 | 0.2882 | 4.54 | 4.75 | 4.41 | 4.49 |
| UR-IW-1 | 0.3564 | 0.2745 | 0.3508 | 0.2623 | 4.41 | 4.60 | 4.26 | 4.36 |
| UR-IW-4 | 0.4141 | 0.3122 | 0.4068 | 0.2960 | - | - | - | - |
| UR-IW-2 | 0.3327 | 0.3651 | 0.3220 | 0.3535 | - | - | - | - |
| UR-IW-5 | 0.3998 | 0.2539 | 0.4022 | 0.2458 | - | - | - | - |
| UR-IW-3 | 0.3085 | 0.3244 | 0.3029 | 0.3167 | - | - | - | - |
| Gatech competition | 0.2624 | 0.2479 | 0.2580 | 0.2399 | 4.35 | 4.44 | 4.20 | 4.62 |
| Mistral-7B finetune | 0.3102 | 0.3236 | 0.3052 | 0.3149 | 4.35 | 4.35 | 4.22 | 4.47 |
| Synthia with first | 0.3065 | 0.2778 | 0.3022 | 0.2683 | 4.49 | 4.48 | 4.33 | 4.66 |
| LLM4SciLit | 0.0571 | 0.0776 | 0.0502 | 0.0695 | 3.07 | 3.08 | 3.41 | 3.91 |
| RMC_append_snippets | 0.3978 | 0.3360 | 0.3901 | 0.3215 | 4.51 | 4.66 | 4.42 | 4.59 |
| bioinfo-0 | 0.4060 | 0.1418 | 0.4170 | 0.1384 | 4.14 | 4.69 | 3.75 | 4.14 |
| bioinfo-1 | 0.4038 | 0.1309 | 0.4188 | 0.1292 | 3.99 | 4.65 | 3.71 | 3.94 |
| bioinfo-2 | 0.4636 | 0.1614 | 0.4761 | 0.1566 | 4.00 | 4.76 | 3.72 | 4.01 |
| bioinfo-3 | 0.2991 | 0.1401 | 0.3104 | 0.1400 | 4.19 | 4.42 | 3.78 | 4.28 |
| bioinfo-4 | 0.4208 | 0.1507 | 0.4336 | 0.1480 | 4.08 | 4.64 | 3.74 | 3.99 |
| Fleming-1 | 0.3241 | 0.1686 | 0.3458 | 0.1681 | 4.28 | 4.68 | 3.98 | 4.34 |
| dmiip2024_2 | 0.3139 | 0.3123 | 0.3134 | 0.3004 | 4.38 | 4.60 | 4.38 | 4.53 |
| dmiip2024_3 | 0.2426 | 0.2628 | 0.2393 | 0.2531 | 4.48 | 4.55 | 4.40 | 4.65 |
| dmiip2024_4 | 0.2834 | 0.3004 | 0.2700 | 0.2849 | 4.18 | 4.21 | 4.18 | 4.38 |
| dmiip2024_1 | 0.2909 | 0.3237 | 0.2854 | 0.3134 | 4.39 | 4.41 | 4.38 | 4.46 |
| dmiip2024 | 0.2879 | 0.3150 | 0.2787 | 0.3000 | 4.36 | 4.35 | 4.25 | 4.36 |
| IISR 5th submit | 0.4235 | 0.2063 | 0.4238 | 0.1990 | 4.26 | 4.71 | 3.92 | 4.34 |
| RAG for medicine | 0.3676 | 0.2008 | 0.3722 | 0.1955 | 4.27 | 4.75 | 4.06 | 4.29 |
| IBE-LM ver1 | - | - | - | - | 0.89 | 0.75 | 0.98 | 1.16 |
| IBE-LM ver3 | - | - | - | - | 0.89 | 0.75 | 0.98 | 1.16 |
| IBE-LM ver 5 | - | - | - | - | 0.89 | 0.75 | 0.98 | 1.16 |
| IBE-LM ver2 | - | - | - | - | 0.89 | 0.75 | 0.98 | 1.16 |
| IBE-LM ver4 | - | - | - | - | 0.89 | 0.75 | 0.98 | 1.16 |
| IISR 2nd submit | 0.3274 | 0.2889 | 0.3238 | 0.2781 | 4.56 | 4.68 | 4.56 | 4.59 |
| IISR 3rd submit | 0.4361 | 0.2311 | 0.4350 | 0.2207 | 4.34 | 4.71 | 4.00 | 4.36 |
| IISR 4th submit | 0.4259 | 0.1991 | 0.4210 | 0.1898 | 4.27 | 4.75 | 3.93 | 4.25 |
| IISR first submit | 0.3315 | 0.3111 | 0.3256 | 0.2997 | 4.48 | 4.56 | 4.42 | 4.54 |
| CPS | 0.3234 | 0.2860 | 0.3173 | 0.2773 | 4.38 | 4.08 | 3.98 | 4.53 |
| lasige-ku | 0.0532 | 0.0469 | 0.0788 | 0.0640 | 3.00 | 2.58 | 2.51 | 3.98 |
| extractive | 0.2447 | 0.2549 | 0.2447 | 0.2548 | 4.13 | 3.87 | 3.91 | 4.31 |
| AUEB-System1 | - | - | - | - | - | - | - | - |
| BioASQ_Baseline | - | - | - | - | - | - | - | - |