BioASQ Participants Area
Task 11b: Test Results of Phase B
The test results are presented in separate tables for each type of annotation. Each system is listed under its "System Description". The evaluation measures that are used in Task B are presented here.
Warning: For ideal answers, good ROUGE results do not always imply good manual scores.
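As a reading aid for the Yes/No columns in the tables below, here is a minimal sketch of how accuracy, per-label F1, and Macro F1 relate to each other. It is not the official BioASQ evaluation code. The function names, the toy data, and the choice to return `None` when a label has no true positives (mirroring the "-" cells) are assumptions made for illustration. Counting a missing F1 as 0 in the macro average is inferred from rows such as 0.7692 / - / 0.3846, where the Macro F1 is half the F1 Yes.

```python
from typing import Optional, Sequence


def accuracy(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Fraction of questions whose predicted answer equals the gold answer."""
    return sum(1 for g, p in zip(gold, pred) if g == p) / len(gold)


def label_f1(gold: Sequence[str], pred: Sequence[str], label: str) -> Optional[float]:
    """F1 for one label, or None when the label has no true positives.

    Returning None in that case is an assumption made here to mirror
    the "-" cells in the tables.
    """
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    if tp == 0:
        return None
    return 2 * tp / (2 * tp + fp + fn)


def macro_f1(f1_yes: Optional[float], f1_no: Optional[float]) -> float:
    """Mean of the two per-label F1 scores, counting a missing score as 0."""
    return ((f1_yes or 0.0) + (f1_no or 0.0)) / 2


# Hypothetical data: 24 questions, 15 of them gold "yes",
# and a system that answers "yes" to every question.
gold = ["yes"] * 15 + ["no"] * 9
pred = ["yes"] * 24

acc = accuracy(gold, pred)
f1_yes = label_f1(gold, pred, "yes")
f1_no = label_f1(gold, pred, "no")
print(round(acc, 4), round(f1_yes, 4), f1_no, round(macro_f1(f1_yes, f1_no), 4))
# prints: 0.625 0.7692 None 0.3846
```

The toy output has the same shape as the many 0.6250 / 0.7692 / - / 0.3846 rows in batches 1 and 3. That is consistent with those systems answering "yes" throughout, but it does not prove it.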
Test batch 1
Exact Answers
| | Yes/No | | | | Factoid | | | List | | |
|---|---|---|---|---|---|---|---|---|---|---|
| System | Accuracy | F1 Yes | F1 No | Macro F1 | Strict Acc. | Lenient Acc. | MRR | Mean Prec. | Recall | F-Measure |
| OWLMan-phaseB-TaskV1 | 0.6250 | 0.7273 | 0.4000 | 0.5636 | 0.3684 | 0.5789 | 0.4649 | 0.2333 | 0.4152 | 0.2789 |
| IISR-1 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.4211 | 0.4211 | 0.4211 | 0.7602 | 0.6773 | 0.7043 |
| IISR-3 | 0.6250 | 0.7692 | - | 0.3846 | - | - | - | - | - | - |
| Fleming-4 | 0.7500 | 0.8333 | 0.5000 | 0.6667 | - | - | - | - | - | - |
| AsqAway_1 | 0.6250 | 0.7692 | - | 0.3846 | 0.3684 | 0.3684 | 0.3684 | 0.4873 | 0.5083 | 0.4723 |
| AsqAway_2 | 0.6250 | 0.7692 | - | 0.3846 | 0.3684 | 0.5263 | 0.4474 | 0.4535 | 0.6332 | 0.4983 |
| AsqAway_3 | 0.6250 | 0.7692 | - | 0.3846 | 0.3684 | 0.5263 | 0.4474 | 0.4480 | 0.7131 | 0.5225 |
| AsqAway_4 | 0.6250 | 0.7692 | - | 0.3846 | 0.4211 | 0.5263 | 0.4737 | 0.4480 | 0.7131 | 0.5225 |
| MQ-1 | 0.6250 | 0.7692 | - | 0.3846 | - | - | - | - | - | - |
| MQ-2 | 0.6250 | 0.7692 | - | 0.3846 | - | - | - | - | - | - |
| MQ-3 | 0.6250 | 0.7692 | - | 0.3846 | - | - | - | - | - | - |
| Deep ML methods for | 0.6250 | 0.7273 | 0.4000 | 0.5636 | 0.2105 | 0.2632 | 0.2281 | 0.4444 | 0.2680 | 0.2880 |
| MQ-4 | 0.6250 | 0.7692 | - | 0.3846 | - | - | - | - | - | - |
| MQ-5 | 0.6250 | 0.7692 | - | 0.3846 | - | - | - | - | - | - |
| simple truncation | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.2632 | 0.2632 | 0.2632 | 0.2806 | 0.2521 | 0.2601 |
| dmiip1 | 0.8333 | 0.8750 | 0.7500 | 0.8125 | 0.4211 | 0.5789 | 0.5000 | 0.5016 | 0.4554 | 0.4347 |
| dmiip2 | 0.8333 | 0.8750 | 0.7500 | 0.8125 | 0.5789 | 0.6842 | 0.6096 | 0.5532 | 0.4422 | 0.4462 |
| dmiip4 | 0.8750 | 0.9032 | 0.8235 | 0.8634 | 0.4211 | 0.6842 | 0.5263 | 0.5549 | 0.5429 | 0.4932 |
| dmiip5 | 0.8333 | 0.8750 | 0.7500 | 0.8125 | 0.4211 | 0.6316 | 0.5000 | 0.5232 | 0.5214 | 0.4809 |
| dmiip3 | 0.8750 | 0.9032 | 0.8235 | 0.8634 | 0.4211 | 0.6316 | 0.5000 | 0.4590 | 0.5324 | 0.4701 |
| IISR-2 | 0.9583 | 0.9677 | 0.9412 | 0.9545 | 0.5789 | 0.5789 | 0.5789 | 0.6833 | 0.5864 | 0.6210 |
| UR-IW-2 | 0.9167 | 0.9375 | 0.8750 | 0.9063 | 0.6316 | 0.6316 | 0.6316 | 0.6750 | 0.6697 | 0.6703 |
| UR-IW-3 | 0.9583 | 0.9677 | 0.9412 | 0.9545 | 0.5263 | 0.6316 | 0.5789 | 0.7159 | 0.7356 | 0.7130 |
| UR-IW-4 | 0.8333 | 0.8750 | 0.7500 | 0.8125 | 0.2105 | 0.3158 | 0.2632 | 0.4278 | 0.4164 | 0.4143 |
| extractive | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.2632 | 0.2632 | 0.2632 | 0.2083 | 0.1833 | 0.1905 |
| abstractive | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.2105 | 0.2105 | 0.2105 | 0.2083 | 0.2000 | 0.2037 |
| bioinfo-0 | 0.6250 | 0.7692 | - | 0.3846 | - | - | - | - | - | - |
| BART | 0.3333 | 0.3333 | 0.3333 | 0.3333 | - | - | - | - | - | - |
| DMIS-KU-2 | 0.9167 | 0.9375 | 0.8750 | 0.9063 | 0.5263 | 0.6842 | 0.6053 | 0.7782 | 0.7032 | 0.7158 |
| DMIS-KU-3 | 0.9167 | 0.9375 | 0.8750 | 0.9063 | 0.4737 | 0.6842 | 0.5614 | 0.8278 | 0.6790 | 0.7219 |
| DMIS-KU-5 | 0.9583 | 0.9677 | 0.9412 | 0.9545 | 0.5263 | 0.6842 | 0.6053 | 0.8278 | 0.6790 | 0.7219 |
| DMIS-KU-1 | 0.9167 | 0.9375 | 0.8750 | 0.9063 | 0.5263 | 0.6842 | 0.6053 | 0.8611 | 0.6714 | 0.7169 |
| DMIS-KU-4 | 0.9583 | 0.9677 | 0.9412 | 0.9545 | 0.5263 | 0.6842 | 0.5965 | 0.8819 | 0.6714 | 0.7224 |
| BioASQ Baseline ZS | 0.6250 | 0.7692 | - | 0.3846 | - | - | - | - | - | - |
| BioASQ Baseline FS | 0.6250 | 0.7692 | - | 0.3846 | - | - | - | - | - | - |
| BioASQ_Baseline | 0.3750 | 0.2857 | 0.4444 | 0.3651 | 0.0526 | 0.3684 | 0.1465 | 0.2855 | 0.4948 | 0.3201 |
Ideal Answers
| | Automatic scores (Rouge - R) | | | | Manual scores | | | |
|---|---|---|---|---|---|---|---|---|
| System | R-2 (Rec) | R-2 (F1) | R-SU4 (Rec) | R-SU4 (F1) | Readability | Recall | Precision | Repetition |
| OWLMan-phaseB-TaskV1 | - | - | - | - | - | - | - | - |
| IISR-1 | 0.4042 | 0.3691 | 0.3837 | 0.3439 | 4.72 | 4.73 | 4.55 | 4.79 |
| IISR-3 | 0.4249 | 0.4037 | 0.4138 | 0.3930 | 4.57 | 4.33 | 4.32 | 4.85 |
| Fleming-4 | - | - | - | - | 1.21 | 0.92 | 1.05 | 1.28 |
| AsqAway_1 | - | - | - | - | - | - | - | - |
| AsqAway_2 | - | - | - | - | - | - | - | - |
| AsqAway_3 | - | - | - | - | - | - | - | - |
| AsqAway_4 | - | - | - | - | - | - | - | - |
| MQ-1 | 0.2592 | 0.1951 | 0.2680 | 0.1896 | 4.51 | 3.49 | 3.39 | 4.68 |
| MQ-2 | 0.2724 | 0.2470 | 0.2747 | 0.2398 | 4.63 | 3.61 | 3.55 | 4.83 |
| MQ-3 | 0.5345 | 0.3980 | 0.5182 | 0.3778 | 4.69 | 4.81 | 4.31 | 4.73 |
| Deep ML methods for | 0.0930 | 0.1385 | 0.0686 | 0.1036 | 3.41 | 3.01 | 3.55 | 4.55 |
| MQ-4 | 0.5451 | 0.3310 | 0.5316 | 0.3151 | 4.33 | 4.52 | 3.77 | 4.20 |
| MQ-5 | 0.5537 | 0.3350 | 0.5387 | 0.3180 | 4.33 | 4.52 | 3.76 | 4.13 |
| simple truncation | 0.1504 | 0.0670 | 0.1462 | 0.0631 | 1.07 | 1.24 | 0.88 | 1.07 |
| dmiip1 | 0.5168 | 0.3032 | 0.5115 | 0.2917 | 4.37 | 4.56 | 3.85 | 4.25 |
| dmiip2 | 0.5228 | 0.3137 | 0.5133 | 0.3009 | 4.35 | 4.52 | 3.76 | 4.27 |
| dmiip4 | 0.5359 | 0.3193 | 0.5263 | 0.3060 | 4.33 | 4.52 | 3.79 | 4.25 |
| dmiip5 | 0.5395 | 0.3309 | 0.5275 | 0.3162 | 4.36 | 4.49 | 3.84 | 4.28 |
| dmiip3 | 0.5228 | 0.3137 | 0.5133 | 0.3009 | 4.35 | 4.52 | 3.76 | 4.27 |
| IISR-2 | 0.4148 | 0.3653 | 0.3995 | 0.3450 | 4.76 | 4.80 | 4.65 | 4.80 |
| UR-IW-2 | 0.5630 | 0.2136 | 0.5521 | 0.1990 | 4.51 | 4.77 | 3.37 | 4.29 |
| UR-IW-3 | 0.5245 | 0.1762 | 0.5209 | 0.1663 | 4.51 | 4.79 | 3.28 | 4.25 |
| UR-IW-4 | 0.3531 | 0.0999 | 0.3796 | 0.1015 | 4.35 | 4.35 | 2.97 | 4.20 |
| extractive | 0.1606 | 0.0700 | 0.1570 | 0.0667 | 1.09 | 1.24 | 0.87 | 1.16 |
| abstractive | 0.1592 | 0.0702 | 0.1544 | 0.0665 | 1.12 | 1.20 | 0.87 | 1.16 |
| bioinfo-0 | 0.3147 | 0.2979 | 0.3036 | 0.2788 | 4.69 | 4.53 | 4.39 | 4.71 |
| BART | 0.1671 | 0.2001 | 0.1553 | 0.1881 | 4.32 | 2.52 | 2.93 | 4.76 |
| DMIS-KU-2 | - | - | - | - | - | - | - | - |
| DMIS-KU-3 | - | - | - | - | - | - | - | - |
| DMIS-KU-5 | - | - | - | - | - | - | - | - |
| DMIS-KU-1 | - | - | - | - | - | - | - | - |
| DMIS-KU-4 | - | - | - | - | - | - | - | - |
| BioASQ Baseline ZS | 0.1727 | 0.0977 | 0.1936 | 0.1004 | 3.19 | 2.64 | 2.41 | 3.88 |
| BioASQ Baseline FS | 0.3048 | 0.2493 | 0.3026 | 0.2443 | 3.85 | 3.99 | 3.75 | 4.55 |
| BioASQ_Baseline | - | - | - | - | - | - | - | - |
Test batch 2
Exact Answers
| | Yes/No | | | | Factoid | | | List | | |
|---|---|---|---|---|---|---|---|---|---|---|
| System | Accuracy | F1 Yes | F1 No | Macro F1 | Strict Acc. | Lenient Acc. | MRR | Mean Prec. | Recall | F-Measure |
| IISR-2 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.6364 | 0.7273 | 0.6818 | 0.5308 | 0.3545 | 0.4022 |
| IISR-3 | 0.5833 | 0.7368 | - | 0.3684 | 0.6364 | 0.7273 | 0.6818 | 0.5308 | 0.3545 | 0.4022 |
| IISR-1 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.5455 | 0.6818 | 0.6136 | 0.5139 | 0.3319 | 0.3757 |
| OWLMan-phaseB-TaskV1 | 0.4583 | 0.5806 | 0.2353 | 0.4080 | 0.4545 | 0.5455 | 0.4788 | 0.1833 | 0.1520 | 0.1522 |
| UR-IW-3 | 0.9167 | 0.9333 | 0.8889 | 0.9111 | 0.5909 | 0.6818 | 0.6364 | 0.4973 | 0.4865 | 0.4611 |
| UR-IW-2 | 0.9583 | 0.9655 | 0.9474 | 0.9564 | 0.6364 | 0.7273 | 0.6818 | 0.3973 | 0.4388 | 0.3967 |
| Fleming-4 | 0.7083 | 0.8000 | 0.4615 | 0.6308 | - | - | - | - | - | - |
| capstone-1 | 0.9167 | 0.9231 | 0.9091 | 0.9161 | 0.4091 | 0.6364 | 0.4939 | 0.2293 | 0.3680 | 0.2725 |
| capstone-3 | 0.8333 | 0.8571 | 0.8000 | 0.8286 | 0.2727 | 0.3636 | 0.2955 | 0.2728 | 0.3680 | 0.2999 |
| capstone-2 | 0.8333 | 0.8571 | 0.8000 | 0.8286 | 0.2727 | 0.3636 | 0.2955 | 0.3458 | 0.3528 | 0.3107 |
| bioinfo-0 | 0.5833 | 0.7368 | - | 0.3684 | - | - | - | - | - | - |
| bioinfo-1 | 0.5833 | 0.7368 | - | 0.3684 | - | - | - | - | - | - |
| bioinfo-2 | 0.5833 | 0.7368 | - | 0.3684 | - | - | - | - | - | - |
| bioinfo-3 | 0.5833 | 0.7368 | - | 0.3684 | - | - | - | - | - | - |
| bioinfo-4 | 0.5833 | 0.7368 | - | 0.3684 | - | - | - | - | - | - |
| AsqAway_1 | 0.8750 | 0.8966 | 0.8421 | 0.8693 | 0.4545 | 0.5000 | 0.4659 | 0.2018 | 0.2228 | 0.1999 |
| AsqAway_2 | 0.8750 | 0.8966 | 0.8421 | 0.8693 | 0.5000 | 0.5000 | 0.5000 | 0.2211 | 0.3171 | 0.2444 |
| AsqAway_3 | 0.8750 | 0.8966 | 0.8421 | 0.8693 | 0.5000 | 0.5455 | 0.5227 | 0.2096 | 0.3448 | 0.2444 |
| AsqAway_4 | 0.8750 | 0.8966 | 0.8421 | 0.8693 | 0.5000 | 0.5455 | 0.5227 | 0.2096 | 0.3448 | 0.2444 |
| Deep ML methods for | 0.8333 | 0.8667 | 0.7778 | 0.8222 | 0.3182 | 0.3182 | 0.3182 | 0.3472 | 0.1860 | 0.1986 |
| MindLab QA Reloaded | 0.7917 | 0.8276 | 0.7368 | 0.7822 | 0.3182 | 0.3182 | 0.3182 | 0.3264 | 0.1193 | 0.1616 |
| MindLab QA System | 0.8333 | 0.8667 | 0.7778 | 0.8222 | 0.3182 | 0.4545 | 0.3455 | 0.1500 | 0.2008 | 0.1568 |
| MindLab QA System ++ | 0.5417 | 0.6857 | 0.1538 | 0.4198 | 0.1818 | 0.2273 | 0.2045 | 0.4306 | 0.1332 | 0.1875 |
| MindLab Red Lions++ | 0.7917 | 0.8276 | 0.7368 | 0.7822 | 0.3182 | 0.3182 | 0.3182 | 0.3264 | 0.1193 | 0.1616 |
| MQ-1 | 0.5833 | 0.7368 | - | 0.3684 | - | - | - | - | - | - |
| MQ-2 | 0.5833 | 0.7368 | - | 0.3684 | - | - | - | - | - | - |
| MQ-3 | 0.5833 | 0.7368 | - | 0.3684 | - | - | - | - | - | - |
| MQ-4 | 0.5833 | 0.7368 | - | 0.3684 | - | - | - | - | - | - |
| MQ-5 | 0.5833 | 0.7368 | - | 0.3684 | - | - | - | - | - | - |
| DMIS-KU-1 | 0.9583 | 0.9630 | 0.9524 | 0.9577 | 0.4091 | 0.7273 | 0.5530 | 0.3468 | 0.3632 | 0.3136 |
| DMIS-KU-2 | 0.9583 | 0.9630 | 0.9524 | 0.9577 | 0.4091 | 0.7727 | 0.5523 | 0.3625 | 0.3466 | 0.3139 |
| DMIS-KU-3 | 0.9583 | 0.9630 | 0.9524 | 0.9577 | 0.4545 | 0.6818 | 0.5568 | 0.3258 | 0.4210 | 0.3485 |
| DMIS-KU-4 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.4545 | 0.6818 | 0.5455 | 0.3088 | 0.3692 | 0.2921 |
| DMIS-KU-5 | 0.9167 | 0.9286 | 0.9000 | 0.9143 | 0.4545 | 0.6818 | 0.5492 | 0.2707 | 0.4496 | 0.3121 |
| dmiip2 | 0.5833 | 0.7368 | - | 0.3684 | 0.4091 | 0.6818 | 0.5189 | 0.3774 | 0.3316 | 0.3188 |
| dmiip3 | 0.8750 | 0.8889 | 0.8571 | 0.8730 | 0.3636 | 0.5909 | 0.4447 | 0.3074 | 0.2566 | 0.2408 |
| dmiip5 | 0.9583 | 0.9655 | 0.9474 | 0.9564 | 0.4091 | 0.4545 | 0.4242 | 0.1579 | 0.2187 | 0.1783 |
| dmiip4 | 0.7917 | 0.8276 | 0.7368 | 0.7822 | 0.4545 | 0.6818 | 0.5568 | 0.2816 | 0.3335 | 0.2740 |
| dmiip1 | 0.9167 | 0.9286 | 0.9000 | 0.9143 | 0.3182 | 0.6818 | 0.4659 | 0.2439 | 0.3820 | 0.2643 |
| simple truncation | 0.5833 | 0.7368 | - | 0.3684 | - | - | - | - | - | - |
| extractive | 0.5833 | 0.7368 | - | 0.3684 | - | - | - | - | - | - |
| abstractive | 0.5833 | 0.7368 | - | 0.3684 | - | - | - | - | - | - |
| BioASQ_Baseline | 0.5000 | 0.3333 | 0.6000 | 0.4667 | 0.0909 | 0.1364 | 0.1136 | 0.1237 | 0.2909 | 0.1674 |
| BioASQ Baseline ZS | 0.5833 | 0.7368 | - | 0.3684 | - | - | - | - | - | - |
| BioASQ Baseline FS | 0.5833 | 0.7368 | - | 0.3684 | - | - | - | - | - | - |
Ideal Answers
| | Automatic scores (Rouge - R) | | | | Manual scores | | | |
|---|---|---|---|---|---|---|---|---|
| System | R-2 (Rec) | R-2 (F1) | R-SU4 (Rec) | R-SU4 (F1) | Readability | Recall | Precision | Repetition |
| IISR-2 | 0.3722 | 0.3500 | 0.3599 | 0.3335 | 4.87 | 4.63 | 4.80 | 4.95 |
| IISR-3 | 0.3176 | 0.3098 | 0.3192 | 0.3048 | 4.68 | 4.31 | 4.55 | 4.96 |
| IISR-1 | 0.3662 | 0.3421 | 0.3546 | 0.3269 | 4.81 | 4.59 | 4.69 | 4.91 |
| OWLMan-phaseB-TaskV1 | - | - | - | - | - | - | - | - |
| UR-IW-3 | 0.5181 | 0.2117 | 0.5251 | 0.2024 | 4.49 | 4.81 | 3.31 | 4.21 |
| UR-IW-2 | 0.5254 | 0.2316 | 0.5216 | 0.2180 | 4.51 | 4.89 | 3.47 | 4.23 |
| Fleming-4 | - | - | - | - | 1.08 | 1.03 | 1.33 | 1.55 |
| capstone-1 | 0.4674 | 0.3288 | 0.4680 | 0.3227 | 4.37 | 4.52 | 3.91 | 4.51 |
| capstone-3 | 0.4674 | 0.3288 | 0.4680 | 0.3227 | 4.37 | 4.52 | 3.91 | 4.51 |
| capstone-2 | 0.4674 | 0.3288 | 0.4680 | 0.3227 | 4.37 | 4.52 | 3.91 | 4.51 |
| bioinfo-0 | 0.2128 | 0.2034 | 0.2107 | 0.1959 | 4.23 | 4.01 | 3.97 | 4.65 |
| bioinfo-1 | 0.2648 | 0.2526 | 0.2578 | 0.2419 | 4.43 | 4.16 | 4.19 | 4.64 |
| bioinfo-2 | 0.2805 | 0.2942 | 0.2761 | 0.2875 | 4.59 | 4.24 | 4.61 | 4.91 |
| bioinfo-3 | 0.3267 | 0.3376 | 0.3138 | 0.3230 | 4.61 | 4.40 | 4.79 | 4.93 |
| bioinfo-4 | 0.3186 | 0.3268 | 0.3074 | 0.3137 | 4.57 | 4.36 | 4.75 | 4.92 |
| AsqAway_1 | - | - | - | - | - | - | - | - |
| AsqAway_2 | - | - | - | - | - | - | - | - |
| AsqAway_3 | - | - | - | - | - | - | - | - |
| AsqAway_4 | - | - | - | - | - | - | - | - |
| Deep ML methods for | 0.0491 | 0.0765 | 0.0419 | 0.0626 | 2.87 | 2.52 | 3.31 | 4.31 |
| MindLab QA Reloaded | 0.0524 | 0.0798 | 0.0449 | 0.0659 | 3.00 | 2.67 | 3.45 | 4.36 |
| MindLab QA System | 0.0229 | 0.0360 | 0.0278 | 0.0384 | 3.05 | 2.63 | 3.40 | 4.25 |
| MindLab QA System ++ | 0.0659 | 0.1014 | 0.0498 | 0.0793 | 2.93 | 2.68 | 3.39 | 4.36 |
| MindLab Red Lions++ | 0.0524 | 0.0798 | 0.0449 | 0.0659 | 3.00 | 2.67 | 3.45 | 4.36 |
| MQ-1 | 0.2268 | 0.1891 | 0.2386 | 0.1912 | 4.75 | 3.77 | 3.99 | 4.84 |
| MQ-2 | 0.2710 | 0.2430 | 0.2620 | 0.2328 | 4.93 | 4.05 | 4.29 | 4.92 |
| MQ-3 | 0.4956 | 0.3714 | 0.4832 | 0.3556 | 4.51 | 4.72 | 4.29 | 4.64 |
| MQ-4 | 0.4674 | 0.3288 | 0.4680 | 0.3227 | 4.37 | 4.52 | 3.89 | 4.52 |
| MQ-5 | 0.4674 | 0.3285 | 0.4684 | 0.3228 | 4.37 | 4.52 | 3.89 | 4.52 |
| DMIS-KU-1 | - | - | - | - | - | - | - | - |
| DMIS-KU-2 | - | - | - | - | - | - | - | - |
| DMIS-KU-3 | - | - | - | - | - | - | - | - |
| DMIS-KU-4 | - | - | - | - | - | - | - | - |
| DMIS-KU-5 | - | - | - | - | - | - | - | - |
| dmiip2 | 0.4642 | 0.3294 | 0.4675 | 0.3238 | 4.35 | 4.59 | 3.88 | 4.47 |
| dmiip3 | 0.4642 | 0.3294 | 0.4675 | 0.3238 | 4.35 | 4.59 | 3.88 | 4.47 |
| dmiip5 | 0.4155 | 0.3719 | 0.3966 | 0.3487 | 4.77 | 4.73 | 4.56 | 4.84 |
| dmiip4 | 0.4700 | 0.3271 | 0.4710 | 0.3207 | 4.35 | 4.51 | 3.91 | 4.49 |
| dmiip1 | 0.4098 | 0.2959 | 0.4219 | 0.2963 | 4.32 | 4.33 | 3.81 | 4.44 |
| simple truncation | 0.1358 | 0.0850 | 0.1342 | 0.0828 | 0.97 | 1.05 | 0.88 | 0.99 |
| extractive | 0.1277 | 0.0813 | 0.1265 | 0.0792 | 1.03 | 1.05 | 0.85 | 1.04 |
| abstractive | 0.1306 | 0.0810 | 0.1311 | 0.0790 | 0.97 | 1.00 | 0.79 | 0.99 |
| BioASQ_Baseline | - | - | - | - | - | - | - | - |
| BioASQ Baseline ZS | 0.1728 | 0.1106 | 0.1979 | 0.1193 | 2.87 | 2.44 | 2.13 | 3.36 |
| BioASQ Baseline FS | 0.2645 | 0.2425 | 0.2664 | 0.2395 | 3.33 | 3.53 | 3.56 | 4.25 |
Test batch 3
Exact Answers
| | Yes/No | | | | Factoid | | | List | | |
|---|---|---|---|---|---|---|---|---|---|---|
| System | Accuracy | F1 Yes | F1 No | Macro F1 | Strict Acc. | Lenient Acc. | MRR | Mean Prec. | Recall | F-Measure |
| UR-IW-5 | 0.7917 | 0.8485 | 0.6667 | 0.7576 | 0.4231 | 0.4231 | 0.4231 | 0.3421 | 0.2374 | 0.2565 |
| UR-IW-4 | 0.8750 | 0.9032 | 0.8235 | 0.8634 | 0.3077 | 0.5000 | 0.4038 | 0.3546 | 0.2800 | 0.2939 |
| UR-IW-3 | 0.8750 | 0.9091 | 0.8000 | 0.8545 | 0.5385 | 0.6154 | 0.5705 | 0.5600 | 0.4376 | 0.4693 |
| UR-IW-2 | 0.9167 | 0.9375 | 0.8750 | 0.9063 | 0.5000 | 0.5385 | 0.5192 | 0.5518 | 0.6010 | 0.5441 |
| bioinfo-0 | 0.6250 | 0.7692 | - | 0.3846 | - | - | - | - | - | - |
| bioinfo-2 | 0.6250 | 0.7692 | - | 0.3846 | - | - | - | - | - | - |
| bioinfo-3 | 0.6250 | 0.7692 | - | 0.3846 | - | - | - | - | - | - |
| MQ-1 | 0.6250 | 0.7692 | - | 0.3846 | - | - | - | - | - | - |
| MQ-2 | 0.6250 | 0.7692 | - | 0.3846 | - | - | - | - | - | - |
| MQ-3 | 0.6250 | 0.7692 | - | 0.3846 | - | - | - | - | - | - |
| MQ-4 | 0.6250 | 0.7692 | - | 0.3846 | - | - | - | - | - | - |
| MQ-5 | 0.6250 | 0.7692 | - | 0.3846 | - | - | - | - | - | - |
| Deep ML methods for | 0.7917 | 0.8485 | 0.6667 | 0.7576 | 0.3077 | 0.3077 | 0.3077 | 0.2778 | 0.0922 | 0.1274 |
| capstone-1 | 0.7083 | 0.7742 | 0.5882 | 0.6812 | 0.4231 | 0.6923 | 0.5417 | 0.5243 | 0.3608 | 0.3783 |
| capstone-2 | 0.7500 | 0.8000 | 0.6667 | 0.7333 | 0.4615 | 0.6923 | 0.5590 | 0.4606 | 0.4225 | 0.3984 |
| capstone-3 | 0.9167 | 0.9333 | 0.8889 | 0.9111 | 0.4231 | 0.4615 | 0.4423 | 0.2444 | 0.1848 | 0.1985 |
| capstone-4 | 0.9167 | 0.9333 | 0.8889 | 0.9111 | 0.3077 | 0.4615 | 0.3622 | 0.2222 | 0.2033 | 0.1943 |
| capstone-5 | 0.7083 | 0.7742 | 0.5882 | 0.6812 | - | - | - | 0.5000 | 0.1127 | 0.1724 |
| MindLab QA Reloaded | 0.5417 | 0.5600 | 0.5217 | 0.5409 | 0.1923 | 0.1923 | 0.1923 | 0.0833 | 0.0278 | 0.0417 |
| IISR-1 | 0.9167 | 0.9375 | 0.8750 | 0.9063 | 0.4231 | 0.4231 | 0.4231 | 0.5292 | 0.4141 | 0.4420 |
| IISR-2 | 0.9167 | 0.9333 | 0.8889 | 0.9111 | 0.4231 | 0.4615 | 0.4423 | 0.6796 | 0.5395 | 0.5608 |
| IISR-3 | 0.6250 | 0.7692 | - | 0.3846 | 0.4231 | 0.4615 | 0.4423 | 0.6796 | 0.5395 | 0.5608 |
| AsqAway_1 | 0.8750 | 0.9091 | 0.8000 | 0.8545 | 0.3077 | 0.4231 | 0.3474 | 0.4101 | 0.4395 | 0.4071 |
| MindLab QA System | 0.5833 | 0.5833 | 0.5833 | 0.5833 | 0.0769 | 0.1538 | 0.1090 | 0.3361 | 0.1833 | 0.2242 |
| AsqAway_2 | 0.8750 | 0.9091 | 0.8000 | 0.8545 | 0.4615 | 0.5000 | 0.4808 | 0.4565 | 0.5132 | 0.4560 |
| MindLab Red Lions++ | 0.5417 | 0.5600 | 0.5217 | 0.5409 | 0.1923 | 0.1923 | 0.1923 | 0.0833 | 0.0278 | 0.0417 |
| AsqAway_3 | 0.8750 | 0.9091 | 0.8000 | 0.8545 | 0.4615 | 0.5385 | 0.5000 | 0.3764 | 0.5595 | 0.4226 |
| AsqAway_4 | 0.8750 | 0.9091 | 0.8000 | 0.8545 | 0.4615 | 0.5385 | 0.5000 | 0.3764 | 0.5595 | 0.4226 |
| kmeans | 0.6250 | 0.7692 | - | 0.3846 | - | - | - | - | - | - |
| extractive | 0.6250 | 0.7692 | - | 0.3846 | - | - | - | - | - | - |
| similarity measures | 0.6250 | 0.7692 | - | 0.3846 | - | - | - | - | - | - |
| abstractive | 0.6250 | 0.7692 | - | 0.3846 | - | - | - | - | - | - |
| simple truncation | 0.6250 | 0.7692 | - | 0.3846 | - | - | - | - | - | - |
| dmiip1 | 0.8333 | 0.8824 | 0.7143 | 0.7983 | 0.5000 | 0.7308 | 0.5667 | 0.2932 | 0.4897 | 0.3278 |
| dmiip2 | 0.8333 | 0.8667 | 0.7778 | 0.8222 | 0.4231 | 0.5385 | 0.4647 | 0.3363 | 0.4453 | 0.3396 |
| dmiip3 | 0.7083 | 0.8000 | 0.4615 | 0.6308 | 0.4231 | 0.6923 | 0.5397 | 0.3951 | 0.5472 | 0.4126 |
| dmiip4 | 0.7500 | 0.8235 | 0.5714 | 0.6975 | 0.4615 | 0.7692 | 0.5814 | 0.3840 | 0.4441 | 0.3620 |
| dmiip5 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.4615 | 0.6923 | 0.5481 | 0.3167 | 0.4862 | 0.3307 |
| Fleming-4 | 0.7500 | 0.8333 | 0.5000 | 0.6667 | - | - | - | - | - | - |
| ELErank | 0.9583 | 0.9677 | 0.9412 | 0.9545 | 0.5000 | 0.5769 | 0.5288 | - | - | - |
| DMIS-KU-1 | 0.9583 | 0.9677 | 0.9412 | 0.9545 | 0.5000 | 0.6538 | 0.5538 | 0.5458 | 0.4689 | 0.4916 |
| DMIS-KU-2 | 0.8750 | 0.8966 | 0.8421 | 0.8693 | 0.4615 | 0.6538 | 0.5462 | 0.5258 | 0.4991 | 0.5025 |
| DMIS-KU-3 | 0.8750 | 0.9032 | 0.8235 | 0.8634 | 0.4615 | 0.6538 | 0.5365 | 0.5529 | 0.4760 | 0.5012 |
| DMIS-KU-4 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.4231 | 0.5385 | 0.4596 | 0.5680 | 0.4689 | 0.4975 |
| DMIS-KU-5 | 0.9167 | 0.9333 | 0.8889 | 0.9111 | 0.3846 | 0.6538 | 0.5032 | 0.5322 | 0.5257 | 0.5156 |
| DICE1 | 0.6667 | 0.6923 | 0.6364 | 0.6643 | 0.3846 | 0.3846 | 0.3846 | 0.4148 | 0.3221 | 0.3307 |
| DICE2 | 0.6250 | 0.6897 | 0.5263 | 0.6080 | 0.3077 | 0.3077 | 0.3077 | 0.3339 | 0.3072 | 0.2955 |
| BioASQ_Baseline | 0.4167 | 0.3000 | 0.5000 | 0.4000 | 0.0385 | 0.3077 | 0.1487 | 0.1726 | 0.2450 | 0.1599 |
| BioASQ Baseline ZS | 0.6250 | 0.7692 | - | 0.3846 | - | - | - | - | - | - |
| BioASQ Baseline FS | 0.6250 | 0.7692 | - | 0.3846 | - | - | - | - | - | - |
Ideal Answers
| | Automatic scores (Rouge - R) | | | | Manual scores | | | |
|---|---|---|---|---|---|---|---|---|
| System | R-2 (Rec) | R-2 (F1) | R-SU4 (Rec) | R-SU4 (F1) | Readability | Recall | Precision | Repetition |
| UR-IW-5 | 0.3083 | 0.1074 | 0.3363 | 0.1116 | 4.64 | 4.22 | 2.88 | 4.47 |
| UR-IW-4 | 0.3209 | 0.1123 | 0.3541 | 0.1176 | 4.61 | 4.53 | 2.97 | 4.41 |
| UR-IW-3 | 0.4835 | 0.1869 | 0.4842 | 0.1781 | 4.62 | 4.97 | 3.22 | 4.36 |
| UR-IW-2 | 0.5068 | 0.2151 | 0.4921 | 0.2004 | 4.72 | 4.97 | 3.30 | 4.50 |
| bioinfo-0 | 0.1803 | 0.1401 | 0.1911 | 0.1445 | 4.59 | 4.03 | 3.86 | 4.72 |
| bioinfo-2 | 0.1755 | 0.1328 | 0.1880 | 0.1346 | 4.53 | 4.08 | 4.02 | 4.68 |
| bioinfo-3 | 0.1517 | 0.1058 | 0.1684 | 0.1121 | 4.42 | 3.78 | 3.52 | 4.47 |
| MQ-1 | 0.2286 | 0.2088 | 0.2284 | 0.2056 | 4.73 | 3.67 | 3.86 | 4.88 |
| MQ-2 | 0.2444 | 0.2321 | 0.2430 | 0.2264 | 4.84 | 3.82 | 4.02 | 4.92 |
| MQ-3 | 0.4362 | 0.3636 | 0.4327 | 0.3507 | 4.89 | 4.86 | 4.53 | 4.88 |
| MQ-4 | 0.4386 | 0.2807 | 0.4382 | 0.2703 | 4.32 | 4.58 | 3.97 | 4.36 |
| MQ-5 | 0.4336 | 0.2778 | 0.4333 | 0.2675 | 4.34 | 4.57 | 3.98 | 4.31 |
| Deep ML methods for | 0.0601 | 0.0957 | 0.0423 | 0.0693 | 2.86 | 2.82 | 3.12 | 4.07 |
| capstone-1 | 0.4386 | 0.2807 | 0.4382 | 0.2703 | 4.31 | 4.58 | 3.98 | 4.33 |
| capstone-2 | 0.4386 | 0.2807 | 0.4382 | 0.2703 | 4.31 | 4.58 | 3.98 | 4.33 |
| capstone-3 | 0.4386 | 0.2807 | 0.4382 | 0.2703 | 4.31 | 4.58 | 3.98 | 4.33 |
| capstone-4 | 0.4386 | 0.2807 | 0.4382 | 0.2703 | 4.31 | 4.58 | 3.98 | 4.33 |
| capstone-5 | 0.4386 | 0.2807 | 0.4382 | 0.2703 | 4.31 | 4.58 | 3.98 | 4.33 |
| MindLab QA Reloaded | 0.0495 | 0.0767 | 0.0371 | 0.0598 | 2.90 | 2.58 | 3.07 | 4.16 |
| IISR-1 | 0.3391 | 0.3417 | 0.3302 | 0.3281 | 4.83 | 4.59 | 4.79 | 4.97 |
| IISR-2 | 0.3444 | 0.3313 | 0.3328 | 0.3175 | 4.91 | 4.69 | 4.77 | 4.98 |
| IISR-3 | 0.3168 | 0.2922 | 0.3102 | 0.2796 | 4.57 | 4.28 | 4.46 | 4.91 |
| AsqAway_1 | - | - | - | - | - | - | - | - |
| MindLab QA System | 0.0378 | 0.0571 | 0.0303 | 0.0480 | 3.00 | 2.51 | 3.00 | 4.16 |
| AsqAway_2 | - | - | - | - | - | - | - | - |
| MindLab Red Lions++ | 0.0495 | 0.0767 | 0.0371 | 0.0598 | 2.90 | 2.58 | 3.07 | 4.16 |
| AsqAway_3 | - | - | - | - | - | - | - | - |
| AsqAway_4 | - | - | - | - | - | - | - | - |
| kmeans | 0.1113 | 0.1034 | 0.1096 | 0.1002 | 1.16 | 1.07 | 1.07 | 1.21 |
| extractive | 0.1220 | 0.1156 | 0.1188 | 0.1121 | 1.13 | 1.17 | 1.16 | 1.21 |
| similarity measures | 0.1072 | 0.1024 | 0.1048 | 0.0988 | 1.16 | 1.08 | 1.09 | 1.21 |
| abstractive | 0.1179 | 0.1133 | 0.1170 | 0.1106 | 1.16 | 1.19 | 1.16 | 1.20 |
| simple truncation | 0.1235 | 0.1226 | 0.1193 | 0.1175 | 1.16 | 1.18 | 1.17 | 1.21 |
| dmiip1 | 0.3260 | 0.1364 | 0.3423 | 0.1389 | 4.21 | 4.06 | 3.14 | 3.83 |
| dmiip2 | 0.4281 | 0.2145 | 0.4313 | 0.2080 | 4.41 | 4.54 | 3.73 | 4.02 |
| dmiip3 | 0.4364 | 0.2171 | 0.4392 | 0.2103 | 4.41 | 4.56 | 3.71 | 4.01 |
| dmiip4 | 0.4286 | 0.2137 | 0.4307 | 0.2049 | 4.41 | 4.40 | 3.66 | 3.86 |
| dmiip5 | 0.3900 | 0.3525 | 0.3886 | 0.3469 | 4.82 | 4.66 | 4.67 | 4.86 |
| Fleming-4 | - | - | - | - | 0.86 | 0.78 | 0.93 | 1.07 |
| ELErank | 0.2361 | 0.2491 | 0.2301 | 0.2370 | 4.62 | 3.81 | 4.22 | 4.77 |
| DMIS-KU-1 | - | - | - | - | - | - | - | - |
| DMIS-KU-2 | - | - | - | - | - | - | - | - |
| DMIS-KU-3 | - | - | - | - | - | - | - | - |
| DMIS-KU-4 | - | - | - | - | - | - | - | - |
| DMIS-KU-5 | - | - | - | - | - | - | - | - |
| DICE1 | - | - | - | - | - | - | - | - |
| DICE2 | - | - | - | - | - | - | - | - |
| BioASQ_Baseline | - | - | - | - | - | - | - | - |
| BioASQ Baseline ZS | 0.1669 | 0.1206 | 0.1952 | 0.1362 | 3.03 | 2.48 | 2.17 | 3.44 |
| BioASQ Baseline FS | 0.2192 | 0.1923 | 0.2241 | 0.1912 | 3.49 | 3.82 | 3.76 | 4.46 |
Test batch 4
Exact Answers
| | Yes/No | | | | Factoid | | | List | | |
|---|---|---|---|---|---|---|---|---|---|---|
| System | Accuracy | F1 Yes | F1 No | Macro F1 | Strict Acc. | Lenient Acc. | MRR | Mean Prec. | Recall | F-Measure |
| UR-IW-5 | 0.4286 | 0.5000 | 0.3333 | 0.4167 | 0.3226 | 0.3871 | 0.3495 | 0.4421 | 0.3495 | 0.3734 |
| UR-IW-4 | 0.7857 | 0.7273 | 0.8235 | 0.7754 | 0.2258 | 0.2903 | 0.2473 | 0.4881 | 0.4489 | 0.4492 |
| UR-IW-3 | 0.9286 | 0.8571 | 0.9524 | 0.9048 | 0.6452 | 0.6774 | 0.6613 | 0.6167 | 0.6619 | 0.6211 |
| UR-IW-2 | 0.9286 | 0.8889 | 0.9474 | 0.9181 | 0.5484 | 0.6452 | 0.5968 | 0.6939 | 0.7516 | 0.7069 |
| IISR-2 | 0.9286 | 0.8889 | 0.9474 | 0.9181 | 0.4516 | 0.4839 | 0.4677 | 0.6456 | 0.6517 | 0.6316 |
| IISR-1 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.3871 | 0.4194 | 0.4032 | 0.7213 | 0.6305 | 0.6618 |
| bioinfo-0 | 0.2857 | 0.4444 | - | 0.2222 | - | - | - | - | - | - |
| bioinfo-1 | 0.2857 | 0.4444 | - | 0.2222 | - | - | - | - | - | - |
| bioinfo-2 | 0.2857 | 0.4444 | - | 0.2222 | - | - | - | - | - | - |
| bioinfo-3 | 0.2857 | 0.4444 | - | 0.2222 | - | - | - | - | - | - |
| bioinfo-4 | 0.2857 | 0.4444 | - | 0.2222 | - | - | - | - | - | - |
| dmiip1 | 0.2857 | 0.4444 | - | 0.2222 | 0.4839 | 0.8065 | 0.6022 | 0.4871 | 0.4253 | 0.4368 |
| dmiip2 | 0.7857 | 0.6667 | 0.8421 | 0.7544 | 0.4839 | 0.7742 | 0.6032 | 0.5610 | 0.4919 | 0.4927 |
| dmiip3 | 0.7857 | 0.6667 | 0.8421 | 0.7544 | 0.5806 | 0.8387 | 0.6855 | 0.5597 | 0.4490 | 0.4717 |
| dmiip4 | 0.7143 | 0.6667 | 0.7500 | 0.7083 | 0.4194 | 0.6774 | 0.5226 | 0.5027 | 0.4384 | 0.4336 |
| dmiip5 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.5161 | 0.7419 | 0.6129 | 0.4753 | 0.4281 | 0.4293 |
| MQ-5 | 0.2857 | 0.4444 | - | 0.2222 | - | - | - | - | - | - |
| MQ-3 | 0.2857 | 0.4444 | - | 0.2222 | - | - | - | - | - | - |
| MQ-1 | 0.2857 | 0.4444 | - | 0.2222 | - | - | - | - | - | - |
| ELErank | 0.8571 | 0.7500 | 0.9000 | 0.8250 | 0.5161 | 0.7419 | 0.6086 | - | - | - |
| ELErank+ | 0.8571 | 0.7500 | 0.9000 | 0.8250 | 0.5161 | 0.7419 | 0.6086 | - | - | - |
| capstone-1 | 0.7857 | 0.6667 | 0.8421 | 0.7544 | 0.4839 | 0.6774 | 0.5522 | 0.4840 | 0.2924 | 0.3396 |
| MQ-2 | 0.2857 | 0.4444 | - | 0.2222 | - | - | - | - | - | - |
| MQ-4 | 0.2857 | 0.4444 | - | 0.2222 | - | - | - | - | - | - |
| capstone-2 | 0.7857 | 0.6667 | 0.8421 | 0.7544 | 0.4839 | 0.6774 | 0.5522 | 0.4954 | 0.4265 | 0.4130 |
| capstone-3 | 0.7857 | 0.6667 | 0.8421 | 0.7544 | 0.4839 | 0.6774 | 0.5522 | 0.4840 | 0.2924 | 0.3396 |
| capstone-4 | 0.7857 | 0.6667 | 0.8421 | 0.7544 | 0.2581 | 0.5161 | 0.3613 | 0.4954 | 0.4265 | 0.4130 |
| capstone-5 | 0.7857 | 0.6667 | 0.8421 | 0.7544 | 0.4839 | 0.6774 | 0.5522 | 0.4415 | 0.4335 | 0.3989 |
| AUEB-System1 | 0.2857 | 0.4444 | - | 0.2222 | 0.3226 | 0.4516 | 0.3763 | 0.0257 | 0.0212 | 0.0206 |
| Deep ML methods for | 0.3571 | 0.4000 | 0.3077 | 0.3538 | 0.2581 | 0.2581 | 0.2581 | 0.2847 | 0.1129 | 0.1442 |
| MindLab QA System | 0.3571 | 0.4000 | 0.3077 | 0.3538 | 0.2581 | 0.2581 | 0.2581 | 0.2847 | 0.1129 | 0.1442 |
| DICE1 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.5484 | 0.5484 | 0.5484 | 0.5697 | 0.4855 | 0.5068 |
| MindLab QA System ++ | 0.3571 | 0.4000 | 0.3077 | 0.3538 | 0.2581 | 0.2581 | 0.2581 | 0.2847 | 0.1129 | 0.1442 |
| Fleming-4 | 0.5714 | 0.4000 | 0.6667 | 0.5333 | - | - | - | - | - | - |
| OWLMan-phaseB-TaskV1 | 0.4286 | 0.5000 | 0.3333 | 0.4167 | 0.4194 | 0.4839 | 0.4435 | 0.2500 | 0.3044 | 0.2533 |
| DMIS-KU-1 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.6452 | 0.8710 | 0.7323 | 0.7221 | 0.7874 | 0.7399 |
| DMIS-KU-2 | 0.9286 | 0.8889 | 0.9474 | 0.9181 | 0.6452 | 0.8387 | 0.7108 | 0.7396 | 0.6395 | 0.6693 |
| DMIS-KU-3 | 0.9286 | 0.8571 | 0.9524 | 0.9048 | 0.5806 | 0.8387 | 0.6882 | 0.7460 | 0.6256 | 0.6639 |
| DMIS-KU-4 | 0.8571 | 0.7500 | 0.9000 | 0.8250 | 0.5484 | 0.8387 | 0.6570 | 0.7780 | 0.6044 | 0.6578 |
| DMIS-KU-5 | 0.9286 | 0.8889 | 0.9474 | 0.9181 | 0.5484 | 0.8065 | 0.6473 | 0.7294 | 0.6395 | 0.6607 |
| IISR-3 | 0.2857 | 0.4444 | - | 0.2222 | 0.4194 | 0.4194 | 0.4194 | 0.2500 | 0.1644 | 0.1827 |
| AUEB-System2 | 0.2857 | 0.4444 | - | 0.2222 | 0.2903 | 0.4194 | 0.3387 | 0.0396 | 0.0424 | 0.0395 |
| IISR-4 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.4516 | 0.4516 | 0.4516 | 0.6944 | 0.5635 | 0.5895 |
| DICE2 | 0.9286 | 0.8889 | 0.9474 | 0.9181 | 0.2581 | 0.2581 | 0.2581 | - | - | - |
| abstractive | 0.2857 | 0.4444 | - | 0.2222 | - | - | - | - | - | - |
| DICE_Lab | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.4839 | 0.4839 | 0.4839 | 0.5163 | 0.4870 | 0.4839 |
| extractive | 0.2857 | 0.4444 | - | 0.2222 | - | - | - | - | - | - |
| kmeans | 0.2857 | 0.4444 | - | 0.2222 | - | - | - | - | - | - |
| similarity measures | 0.2857 | 0.4444 | - | 0.2222 | - | - | - | - | - | - |
| simple truncation | 0.2857 | 0.4444 | - | 0.2222 | - | - | - | - | - | - |
| DICE_Lab2 | 0.9286 | 0.8889 | 0.9474 | 0.9181 | 0.5484 | 0.5484 | 0.5484 | 0.3917 | 0.3613 | 0.3648 |
| IISR-5 | 0.9286 | 0.8889 | 0.9474 | 0.9181 | 0.4839 | 0.5161 | 0.4946 | 0.7412 | 0.6764 | 0.6821 |
| BioASQ_Baseline | 0.6429 | - | 0.7826 | 0.3913 | 0.1290 | 0.2581 | 0.1720 | 0.1970 | 0.3275 | 0.2256 |
| BioASQ Baseline ZS | 0.2857 | 0.4444 | - | 0.2222 | - | - | - | - | - | - |
| BioASQ Baseline FS | 0.2857 | 0.4444 | - | 0.2222 | - | - | - | - | - | - |
Ideal Answers
| | Automatic scores (Rouge - R) | | | | Manual scores | | | |
|---|---|---|---|---|---|---|---|---|
| System | R-2 (Rec) | R-2 (F1) | R-SU4 (Rec) | R-SU4 (F1) | Readability | Recall | Precision | Repetition |
| UR-IW-5 | 0.3240 | 0.1442 | 0.3458 | 0.1519 | 4.71 | 4.38 | 2.96 | 4.46 |
| UR-IW-4 | 0.3440 | 0.1506 | 0.3700 | 0.1571 | 4.68 | 4.54 | 3.08 | 4.41 |
| UR-IW-3 | 0.4811 | 0.2349 | 0.4812 | 0.2266 | 4.66 | 4.91 | 3.22 | 4.37 |
| UR-IW-2 | 0.5006 | 0.2658 | 0.4866 | 0.2548 | 4.72 | 4.91 | 3.36 | 4.53 |
| IISR-2 | 0.2840 | 0.2882 | 0.2792 | 0.2774 | 4.91 | 4.56 | 4.90 | 4.99 |
| IISR-1 | 0.2929 | 0.3096 | 0.2873 | 0.2993 | 4.91 | 4.52 | 4.90 | 4.99 |
| bioinfo-0 | 0.1760 | 0.1667 | 0.1828 | 0.1637 | 4.67 | 3.93 | 4.37 | 4.91 |
| bioinfo-1 | 0.2310 | 0.2247 | 0.2244 | 0.2138 | 4.73 | 4.31 | 4.73 | 4.98 |
| bioinfo-2 | 0.1149 | 0.1060 | 0.1267 | 0.1088 | 3.88 | 3.58 | 3.59 | 4.39 |
| bioinfo-3 | 0.1864 | 0.1654 | 0.1901 | 0.1628 | 4.83 | 4.02 | 4.30 | 4.89 |
| bioinfo-4 | 0.1057 | 0.0963 | 0.1177 | 0.1009 | 4.02 | 3.30 | 3.31 | 4.38 |
| dmiip1 | 0.4294 | 0.3325 | 0.4223 | 0.3207 | 4.30 | 4.42 | 4.13 | 4.30 |
| dmiip2 | 0.4294 | 0.3325 | 0.4223 | 0.3207 | 4.30 | 4.42 | 4.13 | 4.30 |
| dmiip3 | 0.4284 | 0.3310 | 0.4212 | 0.3192 | 4.30 | 4.43 | 4.13 | 4.30 |
| dmiip4 | 0.4387 | 0.3299 | 0.4295 | 0.3171 | 4.36 | 4.46 | 4.14 | 4.22 |
| dmiip5 | 0.2987 | 0.3075 | 0.2938 | 0.2971 | 4.29 | 4.36 | 4.79 | 4.88 |
| MQ-5 | 0.4375 | 0.3331 | 0.4281 | 0.3207 | 4.36 | 4.47 | 4.12 | 4.26 |
| MQ-3 | 0.4406 | 0.3714 | 0.4229 | 0.3523 | 4.76 | 4.72 | 4.58 | 4.80 |
| MQ-1 | 0.3749 | 0.3670 | 0.3559 | 0.3441 | 4.84 | 4.50 | 4.77 | 4.92 |
| ELErank | 0.2347 | 0.2557 | 0.2211 | 0.2400 | 4.64 | 3.53 | 4.20 | 4.63 |
| ELErank+ | 0.2347 | 0.2557 | 0.2211 | 0.2400 | 4.64 | 3.53 | 4.20 | 4.63 |
| capstone-1 | 0.4373 | 0.3339 | 0.4285 | 0.3216 | 4.33 | 4.48 | 4.12 | 4.27 |
| MQ-2 | 0.3370 | 0.3463 | 0.3251 | 0.3292 | 4.50 | 4.16 | 4.48 | 4.69 |
| MQ-4 | 0.4373 | 0.3339 | 0.4285 | 0.3216 | 4.33 | 4.48 | 4.12 | 4.27 |
| capstone-2 | 0.4373 | 0.3339 | 0.4285 | 0.3216 | 4.33 | 4.48 | 4.12 | 4.27 |
| capstone-3 | 0.5012 | 0.3275 | 0.5015 | 0.3170 | 3.86 | 4.46 | 3.86 | 3.53 |
| capstone-4 | 0.5012 | 0.3275 | 0.5015 | 0.3170 | 3.86 | 4.46 | 3.86 | 3.53 |
| capstone-5 | 0.4373 | 0.3339 | 0.4285 | 0.3216 | 4.33 | 4.48 | 4.12 | 4.27 |
| AUEB-System1 | - | - | - | - | - | - | - | - |
| Deep ML methods for | 0.0351 | 0.0546 | 0.0302 | 0.0495 | 2.63 | 2.51 | 3.10 | 4.01 |
| MindLab QA System | 0.0582 | 0.0967 | 0.0401 | 0.0697 | 2.61 | 2.78 | 3.50 | 3.89 |
| DICE1 | - | - | - | - | - | - | - | - |
| MindLab QA System ++ | 0.0412 | 0.0662 | 0.0328 | 0.0549 | 2.63 | 2.50 | 3.16 | 4.00 |
| Fleming-4 | - | - | - | - | 0.40 | 0.36 | 0.56 | 0.64 |
| OWLMan-phaseB-TaskV1 | - | - | - | - | 1.00 | 1.00 | 1.00 | 1.00 |
| DMIS-KU-1 | - | - | - | - | - | - | - | - |
| DMIS-KU-2 | - | - | - | - | - | - | - | - |
| DMIS-KU-3 | - | - | - | - | - | - | - | - |
| DMIS-KU-4 | - | - | - | - | - | - | - | - |
| DMIS-KU-5 | - | - | - | - | - | - | - | - |
| IISR-3 | 0.2721 | 0.2961 | 0.2591 | 0.2815 | 4.62 | 4.01 | 4.51 | 4.91 |
| AUEB-System2 | - | - | - | - | - | - | - | - |
| IISR-4 | 0.2889 | 0.3031 | 0.2792 | 0.2893 | 4.89 | 4.47 | 4.90 | 4.98 |
| DICE2 | - | - | - | - | - | - | - | - |
| abstractive | 0.0940 | 0.1032 | 0.0920 | 0.1007 | 1.07 | 1.02 | 1.13 | 1.16 |
| DICE_Lab | - | - | - | - | - | - | - | - |
| extractive | 0.0946 | 0.1025 | 0.0922 | 0.0997 | 1.08 | 1.03 | 1.13 | 1.14 |
| kmeans | 0.1088 | 0.1106 | 0.1066 | 0.1081 | 1.11 | 1.02 | 1.14 | 1.17 |
| similarity measures | 0.0834 | 0.0944 | 0.0809 | 0.0916 | 1.12 | 1.00 | 1.13 | 1.17 |
| simple truncation | 0.0952 | 0.1051 | 0.0935 | 0.1029 | 1.11 | 0.97 | 1.12 | 1.17 |
| DICE_Lab2 | - | - | - | - | - | - | - | - |
| IISR-5 | 0.2827 | 0.2917 | 0.2747 | 0.2785 | 4.91 | 4.57 | 4.89 | 4.99 |
| BioASQ_Baseline | - | - | - | - | - | - | - | - |
| BioASQ Baseline ZS | 0.1552 | 0.1045 | 0.1745 | 0.1104 | 2.44 | 2.09 | 1.93 | 3.19 |
| BioASQ Baseline FS | 0.2277 | 0.2236 | 0.2226 | 0.2150 | 3.29 | 3.31 | 3.76 | 4.23 |
Test batch 5
Exact Answers
| | Yes/No | | | | Factoid | | | List | | |
|---|---|---|---|---|---|---|---|---|---|---|
| System | Accuracy | F1 Yes | F1 No | Macro F1 | Strict Acc. | Lenient Acc. | MRR | Mean Prec. | Recall | F-Measure |
| mibi_rag_abstract | 0.8400 | 0.8667 | 0.8000 | 0.8333 | 0.0476 | 0.0476 | 0.0476 | 0.5048 | 0.3804 | 0.4147 |
| mibi_rag_snippet | 0.9200 | 0.9333 | 0.9000 | 0.9167 | - | - | - | 0.5286 | 0.4107 | 0.4441 |
| UR-IW-1 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.2857 | 0.3810 | 0.3254 | 0.4840 | 0.4173 | 0.4266 |
| UR-IW-4 | 0.9600 | 0.9655 | 0.9524 | 0.9589 | 0.2381 | 0.2381 | 0.2381 | 0.4563 | 0.3478 | 0.3778 |
| UR-IW-2 | 0.8800 | 0.8889 | 0.8696 | 0.8792 | 0.2381 | 0.2857 | 0.2619 | 0.5255 | 0.4510 | 0.4764 |
| UR-IW-5 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.2381 | 0.2857 | 0.2540 | 0.6054 | 0.5158 | 0.5404 |
| UR-IW-3 | 0.9600 | 0.9655 | 0.9524 | 0.9589 | 0.2381 | 0.2381 | 0.2381 | 0.6010 | 0.5158 | 0.5337 |
| Gatech competition | 0.9200 | 0.9286 | 0.9091 | 0.9188 | 0.2381 | 0.2381 | 0.2381 | 0.4939 | 0.3433 | 0.3587 |
| Mistral-7B finetune | 0.9600 | 0.9655 | 0.9524 | 0.9589 | 0.2381 | 0.2381 | 0.2381 | 0.5786 | 0.5006 | 0.5265 |
| Synthia with first | 0.9200 | 0.9286 | 0.9091 | 0.9188 | 0.2857 | 0.2857 | 0.2857 | 0.4627 | 0.4107 | 0.4020 |
| LLM4SciLit | 0.4000 | - | 0.5714 | 0.2857 | - | - | - | - | - | - |
| RMC_append_snippets | 0.9600 | 0.9655 | 0.9524 | 0.9589 | 0.1905 | 0.1905 | 0.1905 | 0.4770 | 0.4400 | 0.4365 |
| bioinfo-0 | 0.6000 | 0.7500 | - | 0.3750 | - | - | - | - | - | - |
| bioinfo-1 | 0.6000 | 0.7500 | - | 0.3750 | - | - | - | - | - | - |
| bioinfo-2 | 0.6000 | 0.7500 | - | 0.3750 | - | - | - | - | - | - |
| bioinfo-3 | 0.0400 | 0.0769 | - | 0.0385 | 0.1905 | 0.1905 | 0.1905 | 0.3214 | 0.2797 | 0.2950 |
| bioinfo-4 | - | - | - | - | 0.2857 | 0.2857 | 0.2857 | 0.4169 | 0.3464 | 0.3698 |
| Fleming-1 | 0.8400 | 0.8750 | 0.7778 | 0.8264 | 0.0476 | 0.0952 | 0.0714 | 0.5196 | 0.4190 | 0.4420 |
| dmiip2024_2 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.2381 | 0.2381 | 0.2381 | 0.3943 | 0.5380 | 0.4249 |
| dmiip2024_3 | 0.9600 | 0.9677 | 0.9474 | 0.9576 | 0.3333 | 0.3810 | 0.3571 | 0.5942 | 0.4725 | 0.5068 |
| dmiip2024_4 | 0.4000 | - | 0.5714 | 0.2857 | 0.3333 | 0.5238 | 0.4206 | 0.4481 | 0.4682 | 0.4386 |
| dmiip2024_1 | 0.9200 | 0.9286 | 0.9091 | 0.9188 | 0.3333 | 0.4286 | 0.3810 | 0.6647 | 0.5011 | 0.5453 |
| dmiip2024 | 0.9200 | 0.9286 | 0.9091 | 0.9188 | 0.3333 | 0.4286 | 0.3810 | 0.6603 | 0.4967 | 0.5407 |
| IISR 5th submit | 0.9600 | 0.9655 | 0.9524 | 0.9589 | 0.2857 | 0.2857 | 0.2857 | 0.5466 | 0.4701 | 0.4915 |
| RAG for medicine | 0.7200 | 0.7200 | 0.7200 | 0.7200 | 0.1905 | 0.3333 | 0.2397 | 0.5524 | 0.5209 | 0.5054 |
| IBE-LM ver1 | 0.6000 | 0.7500 | - | 0.3750 | 0.3333 | 0.4762 | 0.3825 | - | - | - |
| IBE-LM ver3 | 0.6000 | 0.7500 | - | 0.3750 | 0.4286 | 0.4762 | 0.4444 | - | - | - |
| IBE-LM ver 5 | 0.6000 | 0.7500 | - | 0.3750 | 0.4286 | 0.4762 | 0.4444 | - | - | - |
| IBE-LM ver2 | 0.6000 | 0.7500 | - | 0.3750 | 0.3333 | 0.4762 | 0.3849 | 0.1143 | 0.1706 | 0.1280 |
| IBE-LM ver4 | 0.6000 | 0.7500 | - | 0.3750 | 0.3333 | 0.4762 | 0.3849 | 0.1143 | 0.1706 | 0.1280 |
| IISR 2nd submit | 0.9600 | 0.9655 | 0.9524 | 0.9589 | 0.2857 | 0.2857 | 0.2857 | 0.5813 | 0.4591 | 0.4931 |
| IISR 3rd submit | 0.9200 | 0.9333 | 0.9000 | 0.9167 | 0.2381 | 0.2381 | 0.2381 | 0.5461 | 0.4721 | 0.4960 |
| IISR 4th submit | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.1905 | 0.1905 | 0.1905 | 0.5449 | 0.4682 | 0.4934 |
| IISR first submit | 0.9200 | 0.9286 | 0.9091 | 0.9188 | 0.2381 | 0.2857 | 0.2540 | 0.6317 | 0.4685 | 0.5161 |
| CPS | 0.7200 | 0.7742 | 0.6316 | 0.7029 | 0.2857 | 0.2857 | 0.2857 | 0.3532 | 0.2286 | 0.2579 |
| lasige-ku | 0.6800 | 0.7895 | 0.3333 | 0.5614 | - | - | - | - | - | - |
| extractive | 0.8400 | 0.8824 | 0.7500 | 0.8162 | 0.1429 | 0.1905 | 0.1667 | 0.1996 | 0.2011 | 0.1908 |
| AUEB-System1 | 0.8000 | 0.8571 | 0.6667 | 0.7619 | 0.3333 | 0.4286 | 0.3690 | 0.4286 | 0.2988 | 0.3211 |
| BioASQ_Baseline | 0.4400 | 0.3000 | 0.5333 | 0.4167 | 0.0476 | 0.1905 | 0.0968 | 0.2366 | 0.2599 | 0.2100 |
Ideal Answers
| | Automatic scores (Rouge - R) | | | | Manual scores | | | |
|---|---|---|---|---|---|---|---|---|
| System | R-2 (Rec) | R-2 (F1) | R-SU4 (Rec) | R-SU4 (F1) | Readability | Recall | Precision | Repetition |
| mibi_rag_abstract | 0.3732 | 0.2670 | 0.3756 | 0.2594 | 4.44 | 4.79 | 4.27 | 4.42 |
| mibi_rag_snippet | 0.4135 | 0.3016 | 0.4068 | 0.2882 | 4.54 | 4.75 | 4.41 | 4.49 |
| UR-IW-1 | 0.3564 | 0.2745 | 0.3508 | 0.2623 | 4.41 | 4.60 | 4.26 | 4.36 |
| UR-IW-4 | 0.4141 | 0.3122 | 0.4068 | 0.2960 | - | - | - | - |
| UR-IW-2 | 0.3327 | 0.3651 | 0.3220 | 0.3535 | - | - | - | - |
| UR-IW-5 | 0.3998 | 0.2539 | 0.4022 | 0.2458 | - | - | - | - |
| UR-IW-3 | 0.3085 | 0.3244 | 0.3029 | 0.3167 | - | - | - | - |
| Gatech competition | 0.2624 | 0.2479 | 0.2580 | 0.2399 | 4.35 | 4.44 | 4.20 | 4.62 |
| Mistral-7B finetune | 0.3102 | 0.3236 | 0.3052 | 0.3149 | 4.35 | 4.35 | 4.22 | 4.47 |
| Synthia with first | 0.3065 | 0.2778 | 0.3022 | 0.2683 | 4.49 | 4.48 | 4.33 | 4.66 |
| LLM4SciLit | 0.0571 | 0.0776 | 0.0502 | 0.0695 | 3.07 | 3.08 | 3.41 | 3.91 |
| RMC_append_snippets | 0.3978 | 0.3360 | 0.3901 | 0.3215 | 4.51 | 4.66 | 4.42 | 4.59 |
| bioinfo-0 | 0.4060 | 0.1418 | 0.4170 | 0.1384 | 4.14 | 4.69 | 3.75 | 4.14 |
| bioinfo-1 | 0.4038 | 0.1309 | 0.4188 | 0.1292 | 3.99 | 4.65 | 3.71 | 3.94 |
| bioinfo-2 | 0.4636 | 0.1614 | 0.4761 | 0.1566 | 4.00 | 4.76 | 3.72 | 4.01 |
| bioinfo-3 | 0.2991 | 0.1401 | 0.3104 | 0.1400 | 4.19 | 4.42 | 3.78 | 4.28 |
| bioinfo-4 | 0.4208 | 0.1507 | 0.4336 | 0.1480 | 4.08 | 4.64 | 3.74 | 3.99 |
| Fleming-1 | 0.3241 | 0.1686 | 0.3458 | 0.1681 | 4.28 | 4.68 | 3.98 | 4.34 |
| dmiip2024_2 | 0.3139 | 0.3123 | 0.3134 | 0.3004 | 4.38 | 4.60 | 4.38 | 4.53 |
| dmiip2024_3 | 0.2426 | 0.2628 | 0.2393 | 0.2531 | 4.48 | 4.55 | 4.40 | 4.65 |
| dmiip2024_4 | 0.2834 | 0.3004 | 0.2700 | 0.2849 | 4.18 | 4.21 | 4.18 | 4.38 |
| dmiip2024_1 | 0.2909 | 0.3237 | 0.2854 | 0.3134 | 4.39 | 4.41 | 4.38 | 4.46 |
| dmiip2024 | 0.2879 | 0.3150 | 0.2787 | 0.3000 | 4.36 | 4.35 | 4.25 | 4.36 |
| IISR 5th submit | 0.4235 | 0.2063 | 0.4238 | 0.1990 | 4.26 | 4.71 | 3.92 | 4.34 |
| RAG for medicine | 0.3676 | 0.2008 | 0.3722 | 0.1955 | 4.27 | 4.75 | 4.06 | 4.29 |
| IBE-LM ver1 | - | - | - | - | 0.89 | 0.75 | 0.98 | 1.16 |
| IBE-LM ver3 | - | - | - | - | 0.89 | 0.75 | 0.98 | 1.16 |
| IBE-LM ver 5 | - | - | - | - | 0.89 | 0.75 | 0.98 | 1.16 |
| IBE-LM ver2 | - | - | - | - | 0.89 | 0.75 | 0.98 | 1.16 |
| IBE-LM ver4 | - | - | - | - | 0.89 | 0.75 | 0.98 | 1.16 |
| IISR 2nd submit | 0.3274 | 0.2889 | 0.3238 | 0.2781 | 4.56 | 4.68 | 4.56 | 4.59 |
| IISR 3rd submit | 0.4361 | 0.2311 | 0.4350 | 0.2207 | 4.34 | 4.71 | 4.00 | 4.36 |
| IISR 4th submit | 0.4259 | 0.1991 | 0.4210 | 0.1898 | 4.27 | 4.75 | 3.93 | 4.25 |
| IISR first submit | 0.3315 | 0.3111 | 0.3256 | 0.2997 | 4.48 | 4.56 | 4.42 | 4.54 |
| CPS | 0.3234 | 0.2860 | 0.3173 | 0.2773 | 4.38 | 4.08 | 3.98 | 4.53 |
| lasige-ku | 0.0532 | 0.0469 | 0.0788 | 0.0640 | 3.00 | 2.58 | 2.51 | 3.98 |
| extractive | 0.2447 | 0.2549 | 0.2447 | 0.2548 | 4.13 | 3.87 | 3.91 | 4.31 |
| AUEB-System1 | - | - | - | - | - | - | - | - |
| BioASQ_Baseline | - | - | - | - | - | - | - | - |