BioASQ Participants Area
Task 8b: Test Results of Phase B
The test results are presented in separate tables for each type of annotation. Each system is listed under its "System Description". The evaluation measures that are used in Task B are presented here.
Warning: For ideal answers, good ROUGE results do not always imply good manual scores.
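As a reading aid for the Yes/No columns in the tables below, here is a minimal sketch of how accuracy, per-class F1 and Macro F1 could be computed, assuming Macro F1 is the unweighted mean of "F1 Yes" and "F1 No" and that a class with no correct predictions contributes 0. The function name `yes_no_scores` and the 25-question batch (17 "yes", 8 "no") are hypothetical and are not taken from the official evaluation code. The batch was chosen only because an always-"yes" system on it gives figures matching the many rows that show 0.6800 / 0.8095 / - / 0.4048.

```python
def yes_no_scores(gold, predicted):
    """Return (accuracy, F1 yes, F1 no, macro F1) for yes/no answers.

    Assumption: macro F1 is the plain average of the two per-class F1
    values, and a class whose F1 is undefined counts as 0.
    """
    pairs = list(zip(gold, predicted))

    def f1(label):
        tp = sum(g == label and p == label for g, p in pairs)
        fp = sum(g != label and p == label for g, p in pairs)
        fn = sum(g == label and p != label for g, p in pairs)
        denom = 2 * tp + fp + fn
        return 2 * tp / denom if denom else 0.0

    accuracy = sum(g == p for g, p in pairs) / len(pairs)
    f1_yes, f1_no = f1("yes"), f1("no")
    return accuracy, f1_yes, f1_no, (f1_yes + f1_no) / 2


# Hypothetical batch: 17 "yes" and 8 "no" gold answers,
# scored against a system that always answers "yes".
gold = ["yes"] * 17 + ["no"] * 8
always_yes = ["yes"] * 25
print(" ".join(f"{score:.4f}" for score in yes_no_scores(gold, always_yes)))
# prints: 0.6800 0.8095 0.0000 0.4048
```

Under these assumptions, a row with "-" in the "F1 No" column and a Macro F1 equal to half of "F1 Yes" is what a system that never answers "no" correctly would produce.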
Test batch 1
Exact Answers
| | Yes/No | | | | Factoid | | | List | | |
|---|---|---|---|---|---|---|---|---|---|---|
| System | Accuracy | F1 Yes | F1 No | Macro F1 | Strict Acc. | Lenient Acc. | MRR | Mean Prec. | Recall | F-Measure |
| pa-base | 0.6800 | 0.8095 | - | 0.4048 | 0.2188 | 0.5000 | 0.3542 | 0.2750 | 0.2250 | 0.2305 |
| pa | 0.6800 | 0.8095 | - | 0.4048 | 0.2188 | 0.5000 | 0.3542 | 0.3884 | 0.5629 | 0.4315 |
| AUEB-System1 | 0.6800 | 0.8095 | - | 0.4048 | 0.0313 | 0.0625 | 0.0469 | 0.0333 | 0.0417 | 0.0367 |
| AUEB-System3 | 0.6800 | 0.8095 | - | 0.4048 | 0.0000 | 0.0313 | 0.0078 | 0.0200 | 0.0333 | 0.0250 |
| AUEB-System4 | 0.6800 | 0.8095 | - | 0.4048 | 0.0313 | 0.0625 | 0.0469 | 0.0500 | 0.0750 | 0.0597 |
| AUEB-System5 | 0.6800 | 0.8095 | - | 0.4048 | 0.0313 | 0.1563 | 0.0714 | 0.1008 | 0.1308 | 0.1095 |
| UoT_baseline | 0.6800 | 0.8095 | - | 0.4048 | 0.3125 | 0.5938 | 0.4266 | 0.3461 | 0.2933 | 0.2918 |
| UoT_allquestions | 0.6800 | 0.8095 | - | 0.4048 | 0.2813 | 0.5938 | 0.4099 | 0.3842 | 0.3200 | 0.3262 |
| BJUTNLPGroup | 0.6800 | 0.8095 | - | 0.4048 | 0.2813 | 0.4063 | 0.3307 | 0.1800 | 0.5458 | 0.2652 |
| MQ-5 | 0.6800 | 0.8095 | - | 0.4048 | - | - | - | - | - | - |
| MQ-1 | 0.6800 | 0.8095 | - | 0.4048 | - | - | - | - | - | - |
| MQ-2 | 0.6800 | 0.8095 | - | 0.4048 | - | - | - | - | - | - |
| MQ-3 | 0.6800 | 0.8095 | - | 0.4048 | - | - | - | - | - | - |
| bio-answerfinder | 0.6800 | 0.8095 | - | 0.4048 | 0.0625 | 0.2188 | 0.1406 | 0.1511 | 0.3671 | 0.1673 |
| Umass_czi_1 | 0.6800 | 0.7778 | 0.4286 | 0.6032 | 0.3750 | 0.5938 | 0.4688 | - | - | - |
| MQ-4 | 0.6800 | 0.8095 | - | 0.4048 | - | - | - | - | - | - |
| Umass_czi_2 | 0.5600 | 0.6857 | 0.2667 | 0.4762 | 0.2500 | 0.3438 | 0.2891 | 0.4875 | 0.2983 | 0.3448 |
| Umass_czi_3 | 0.6800 | 0.7778 | 0.4286 | 0.6032 | 0.2500 | 0.3438 | 0.2891 | 0.4875 | 0.2983 | 0.3448 |
| Umass_czi_4 | 0.6400 | 0.7273 | 0.4706 | 0.5989 | 0.2188 | 0.4375 | 0.3005 | 0.4875 | 0.2983 | 0.3448 |
| NCU-IISR_1 | 0.7600 | 0.8235 | 0.6250 | 0.7243 | - | - | - | - | - | - |
| Umass_czi_5 | 0.6400 | 0.7429 | 0.4000 | 0.5714 | 0.2188 | 0.4375 | 0.3005 | 0.1350 | 0.0767 | 0.0967 |
| FudanLabZhu1 | 0.6000 | 0.7368 | 0.1667 | 0.4518 | 0.3750 | 0.5938 | 0.4557 | 0.4250 | 0.3067 | 0.3408 |
| FudanLabZhu4 | 0.6000 | 0.7368 | 0.1667 | 0.4518 | 0.3125 | 0.5938 | 0.4219 | 0.4250 | 0.3067 | 0.3408 |
| auth-qa-1 | 0.6800 | 0.8095 | - | 0.4048 | 0.0625 | 0.1563 | 0.1094 | 0.0325 | 0.0583 | 0.0417 |
| kmeans | - | - | - | - | - | - | - | - | - | - |
| KoreaUniv-DMIS-2 | 0.8400 | 0.8824 | 0.7500 | 0.8162 | 0.3438 | 0.6250 | 0.4583 | 0.4367 | 0.3100 | 0.3333 |
| KoreaUniv-DMIS-3 | 0.8800 | 0.9143 | 0.8000 | 0.8571 | 0.2500 | 0.6250 | 0.3979 | 0.3908 | 0.3100 | 0.3152 |
| simple truncation | - | - | - | - | - | - | - | - | - | - |
| KoreaUniv-DMIS-1 | 0.7600 | 0.8235 | 0.6250 | 0.7243 | 0.1250 | 0.4375 | 0.2615 | 0.4333 | 0.3425 | 0.3597 |
| KoreaUniv-DMIS-4 | 0.7600 | 0.8235 | 0.6250 | 0.7243 | 0.3125 | 0.6250 | 0.4344 | 0.4450 | 0.3325 | 0.3516 |
| KoreaUniv-DMIS-5 | 0.8800 | 0.9091 | 0.8235 | 0.8663 | 0.3438 | 0.5938 | 0.4438 | 0.4733 | 0.3633 | 0.3718 |
| BioASQ_Baseline | 0.4000 | 0.2857 | 0.4828 | 0.3842 | 0.1563 | 0.2813 | 0.2016 | 0.1322 | 0.3483 | 0.1839 |
Ideal Answers
| | Automatic scores (ROUGE - R) | | | | Manual scores | | | |
|---|---|---|---|---|---|---|---|---|
| System | R-2 (Rec) | R-2 (F1) | R-SU4 (Rec) | R-SU4 (F1) | Readability | Recall | Precision | Repetition |
| pa-base | 0.1040 | 0.1061 | 0.1036 | 0.1058 | 0.94 | 1.04 | 1.07 | 1.10 |
| pa | 0.1040 | 0.1061 | 0.1036 | 0.1058 | 0.94 | 1.04 | 1.07 | 1.10 |
| AUEB-System1 | 0.0944 | 0.0495 | 0.0978 | 0.0496 | 0.64 | 1.00 | 0.66 | 0.84 |
| AUEB-System3 | 0.0563 | 0.0278 | 0.0636 | 0.0312 | 0.62 | 0.78 | 0.65 | 0.94 |
| AUEB-System4 | 0.0163 | 0.0056 | 0.0246 | 0.0099 | 0.52 | 0.45 | 0.35 | 0.83 |
| AUEB-System5 | 0.0595 | 0.0282 | 0.0659 | 0.0317 | 0.57 | 0.77 | 0.58 | 0.84 |
| UoT_baseline | - | - | - | - | - | - | - | - |
| UoT_allquestions | - | - | - | - | - | - | - | - |
| BJUTNLPGroup | 0.0794 | 0.0976 | 0.0633 | 0.0731 | 2.46 | 2.51 | 3.08 | 3.68 |
| MQ-5 | 0.4616 | 0.3331 | 0.4740 | 0.3295 | 3.85 | 4.23 | 3.88 | 4.20 |
| MQ-1 | 0.5322 | 0.3339 | 0.5412 | 0.3276 | 3.63 | 4.37 | 3.83 | 3.95 |
| MQ-2 | 0.5671 | 0.3487 | 0.5732 | 0.3417 | 3.62 | 4.33 | 3.78 | 3.94 |
| MQ-3 | 0.5789 | 0.3568 | 0.5834 | 0.3481 | 3.63 | 4.33 | 3.83 | 3.92 |
| bio-answerfinder | 0.4515 | 0.3289 | 0.4543 | 0.3236 | 3.74 | 3.88 | 3.69 | 4.24 |
| Umass_czi_1 | - | - | - | - | - | - | - | - |
| MQ-4 | 0.5401 | 0.3365 | 0.5492 | 0.3301 | 3.62 | 4.33 | 3.81 | 3.90 |
| Umass_czi_2 | - | - | - | - | - | - | - | - |
| Umass_czi_3 | - | - | - | - | - | - | - | - |
| Umass_czi_4 | - | - | - | - | - | - | - | - |
| NCU-IISR_1 | 0.1699 | 0.1923 | 0.1735 | 0.1935 | 3.52 | 3.26 | 3.59 | 4.46 |
| Umass_czi_5 | - | - | - | - | - | - | - | - |
| FudanLabZhu1 | - | - | - | - | - | - | - | - |
| FudanLabZhu4 | - | - | - | - | - | - | - | - |
| auth-qa-1 | - | - | - | - | - | - | - | - |
| kmeans | 0.5566 | 0.3208 | 0.5609 | 0.3129 | 3.33 | 4.35 | 3.70 | 3.92 |
| KoreaUniv-DMIS-2 | - | - | - | - | - | - | - | - |
| KoreaUniv-DMIS-3 | - | - | - | - | - | - | - | - |
| simple truncation | 0.3841 | 0.3385 | 0.3896 | 0.3339 | 3.93 | 4.07 | 4.13 | 4.62 |
| KoreaUniv-DMIS-1 | 0.2326 | 0.1576 | 0.2509 | 0.1651 | 3.79 | 3.50 | 3.49 | 4.37 |
| KoreaUniv-DMIS-4 | 0.1957 | 0.1887 | 0.2021 | 0.1898 | 4.09 | 3.40 | 3.60 | 4.54 |
| KoreaUniv-DMIS-5 | 0.1960 | 0.1895 | 0.2019 | 0.1902 | 4.07 | 3.37 | 3.61 | 4.53 |
| BioASQ_Baseline | - | - | - | - | - | - | - | - |
Test batch 2
Exact Answers
| | Yes/No | | | | Factoid | | | List | | |
|---|---|---|---|---|---|---|---|---|---|---|
| System | Accuracy | F1 Yes | F1 No | Macro F1 | Strict Acc. | Lenient Acc. | MRR | Mean Prec. | Recall | F-Measure |
| BJUTNLPGroup | 0.7500 | 0.8571 | - | 0.4286 | 0.0800 | 0.3200 | 0.1747 | 0.1429 | 0.4619 | 0.2027 |
| AUEB-System1 | 0.7500 | 0.8571 | - | 0.4286 | - | - | - | 0.0179 | 0.0179 | 0.0179 |
| AUEB-System2 | 0.7500 | 0.8571 | - | 0.4286 | - | - | - | 0.0143 | 0.0238 | 0.0179 |
| AUEB-System3 | 0.7500 | 0.8571 | - | 0.4286 | 0.0000 | 0.0400 | 0.0100 | - | - | - |
| AUEB-System4 | 0.7500 | 0.8571 | - | 0.4286 | 0.0000 | 0.0800 | 0.0280 | 0.0429 | 0.0524 | 0.0429 |
| AUEB-System5 | 0.7500 | 0.8571 | - | 0.4286 | - | - | - | 0.0667 | 0.0821 | 0.0672 |
| auth-qa-1 | 0.7778 | 0.8621 | 0.4286 | 0.6453 | 0.0000 | 0.0800 | 0.0233 | 0.0845 | 0.1238 | 0.0944 |
| bio-answerfinder | 0.7778 | 0.8621 | 0.4286 | 0.6453 | 0.2000 | 0.2400 | 0.2080 | 0.3693 | 0.4714 | 0.3803 |
| UoT_baseline | 0.7500 | 0.8571 | - | 0.4286 | 0.1600 | 0.4400 | 0.2580 | 0.5361 | 0.4476 | 0.4306 |
| Best yesno | 0.7500 | 0.8571 | - | 0.4286 | 0.1600 | 0.4400 | 0.2580 | 0.5361 | 0.4476 | 0.4306 |
| UoT_multitask_learn | 0.8333 | 0.9000 | 0.5000 | 0.7000 | 0.2000 | 0.4000 | 0.2800 | 0.4643 | 0.4214 | 0.4108 |
| UoT_allquestions | 0.7778 | 0.8710 | 0.2000 | 0.5355 | 0.1600 | 0.4800 | 0.2540 | 0.4107 | 0.3738 | 0.3712 |
| MQ-1 | - | - | - | - | - | - | - | - | - | - |
| Best factoid | 0.7500 | 0.8571 | - | 0.4286 | 0.1200 | 0.4400 | 0.2413 | 0.4827 | 0.4071 | 0.3950 |
| Umass_czi_2 | 0.7778 | 0.8519 | 0.5556 | 0.7037 | 0.0800 | 0.2400 | 0.1333 | - | - | - |
| Umass_czi_1 | 0.7222 | 0.8148 | 0.4444 | 0.6296 | 0.1200 | 0.2000 | 0.1480 | - | - | - |
| NCU-IISR_1 | 0.7778 | 0.8519 | 0.5556 | 0.7037 | 0.2800 | 0.4400 | 0.3293 | 0.3214 | 0.2381 | 0.2667 |
| MQ-2 | 0.7500 | 0.8571 | - | 0.4286 | - | - | - | - | - | - |
| MQ-3 | 0.7500 | 0.8571 | - | 0.4286 | - | - | - | - | - | - |
| MQ-4 | 0.7500 | 0.8571 | - | 0.4286 | - | - | - | - | - | - |
| simple truncation | - | - | - | - | - | - | - | - | - | - |
| MQ-5 | 0.7500 | 0.8571 | - | 0.4286 | - | - | - | - | - | - |
| Multitask SBERT Cls | 0.7500 | 0.8571 | - | 0.4286 | - | - | - | - | - | - |
| Multitask SBERT reg | 0.7500 | 0.8571 | - | 0.4286 | - | - | - | - | - | - |
| sbert cls | 0.7500 | 0.8571 | - | 0.4286 | - | - | - | - | - | - |
| sbert reg | 0.7500 | 0.8571 | - | 0.4286 | - | - | - | - | - | - |
| pa-base | 0.2500 | - | 0.4000 | 0.2000 | 0.1200 | 0.4000 | 0.2300 | 0.1012 | 0.0619 | 0.0702 |
| FudanLabZhu1 | 0.6944 | 0.7843 | 0.4762 | 0.6303 | 0.2800 | 0.4000 | 0.3200 | 0.3631 | 0.2976 | 0.3112 |
| FudanLabZhu3 | 0.6944 | 0.7843 | 0.4762 | 0.6303 | 0.2000 | 0.3200 | 0.2413 | 0.5417 | 0.5024 | 0.4678 |
| kmeans | - | - | - | - | - | - | - | - | - | - |
| FudanLabZhu4 | 0.6944 | 0.7843 | 0.4762 | 0.6303 | 0.2400 | 0.3600 | 0.2900 | 0.5417 | 0.5024 | 0.4678 |
| KoreaUniv-DMIS-1 | 0.9444 | 0.9630 | 0.8889 | 0.9259 | 0.1200 | 0.3200 | 0.1967 | 0.5643 | 0.4643 | 0.4735 |
| KoreaUniv-DMIS-2 | 0.9167 | 0.9434 | 0.8421 | 0.8928 | 0.1600 | 0.3600 | 0.2367 | 0.5060 | 0.3667 | 0.4029 |
| KoreaUniv-DMIS-5 | 0.8889 | 0.9259 | 0.7778 | 0.8519 | 0.1600 | 0.3600 | 0.2193 | 0.4821 | 0.3667 | 0.3985 |
| KoreaUniv-DMIS-3 | 0.9444 | 0.9630 | 0.8889 | 0.9259 | 0.1200 | 0.3200 | 0.2000 | 0.5500 | 0.4214 | 0.4544 |
| KoreaUniv-DMIS-4 | 0.9167 | 0.9434 | 0.8421 | 0.8928 | 0.2800 | 0.4400 | 0.3533 | 0.4881 | 0.3381 | 0.3798 |
| pa | 0.2500 | - | 0.4000 | 0.2000 | 0.1200 | 0.4400 | 0.2360 | 0.3418 | 0.4881 | 0.3441 |
| BioASQ_Baseline | 0.3611 | 0.3784 | 0.3429 | 0.3606 | 0.0800 | 0.1200 | 0.1000 | 0.2196 | 0.5929 | 0.2858 |
Ideal Answers
| | Automatic scores (ROUGE - R) | | | | Manual scores | | | |
|---|---|---|---|---|---|---|---|---|
| System | R-2 (Rec) | R-2 (F1) | R-SU4 (Rec) | R-SU4 (F1) | Readability | Recall | Precision | Repetition |
| BJUTNLPGroup | 0.0325 | 0.0555 | 0.0223 | 0.0389 | 3.07 | 2.37 | 3.32 | 4.53 |
| AUEB-System1 | 0.0551 | 0.0331 | 0.0590 | 0.0338 | 0.75 | 0.98 | 0.76 | 0.82 |
| AUEB-System2 | 0.0491 | 0.0289 | 0.0550 | 0.0309 | 0.72 | 0.99 | 0.75 | 0.84 |
| AUEB-System3 | 0.0551 | 0.0331 | 0.0590 | 0.0338 | 0.75 | 0.98 | 0.76 | 0.82 |
| AUEB-System4 | 0.2125 | 0.1151 | 0.2113 | 0.1128 | 0.86 | 1.22 | 0.92 | 0.84 |
| AUEB-System5 | 0.2125 | 0.1151 | 0.2113 | 0.1128 | 0.86 | 1.22 | 0.92 | 0.84 |
| auth-qa-1 | - | - | - | - | - | - | - | - |
| bio-answerfinder | 0.4150 | 0.2997 | 0.4231 | 0.2981 | 3.88 | 4.20 | 3.94 | 4.45 |
| UoT_baseline | - | - | - | - | - | - | - | - |
| Best yesno | - | - | - | - | - | - | - | - |
| UoT_multitask_learn | - | - | - | - | - | - | - | - |
| UoT_allquestions | - | - | - | - | - | - | - | - |
| MQ-1 | 0.5205 | 0.3040 | 0.5243 | 0.2939 | 3.76 | 4.42 | 3.67 | 3.76 |
| Best factoid | - | - | - | - | - | - | - | - |
| Umass_czi_2 | - | - | - | - | - | - | - | - |
| Umass_czi_1 | - | - | - | - | - | - | - | - |
| NCU-IISR_1 | 0.1560 | 0.1841 | 0.1547 | 0.1783 | 3.93 | 3.55 | 4.01 | 4.76 |
| MQ-2 | 0.5244 | 0.3254 | 0.5339 | 0.3206 | 3.71 | 4.49 | 3.78 | 3.92 |
| MQ-3 | 0.5232 | 0.3206 | 0.5314 | 0.3153 | 3.77 | 4.51 | 3.84 | 3.92 |
| MQ-4 | 0.4782 | 0.3012 | 0.4885 | 0.2975 | 3.73 | 4.40 | 3.75 | 3.90 |
| simple truncation | 0.5201 | 0.3011 | 0.5228 | 0.2906 | 3.73 | 4.50 | 3.67 | 3.74 |
| MQ-5 | 0.3794 | 0.2683 | 0.3885 | 0.2651 | 3.88 | 4.31 | 3.84 | 4.15 |
| Multitask SBERT Cls | 0.4701 | 0.2992 | 0.4813 | 0.2947 | 3.72 | 4.56 | 3.80 | 3.98 |
| Multitask SBERT reg | 0.4649 | 0.2938 | 0.4747 | 0.2898 | 3.77 | 4.50 | 3.80 | 4.01 |
| sbert cls | 0.4806 | 0.2993 | 0.4915 | 0.2954 | 3.72 | 4.52 | 3.80 | 3.95 |
| sbert reg | 0.4593 | 0.2904 | 0.4784 | 0.2918 | 3.75 | 4.39 | 3.78 | 4.00 |
| pa-base | 0.0592 | 0.0653 | 0.0590 | 0.0642 | 1.08 | 0.98 | 1.03 | 1.19 |
| FudanLabZhu1 | - | - | - | - | - | - | - | - |
| FudanLabZhu3 | - | - | - | - | - | - | - | - |
| kmeans | 0.5138 | 0.2891 | 0.5281 | 0.2845 | 3.69 | 4.41 | 3.63 | 3.63 |
| FudanLabZhu4 | - | - | - | - | - | - | - | - |
| KoreaUniv-DMIS-1 | 0.2387 | 0.2287 | 0.2486 | 0.2311 | 4.50 | 3.81 | 4.06 | 4.71 |
| KoreaUniv-DMIS-2 | 0.2546 | 0.2442 | 0.2567 | 0.2413 | 4.51 | 4.11 | 4.27 | 4.77 |
| KoreaUniv-DMIS-5 | 0.2402 | 0.2318 | 0.2486 | 0.2335 | 4.52 | 3.78 | 4.03 | 4.71 |
| KoreaUniv-DMIS-3 | 0.2662 | 0.2553 | 0.2687 | 0.2527 | 4.53 | 4.16 | 4.27 | 4.76 |
| KoreaUniv-DMIS-4 | 0.2363 | 0.2272 | 0.2494 | 0.2331 | 4.53 | 3.83 | 4.02 | 4.76 |
| pa | 0.0592 | 0.0653 | 0.0590 | 0.0642 | 1.08 | 0.98 | 1.03 | 1.19 |
| BioASQ_Baseline | - | - | - | - | - | - | - | - |
Test batch 3
Exact Answers
| | Yes/No | | | | Factoid | | | List | | |
|---|---|---|---|---|---|---|---|---|---|---|
| System | Accuracy | F1 Yes | F1 No | Macro F1 | Strict Acc. | Lenient Acc. | MRR | Mean Prec. | Recall | F-Measure |
| bio-answerfinder | 0.8710 | 0.8947 | 0.8333 | 0.8640 | 0.3214 | 0.4286 | 0.3494 | 0.3884 | 0.4972 | 0.3983 |
| auth-qa-1 | 0.7742 | 0.8205 | 0.6957 | 0.7581 | 0.2143 | 0.2857 | 0.2411 | 0.2500 | 0.3917 | 0.2834 |
| auth-qa-2 | 0.7742 | 0.8205 | 0.6957 | 0.7581 | 0.2143 | 0.2857 | 0.2440 | 0.1667 | 0.4722 | 0.2337 |
| auth-qa-3 | 0.7742 | 0.8205 | 0.6957 | 0.7581 | 0.2143 | 0.2857 | 0.2440 | 0.2500 | 0.3917 | 0.2834 |
| auth-qa-4 | 0.7742 | 0.8205 | 0.6957 | 0.7581 | 0.2143 | 0.2857 | 0.2411 | 0.1667 | 0.4722 | 0.2337 |
| MQ-1 | 0.5806 | 0.7347 | - | 0.3673 | - | - | - | - | - | - |
| MQ-2 | 0.5806 | 0.7347 | - | 0.3673 | - | - | - | - | - | - |
| MQ-3 | 0.5806 | 0.7347 | - | 0.3673 | - | - | - | - | - | - |
| MQ-4 | 0.5806 | 0.7347 | - | 0.3673 | - | - | - | - | - | - |
| MQ-5 | 0.5806 | 0.7347 | - | 0.3673 | - | - | - | - | - | - |
| Umass_czi_1 | 0.8065 | 0.8235 | 0.7857 | 0.8046 | 0.2500 | 0.3571 | 0.2869 | 0.6806 | 0.4111 | 0.4474 |
| Umass_czi_2 | 0.8387 | 0.8649 | 0.8000 | 0.8324 | 0.2500 | 0.3571 | 0.2869 | 0.6806 | 0.4111 | 0.4474 |
| Umass_czi_4 | 0.9032 | 0.9143 | 0.8889 | 0.9016 | 0.3214 | 0.4643 | 0.3810 | 0.6111 | 0.4028 | 0.4314 |
| Umass_czi_5 | 0.9032 | 0.9189 | 0.8800 | 0.8995 | 0.2500 | 0.4286 | 0.3030 | 0.7361 | 0.4500 | 0.5020 |
| UoT_baseline | 0.5806 | 0.7347 | - | 0.3673 | 0.3214 | 0.3929 | 0.3512 | 0.4861 | 0.3833 | 0.4024 |
| UoT_allquestions | 0.5806 | 0.7347 | - | 0.3673 | 0.3214 | 0.3929 | 0.3423 | 0.5972 | 0.3889 | 0.4151 |
| UoT_multitask_learn | 0.5161 | 0.6809 | - | 0.3404 | 0.3214 | 0.4286 | 0.3643 | 0.5139 | 0.3333 | 0.3530 |
| Best factoid | 0.5806 | 0.7111 | 0.2353 | 0.4732 | 0.2857 | 0.3929 | 0.3333 | 0.5208 | 0.3833 | 0.3917 |
| Best yesno | 0.5161 | 0.6809 | - | 0.3404 | 0.3214 | 0.4286 | 0.3643 | 0.5139 | 0.3333 | 0.3530 |
| BJUTNLPGroup | 0.5806 | 0.7347 | - | 0.3673 | 0.2500 | 0.3571 | 0.3036 | 0.1583 | 0.4222 | 0.2170 |
| Multitask SBERT Cls | 0.5806 | 0.7347 | - | 0.3673 | - | - | - | - | - | - |
| Multitask SBERT reg | 0.5806 | 0.7347 | - | 0.3673 | - | - | - | - | - | - |
| sbert cls | 0.5806 | 0.7347 | - | 0.3673 | - | - | - | - | - | - |
| sbert 1 epoch cls | 0.5806 | 0.7347 | - | 0.3673 | - | - | - | - | - | - |
| GNN | 0.5806 | 0.7347 | - | 0.3673 | - | - | - | - | - | - |
| factoid qa model | 0.4194 | - | 0.5909 | 0.2955 | 0.3214 | 0.4643 | 0.3750 | - | - | - |
| simple truncation | 0.5806 | 0.7347 | - | 0.3673 | - | - | - | - | - | - |
| kmeans | 0.5806 | 0.7347 | - | 0.3673 | - | - | - | - | - | - |
| similarity measures | 0.5806 | 0.7347 | - | 0.3673 | - | - | - | - | - | - |
| abstractive | 0.5806 | 0.7347 | - | 0.3673 | - | - | - | - | - | - |
| extractive | 0.5806 | 0.7347 | - | 0.3673 | - | - | - | - | - | - |
| pa-base | 0.9032 | 0.9189 | 0.8800 | 0.8995 | 0.2500 | 0.4643 | 0.3137 | 0.5278 | 0.4444 | 0.4377 |
| pa | 0.9032 | 0.9189 | 0.8800 | 0.8995 | 0.2500 | 0.4643 | 0.3137 | 0.5278 | 0.4444 | 0.4377 |
| KoreaUniv-DMIS-1 | 0.9032 | 0.9091 | 0.8966 | 0.9028 | 0.3214 | 0.4286 | 0.3601 | 0.6583 | 0.4111 | 0.4312 |
| KoreaUniv-DMIS-4 | 0.8387 | 0.8571 | 0.8148 | 0.8360 | 0.2857 | 0.4286 | 0.3357 | 0.6167 | 0.4111 | 0.4282 |
| KoreaUniv-DMIS-2 | 0.8710 | 0.8824 | 0.8571 | 0.8697 | 0.3214 | 0.4286 | 0.3446 | 0.6028 | 0.4111 | 0.4259 |
| KoreaUniv-DMIS-3 | 0.8387 | 0.8571 | 0.8148 | 0.8360 | 0.2500 | 0.4643 | 0.3357 | 0.6111 | 0.4111 | 0.4222 |
| KoreaUniv-DMIS-5 | 0.9032 | 0.9091 | 0.8966 | 0.9028 | 0.3214 | 0.4643 | 0.3565 | 0.6167 | 0.4111 | 0.4282 |
| sbert reg | 0.5806 | 0.7347 | - | 0.3673 | - | - | - | - | - | - |
| FudanLabZhu1 | 0.7419 | 0.8182 | 0.5556 | 0.6869 | 0.2500 | 0.3929 | 0.2976 | 0.4819 | 0.2806 | 0.3112 |
| FudanLabZhu2 | 0.7419 | 0.8182 | 0.5556 | 0.6869 | 0.3214 | 0.5357 | 0.3970 | 0.5694 | 0.3361 | 0.3849 |
| FudanLabZhu3 | 0.7419 | 0.8182 | 0.5556 | 0.6869 | 0.3214 | 0.4643 | 0.3655 | 0.5583 | 0.3361 | 0.3708 |
| FudanLabZhu4 | 0.7419 | 0.8182 | 0.5556 | 0.6869 | 0.2857 | 0.5714 | 0.3821 | 0.5583 | 0.3361 | 0.3708 |
| FudanLabZhu5 | 0.7419 | 0.8182 | 0.5556 | 0.6869 | 0.3214 | 0.4286 | 0.3690 | 0.5583 | 0.3361 | 0.3708 |
| BioASQ_Baseline | 0.5161 | 0.4444 | 0.5714 | 0.5079 | 0.0714 | 0.2143 | 0.1220 | 0.2052 | 0.4611 | 0.2456 |
Ideal Answers
| | Automatic scores (ROUGE - R) | | | | Manual scores | | | |
|---|---|---|---|---|---|---|---|---|
| System | R-2 (Rec) | R-2 (F1) | R-SU4 (Rec) | R-SU4 (F1) | Readability | Recall | Precision | Repetition |
| bio-answerfinder | 0.4557 | 0.3471 | 0.4642 | 0.3450 | 4.16 | 4.25 | 3.93 | 4.53 |
| auth-qa-1 | - | - | - | - | - | - | - | - |
| auth-qa-2 | - | - | - | - | - | - | - | - |
| auth-qa-3 | - | - | - | - | - | - | - | - |
| auth-qa-4 | - | - | - | - | - | - | - | - |
| MQ-1 | 0.5222 | 0.3632 | 0.5336 | 0.3597 | 4.03 | 4.48 | 3.86 | 4.13 |
| MQ-2 | 0.5481 | 0.3719 | 0.5580 | 0.3673 | 4.04 | 4.53 | 3.87 | 4.14 |
| MQ-3 | 0.5394 | 0.3673 | 0.5491 | 0.3633 | 4.06 | 4.54 | 3.89 | 4.17 |
| MQ-4 | 0.5202 | 0.3597 | 0.5336 | 0.3574 | 4.06 | 4.52 | 3.86 | 4.11 |
| MQ-5 | 0.3997 | 0.3055 | 0.4112 | 0.3060 | 4.07 | 4.25 | 3.91 | 4.32 |
| Umass_czi_1 | - | - | - | - | - | - | - | - |
| Umass_czi_2 | - | - | - | - | - | - | - | - |
| Umass_czi_4 | - | - | - | - | - | - | - | - |
| Umass_czi_5 | - | - | - | - | - | - | - | - |
| UoT_baseline | - | - | - | - | - | - | - | - |
| UoT_allquestions | - | - | - | - | - | - | - | - |
| UoT_multitask_learn | - | - | - | - | - | - | - | - |
| Best factoid | - | - | - | - | - | - | - | - |
| Best yesno | - | - | - | - | - | - | - | - |
| BJUTNLPGroup | 0.0293 | 0.0501 | 0.0196 | 0.0337 | 2.99 | 2.39 | 3.20 | 4.35 |
| Multitask SBERT Cls | 0.4812 | 0.3292 | 0.5007 | 0.3305 | 3.91 | 4.24 | 3.73 | 3.98 |
| Multitask SBERT reg | 0.4669 | 0.3166 | 0.4834 | 0.3178 | 3.85 | 4.10 | 3.69 | 4.05 |
| sbert cls | 0.4769 | 0.3177 | 0.4940 | 0.3189 | 3.87 | 4.17 | 3.70 | 4.07 |
| sbert 1 epoch cls | 0.4820 | 0.3225 | 0.4977 | 0.3232 | 3.84 | 4.11 | 3.69 | 4.06 |
| GNN | 0.1538 | 0.1453 | 0.1600 | 0.1439 | 3.31 | 2.70 | 2.98 | 4.20 |
| factoid qa model | - | - | - | - | - | - | - | - |
| simple truncation | 0.4154 | 0.3229 | 0.4256 | 0.3221 | 4.10 | 4.41 | 3.94 | 4.42 |
| kmeans | 0.4154 | 0.3229 | 0.4256 | 0.3221 | 4.10 | 4.41 | 3.94 | 4.42 |
| similarity measures | 0.3961 | 0.2941 | 0.4093 | 0.2959 | 3.98 | 4.09 | 3.65 | 4.31 |
| abstractive | 0.1959 | 0.2186 | 0.1943 | 0.2169 | 3.54 | 3.33 | 3.75 | 4.66 |
| extractive | 0.4519 | 0.3197 | 0.4629 | 0.3155 | 3.90 | 4.43 | 3.84 | 4.18 |
| pa-base | 0.3202 | 0.2896 | 0.3261 | 0.2921 | 4.12 | 4.15 | 3.99 | 4.58 |
| pa | 0.5088 | 0.3229 | 0.5179 | 0.3182 | 4.00 | 4.42 | 3.96 | 4.26 |
| KoreaUniv-DMIS-1 | 0.3213 | 0.2832 | 0.3193 | 0.2787 | 4.55 | 3.95 | 3.99 | 4.70 |
| KoreaUniv-DMIS-4 | 0.2704 | 0.2539 | 0.2701 | 0.2521 | 4.56 | 3.88 | 4.04 | 4.80 |
| KoreaUniv-DMIS-2 | 0.2992 | 0.2809 | 0.2983 | 0.2795 | 4.52 | 4.11 | 4.10 | 4.76 |
| KoreaUniv-DMIS-3 | 0.2854 | 0.2660 | 0.2852 | 0.2634 | 4.52 | 3.97 | 4.09 | 4.70 |
| KoreaUniv-DMIS-5 | 0.2997 | 0.2899 | 0.3043 | 0.2909 | 4.60 | 4.01 | 4.17 | 4.79 |
| sbert reg | 0.4508 | 0.3110 | 0.4647 | 0.3121 | 3.94 | 4.09 | 3.72 | 4.08 |
| FudanLabZhu1 | - | - | - | - | - | - | - | - |
| FudanLabZhu2 | - | - | - | - | - | - | - | - |
| FudanLabZhu3 | - | - | - | - | - | - | - | - |
| FudanLabZhu4 | - | - | - | - | - | - | - | - |
| FudanLabZhu5 | - | - | - | - | - | - | - | - |
| BioASQ_Baseline | - | - | - | - | - | - | - | - |
Test batch 4
Exact Answers
| | Yes/No | | | | Factoid | | | List | | |
|---|---|---|---|---|---|---|---|---|---|---|
| System | Accuracy | F1 Yes | F1 No | Macro F1 | Strict Acc. | Lenient Acc. | MRR | Mean Prec. | Recall | F-Measure |
| MQ-2 | 0.5385 | 0.7000 | - | 0.3500 | - | - | - | - | - | - |
| MQ-3 | 0.5385 | 0.7000 | - | 0.3500 | - | - | - | - | - | - |
| MQ-4 | 0.5385 | 0.7000 | - | 0.3500 | - | - | - | - | - | - |
| bio-answerfinder | 0.6538 | 0.7568 | 0.4000 | 0.5784 | 0.4706 | 0.5882 | 0.5245 | 0.2791 | 0.3636 | 0.2646 |
| auth-qa-1 | 0.6154 | 0.7059 | 0.4444 | 0.5752 | 0.2941 | 0.5294 | 0.3794 | 0.3059 | 0.4229 | 0.3406 |
| auth-qa-2 | 0.6154 | 0.7059 | 0.4444 | 0.5752 | 0.2941 | 0.5000 | 0.3647 | 0.1882 | 0.4871 | 0.2561 |
| auth-qa-3 | 0.6154 | 0.7059 | 0.4444 | 0.5752 | 0.2941 | 0.5000 | 0.3647 | 0.3137 | 0.4283 | 0.3459 |
| auth-qa-4 | 0.6154 | 0.7059 | 0.4444 | 0.5752 | 0.2941 | 0.5294 | 0.3794 | 0.1882 | 0.4871 | 0.2561 |
| auth-qa-5 | 0.6154 | 0.6667 | 0.5455 | 0.6061 | 0.2941 | 0.5294 | 0.3794 | 0.1882 | 0.4871 | 0.2561 |
| MQ-1 | 0.5385 | 0.7000 | - | 0.3500 | - | - | - | - | - | - |
| MQ-5 | 0.5385 | 0.7000 | - | 0.3500 | - | - | - | - | - | - |
| BJUTNLPGroup | 0.5385 | 0.7000 | - | 0.3500 | 0.3529 | 0.5000 | 0.4167 | 0.1824 | 0.5597 | 0.2620 |
| UoT_baseline | 0.5000 | 0.6667 | - | 0.3333 | 0.4118 | 0.7059 | 0.5270 | 0.3296 | 0.3810 | 0.3161 |
| UoT_allquestions | 0.5000 | 0.6667 | - | 0.3333 | 0.4412 | 0.7059 | 0.5564 | 0.4045 | 0.4623 | 0.3886 |
| Best factoid | 0.5000 | 0.6667 | - | 0.3333 | 0.4706 | 0.7059 | 0.5564 | 0.4582 | 0.4153 | 0.4005 |
| UoT_multitask_learn | 0.4615 | 0.6316 | - | 0.3158 | 0.4706 | 0.6765 | 0.5637 | 0.3843 | 0.3226 | 0.2991 |
| Best yesno | 0.5000 | 0.6667 | - | 0.3333 | 0.4412 | 0.7059 | 0.5564 | 0.4045 | 0.4623 | 0.3886 |
| GNN | 0.5385 | 0.7000 | - | 0.3500 | 0.0294 | 0.2353 | 0.1039 | - | - | - |
| Umass_czi_1 | 0.8077 | 0.8387 | 0.7619 | 0.8003 | 0.3529 | 0.5294 | 0.4186 | 0.5164 | 0.3888 | 0.3774 |
| Umass_czi_2 | 0.6923 | 0.7333 | 0.6364 | 0.6848 | 0.3235 | 0.5000 | 0.3946 | 0.5164 | 0.3888 | 0.3774 |
| Umass_czi_3 | 0.7692 | 0.8125 | 0.7000 | 0.7563 | 0.3235 | 0.5000 | 0.3931 | 0.5753 | 0.4182 | 0.4146 |
| Umass_czi_4 | 0.6538 | 0.7097 | 0.5714 | 0.6406 | 0.2941 | 0.5588 | 0.3946 | 0.5753 | 0.4182 | 0.4146 |
| Umass_czi_5 | 0.6538 | 0.7097 | 0.5714 | 0.6406 | 0.5000 | 0.7059 | 0.5637 | 0.5753 | 0.4182 | 0.4146 |
| factoid qa model | 0.7308 | 0.7879 | 0.6316 | 0.7097 | 0.4412 | 0.6471 | 0.5206 | 0.4039 | 0.1696 | 0.2117 |
| Parameters retrained | 0.7308 | 0.7879 | 0.6316 | 0.7097 | 0.4412 | 0.6765 | 0.5216 | 0.4679 | 0.3443 | 0.3341 |
| Features Fusion | 0.7308 | 0.7879 | 0.6316 | 0.7097 | 0.5000 | 0.6765 | 0.5745 | 0.5428 | 0.3541 | 0.3625 |
| Multitask SBERT Cls | 0.5385 | 0.7000 | - | 0.3500 | - | - | - | - | - | - |
| Multitask SBERT reg | 0.5385 | 0.7000 | - | 0.3500 | - | - | - | - | - | - |
| sbert cls | 0.5385 | 0.7000 | - | 0.3500 | - | - | - | - | - | - |
| sbert reg | 0.5385 | 0.7000 | - | 0.3500 | - | - | - | - | - | - |
| sbert 1 epoch cls | 0.5385 | 0.7000 | - | 0.3500 | - | - | - | - | - | - |
| NCU-IISR_1 | 0.7308 | 0.7742 | 0.6667 | 0.7204 | 0.5000 | 0.6765 | 0.5735 | 0.5539 | 0.3786 | 0.3905 |
| dice-a-1.0 | 0.7308 | 0.7879 | 0.6316 | 0.7097 | 0.3824 | 0.6471 | 0.4926 | 0.3627 | 0.1965 | 0.2188 |
| FudanLabZhu1 | 0.5769 | 0.7179 | 0.1538 | 0.4359 | 0.4118 | 0.5588 | 0.4804 | 0.4055 | 0.2970 | 0.3004 |
| FudanLabZhu2 | 0.5769 | 0.7027 | 0.2667 | 0.4847 | 0.5294 | 0.6765 | 0.5980 | 0.5784 | 0.3541 | 0.3902 |
| FudanLabZhu4 | 0.5769 | 0.7027 | 0.2667 | 0.4847 | 0.5000 | 0.6765 | 0.5686 | 0.5375 | 0.5089 | 0.4571 |
| FudanLabZhu3 | 0.5769 | 0.7179 | 0.1538 | 0.4359 | 0.4118 | 0.5882 | 0.4755 | 0.5375 | 0.5089 | 0.4571 |
| GIAO | 0.6538 | 0.7273 | 0.5263 | 0.6268 | 0.5000 | 0.6765 | 0.5784 | 0.6520 | 0.3585 | 0.4101 |
| KoreaUniv-DMIS-1 | 0.7692 | 0.8000 | 0.7273 | 0.7636 | 0.5294 | 0.7059 | 0.6078 | 0.3577 | 0.5539 | 0.4037 |
| KoreaUniv-DMIS-2 | 0.7692 | 0.8000 | 0.7273 | 0.7636 | 0.5294 | 0.6765 | 0.5882 | 0.2760 | 0.4926 | 0.3122 |
| dice-b-1.0 | 0.7308 | 0.7879 | 0.6316 | 0.7097 | 0.3824 | 0.6471 | 0.4926 | 0.3775 | 0.2259 | 0.2384 |
| simple truncation | 0.5385 | 0.7000 | - | 0.3500 | - | - | - | - | - | - |
| kmeans | 0.5385 | 0.7000 | - | 0.3500 | - | - | - | - | - | - |
| similarity measures | 0.5385 | 0.7000 | - | 0.3500 | - | - | - | - | - | - |
| KoreaUniv-DMIS-3 | 0.7692 | 0.7857 | 0.7500 | 0.7679 | 0.5000 | 0.6765 | 0.5706 | 0.3049 | 0.4461 | 0.3346 |
| KoreaUniv-DMIS-4 | 0.8077 | 0.8276 | 0.7826 | 0.8051 | 0.5000 | 0.6471 | 0.5613 | 0.3002 | 0.4926 | 0.3318 |
| extractive | 0.5385 | 0.7000 | - | 0.3500 | - | - | - | - | - | - |
| abstractive | 0.5385 | 0.7000 | - | 0.3500 | - | - | - | - | - | - |
| FudanLabZhu5 | 0.5769 | 0.7179 | 0.1538 | 0.4359 | 0.5588 | 0.7353 | 0.6284 | 0.5375 | 0.5089 | 0.4571 |
| KoreaUniv-DMIS-5 | 0.8462 | 0.8571 | 0.8333 | 0.8452 | 0.4706 | 0.7059 | 0.5686 | 0.3630 | 0.4329 | 0.3355 |
| pa | 0.7308 | 0.7879 | 0.6316 | 0.7097 | 0.4706 | 0.5588 | 0.5098 | 0.3571 | 0.3661 | 0.3030 |
| pa-base | 0.7308 | 0.7879 | 0.6316 | 0.7097 | 0.4706 | 0.5588 | 0.5098 | 0.3571 | 0.3661 | 0.3030 |
| BioASQ_Baseline | 0.5385 | 0.4000 | 0.6250 | 0.5125 | 0.0588 | 0.2059 | 0.1078 | 0.1554 | 0.4519 | 0.2122 |
Ideal Answers
| | Automatic scores (ROUGE - R) | | | | Manual scores | | | |
|---|---|---|---|---|---|---|---|---|
| System | R-2 (Rec) | R-2 (F1) | R-SU4 (Rec) | R-SU4 (F1) | Readability | Recall | Precision | Repetition |
| MQ-2 | 0.5054 | 0.2949 | 0.5122 | 0.2878 | 3.96 | 4.53 | 3.69 | 4.09 |
| MQ-3 | 0.5162 | 0.2964 | 0.5220 | 0.2891 | 3.96 | 4.47 | 3.71 | 4.11 |
| MQ-4 | 0.4915 | 0.2886 | 0.5015 | 0.2832 | 4.00 | 4.42 | 3.69 | 4.11 |
| bio-answerfinder | 0.4025 | 0.2993 | 0.4001 | 0.2938 | 4.19 | 4.18 | 3.97 | 4.71 |
| auth-qa-1 | - | - | - | - | - | - | - | - |
| auth-qa-2 | - | - | - | - | - | - | - | - |
| auth-qa-3 | - | - | - | - | - | - | - | - |
| auth-qa-4 | - | - | - | - | - | - | - | - |
| auth-qa-5 | - | - | - | - | - | - | - | - |
| MQ-1 | 0.4971 | 0.3088 | 0.5074 | 0.3017 | 3.82 | 4.54 | 3.81 | 4.19 |
| MQ-5 | 0.3896 | 0.2674 | 0.4005 | 0.2638 | 4.02 | 4.15 | 3.86 | 4.36 |
| BJUTNLPGroup | 0.0288 | 0.0488 | 0.0184 | 0.0316 | 3.00 | 2.69 | 3.37 | 4.58 |
| UoT_baseline | - | - | - | - | - | - | - | - |
| UoT_allquestions | - | - | - | - | - | - | - | - |
| Best factoid | - | - | - | - | - | - | - | - |
| UoT_multitask_learn | - | - | - | - | - | - | - | - |
| Best yesno | - | - | - | - | - | - | - | - |
| GNN | 0.1543 | 0.1385 | 0.1538 | 0.1330 | 3.33 | 2.86 | 3.28 | 4.41 |
| Umass_czi_1 | - | - | - | - | - | - | - | - |
| Umass_czi_2 | - | - | - | - | - | - | - | - |
| Umass_czi_3 | - | - | - | - | - | - | - | - |
| Umass_czi_4 | - | - | - | - | - | - | - | - |
| Umass_czi_5 | - | - | - | - | - | - | - | - |
| factoid qa model | - | - | - | - | - | - | - | - |
| Parameters retrained | - | - | - | - | - | - | - | - |
| Features Fusion | - | - | - | - | - | - | - | - |
| Multitask SBERT Cls | 0.4195 | 0.2492 | 0.4282 | 0.2453 | 3.92 | 4.46 | 3.78 | 4.14 |
| Multitask SBERT reg | 0.4301 | 0.2582 | 0.4414 | 0.2538 | 3.89 | 4.32 | 3.68 | 4.10 |
| sbert cls | 0.4294 | 0.2575 | 0.4382 | 0.2534 | 3.91 | 4.44 | 3.77 | 4.13 |
| sbert reg | 0.4294 | 0.2575 | 0.4382 | 0.2534 | 3.91 | 4.44 | 3.77 | 4.13 |
| sbert 1 epoch cls | 0.3996 | 0.2502 | 0.4147 | 0.2482 | 3.91 | 4.37 | 3.74 | 4.11 |
| NCU-IISR_1 | 0.1616 | 0.1845 | 0.1587 | 0.1796 | 3.92 | 3.33 | 3.79 | 4.70 |
| dice-a-1.0 | - | - | - | - | - | - | - | - |
| FudanLabZhu1 | - | - | - | - | - | - | - | - |
| FudanLabZhu2 | - | - | - | - | - | - | - | - |
| FudanLabZhu4 | - | - | - | - | - | - | - | - |
| FudanLabZhu3 | - | - | - | - | - | - | - | - |
| GIAO | - | - | - | - | - | - | - | - |
| KoreaUniv-DMIS-1 | 0.2390 | 0.2136 | 0.2436 | 0.2141 | 4.50 | 3.67 | 3.86 | 4.76 |
| KoreaUniv-DMIS-2 | 0.2423 | 0.2274 | 0.2456 | 0.2287 | 4.55 | 3.86 | 4.05 | 4.79 |
| dice-b-1.0 | - | - | - | - | - | - | - | - |
| simple truncation | 0.4793 | 0.2828 | 0.4929 | 0.2782 | 3.99 | 4.50 | 3.78 | 4.03 |
| kmeans | 0.4793 | 0.2828 | 0.4929 | 0.2782 | 3.99 | 4.50 | 3.78 | 4.03 |
| similarity measures | 0.3862 | 0.2446 | 0.3971 | 0.2394 | 3.97 | 4.19 | 3.82 | 4.27 |
| KoreaUniv-DMIS-3 | 0.2407 | 0.2246 | 0.2424 | 0.2251 | 4.66 | 3.51 | 3.90 | 4.86 |
| KoreaUniv-DMIS-4 | 0.2268 | 0.2061 | 0.2291 | 0.2066 | 4.56 | 3.84 | 3.84 | 4.75 |
| extractive | 0.4668 | 0.2870 | 0.4729 | 0.2796 | 3.91 | 4.46 | 3.72 | 4.15 |
| abstractive | 0.1938 | 0.2080 | 0.1948 | 0.2050 | 3.46 | 3.09 | 3.62 | 4.51 |
| FudanLabZhu5 | - | - | - | - | - | - | - | - |
| KoreaUniv-DMIS-5 | 0.2302 | 0.2126 | 0.2384 | 0.2181 | 4.54 | 3.87 | 3.98 | 4.81 |
| pa | 0.3135 | 0.2904 | 0.3138 | 0.2874 | 3.79 | 3.70 | 3.86 | 4.69 |
| pa-base | 0.5291 | 0.2923 | 0.5321 | 0.2856 | 3.91 | 4.40 | 3.75 | 4.02 |
| BioASQ_Baseline | - | - | - | - | - | - | - | - |
Test batch 5
Exact Answers
| | Yes/No | | | | Factoid | | | List | | |
|---|---|---|---|---|---|---|---|---|---|---|
| System | Accuracy | F1 Yes | F1 No | Macro F1 | Strict Acc. | Lenient Acc. | MRR | Mean Prec. | Recall | F-Measure |
| MQ-2 | 0.5588 | 0.7170 | - | 0.3585 | - | - | - | - | - | - |
| MQ-3 | 0.5588 | 0.7170 | - | 0.3585 | - | - | - | - | - | - |
| MQ-4 | 0.5588 | 0.7170 | - | 0.3585 | - | - | - | - | - | - |
| GNN | 0.5588 | 0.7170 | - | 0.3585 | 0.0000 | 0.2813 | 0.0677 | - | - | - |
| zmodel2 | 0.5588 | 0.7170 | - | 0.3585 | 0.0000 | 0.3125 | 0.0995 | - | - | - |
| NCU-IISR_1 | 0.7353 | 0.7429 | 0.7273 | 0.7351 | 0.4688 | 0.7188 | 0.5859 | 0.4514 | 0.2659 | 0.3140 |
| NCU-IISR_2 | 0.5588 | 0.7170 | - | 0.3585 | - | - | - | - | - | - |
| auth-qa-1 | 0.6765 | 0.6857 | 0.6667 | 0.6762 | 0.2500 | 0.3750 | 0.2995 | 0.1750 | 0.4821 | 0.2386 |
| auth-qa-5 | 0.6471 | 0.7143 | 0.5385 | 0.6264 | 0.2500 | 0.3750 | 0.2995 | 0.1750 | 0.4821 | 0.2386 |
| UoT_baseline | 0.6176 | 0.5185 | 0.6829 | 0.6007 | 0.5000 | 0.6875 | 0.5844 | 0.2242 | 0.1577 | 0.1732 |
| UoT_multitask_learn | 0.5000 | 0.6667 | - | 0.3333 | 0.4063 | 0.7188 | 0.5365 | 0.5938 | 0.3700 | 0.4296 |
| UoT_allquestions | 0.5588 | 0.7170 | - | 0.3585 | 0.4063 | 0.6875 | 0.5063 | 0.3854 | 0.2798 | 0.3082 |
| Best factoid | 0.5588 | 0.7170 | - | 0.3585 | 0.4063 | 0.6563 | 0.5026 | 0.5174 | 0.3631 | 0.4002 |
| Best yesno | 0.5588 | 0.7170 | - | 0.3585 | 0.4063 | 0.6875 | 0.5063 | 0.3854 | 0.2798 | 0.3082 |
| bio-answerfinder | 0.7353 | 0.7568 | 0.7097 | 0.7332 | 0.5000 | 0.5313 | 0.5156 | 0.4745 | 0.4325 | 0.4163 |
| MQ-1 | 0.5588 | 0.7170 | - | 0.3585 | - | - | - | - | - | - |
| MQ-5 | 0.5588 | 0.7170 | - | 0.3585 | - | - | - | - | - | - |
| BJUTNLP2 | 0.5588 | 0.7170 | - | 0.3585 | 0.4688 | 0.5625 | 0.5026 | 0.1250 | 0.3790 | 0.1731 |
| Umass_czi_1 | 0.6471 | 0.6471 | 0.6471 | 0.6471 | 0.4688 | 0.7188 | 0.5677 | 0.5139 | 0.2808 | 0.3353 |
| Umass_czi_2 | 0.6176 | 0.6286 | 0.6061 | 0.6173 | 0.5000 | 0.6250 | 0.5417 | 0.5139 | 0.2808 | 0.3353 |
| Umass_czi_3 | 0.7941 | 0.8205 | 0.7586 | 0.7896 | 0.5625 | 0.7188 | 0.6354 | 0.1528 | 0.1230 | 0.1310 |
| Umass_czi_4 | 0.7353 | 0.7692 | 0.6897 | 0.7294 | 0.5313 | 0.6563 | 0.5833 | 0.3750 | 0.1756 | 0.2166 |
| Umass_czi_5 | 0.5882 | 0.6818 | 0.4167 | 0.5492 | 0.4688 | 0.7188 | 0.5604 | 0.5972 | 0.3224 | 0.3909 |
| factoid qa model | 0.7647 | 0.8095 | 0.6923 | 0.7509 | 0.4688 | 0.6563 | 0.5401 | 0.3333 | 0.0923 | 0.1387 |
| Parameters retrained | 0.7647 | 0.8095 | 0.6923 | 0.7509 | 0.4688 | 0.7813 | 0.5938 | 0.5139 | 0.2946 | 0.3492 |
| Features Fusion | 0.7647 | 0.8095 | 0.6923 | 0.7509 | 0.5313 | 0.7500 | 0.6115 | 0.5035 | 0.2808 | 0.3298 |
| BioFusion | 0.7647 | 0.8095 | 0.6923 | 0.7509 | 0.5000 | 0.7188 | 0.5818 | 0.3854 | 0.2589 | 0.2907 |
| BioLabel | 0.7647 | 0.8095 | 0.6923 | 0.7509 | 0.4688 | 0.7188 | 0.5573 | 0.4271 | 0.2798 | 0.3185 |
| dice-a-1.0 | 0.6765 | 0.7179 | 0.6207 | 0.6693 | 0.4375 | 0.6250 | 0.5156 | 0.3750 | 0.1696 | 0.2159 |
| dice-b-1.0 | 0.7941 | 0.8108 | 0.7742 | 0.7925 | 0.5313 | 0.6875 | 0.5885 | 0.4028 | 0.1845 | 0.2313 |
| DAIICT_lex_UMLSgraph | 0.5588 | 0.7170 | - | 0.3585 | - | - | - | - | - | - |
| DAIICT_QSM_UMLSgraph | 0.5588 | 0.7170 | - | 0.3585 | - | - | - | - | - | - |
| DAIICT_QSM | 0.5588 | 0.7170 | - | 0.3585 | - | - | - | - | - | - |
| DAIICT_lex | 0.5588 | 0.7170 | - | 0.3585 | - | - | - | - | - | - |
| system of teamdaiict | 0.5588 | 0.7170 | - | 0.3585 | - | - | - | - | - | - |
| FudanLabZhu1 | 0.5000 | 0.5405 | 0.4516 | 0.4961 | 0.4375 | 0.6250 | 0.5036 | 0.4694 | 0.3304 | 0.3641 |
| FudanLabZhu2 | 0.6176 | 0.6977 | 0.4800 | 0.5888 | 0.5000 | 0.7188 | 0.5818 | 0.6458 | 0.4028 | 0.4703 |
| FudanLabZhu4 | 0.6176 | 0.6977 | 0.4800 | 0.5888 | 0.1250 | 0.2813 | 0.1901 | 0.6458 | 0.4028 | 0.4703 |
| FudanLabZhu3 | 0.6176 | 0.6977 | 0.4800 | 0.5888 | 0.4688 | 0.6250 | 0.5313 | 0.5125 | 0.3869 | 0.4092 |
| KoreaUniv-DMIS-1 | 0.8235 | 0.8333 | 0.8125 | 0.8229 | 0.4688 | 0.6250 | 0.5208 | 0.5799 | 0.4812 | 0.5050 |
| Multitask SBERT Cls | 0.5588 | 0.7170 | - | 0.3585 | - | - | - | - | - | - |
| KoreaUniv-DMIS-2 | 0.8235 | 0.8333 | 0.8125 | 0.8229 | 0.5000 | 0.7188 | 0.5833 | 0.5799 | 0.4812 | 0.5050 |
| Multitask SBERT reg | 0.5588 | 0.7170 | - | 0.3585 | - | - | - | - | - | - |
| KoreaUniv-DMIS-3 | 0.8235 | 0.8333 | 0.8125 | 0.8229 | 0.4688 | 0.7188 | 0.5661 | 0.5694 | 0.5437 | 0.5222 |
| KoreaUniv-DMIS-4 | 0.7941 | 0.8000 | 0.7879 | 0.7939 | 0.5313 | 0.7188 | 0.6120 | 0.5724 | 0.5437 | 0.5247 |
| KoreaUniv-DMIS-5 | 0.8529 | 0.8649 | 0.8387 | 0.8518 | 0.5000 | 0.6875 | 0.5677 | 0.5465 | 0.5645 | 0.5243 |
| sbert cls | 0.5588 | 0.7170 | - | 0.3585 | - | - | - | - | - | - |
| sbert 1 epoch cls | 0.5588 | 0.7170 | - | 0.3585 | - | - | - | - | - | - |
| sbert reg | 0.5588 | 0.7170 | - | 0.3585 | - | - | - | - | - | - |
| FudanLabZhu5 | 0.6176 | 0.6977 | 0.4800 | 0.5888 | 0.5000 | 0.6563 | 0.5677 | 0.5317 | 0.3879 | 0.4239 |
| dice-c-1.0 | 0.8529 | 0.8571 | 0.8485 | 0.8528 | 0.3125 | 0.5625 | 0.4151 | 0.1250 | 0.0536 | 0.0741 |
| dice-d-1.0 | 0.8529 | 0.8571 | 0.8485 | 0.8528 | 0.3125 | 0.5625 | 0.4151 | 0.1250 | 0.0536 | 0.0741 |
| NCU-IISR_3 | 0.5588 | 0.7170 | - | 0.3585 | - | - | - | - | - | - |
| dice-e-1.0 | 0.8529 | 0.8571 | 0.8485 | 0.8528 | 0.3125 | 0.5625 | 0.4151 | 0.1250 | 0.0536 | 0.0741 |
| pa | 0.8235 | 0.8333 | 0.8125 | 0.8229 | 0.4375 | 0.6250 | 0.5260 | 0.3284 | 0.2679 | 0.2761 |
| GIAO | 0.5588 | 0.7170 | - | 0.3585 | 0.5000 | 0.6563 | 0.5677 | - | - | - |
| BioASQ_Baseline | 0.6176 | 0.5185 | 0.6829 | 0.6007 | 0.1563 | 0.3438 | 0.2266 | 0.2573 | 0.3641 | 0.2581 |
Ideal Answers
| | Automatic scores (Rouge - R) | | | | Manual scores | | | |
|---|---|---|---|---|---|---|---|---|
| System | R-2 (Rec) | R-2 (F1) | R-SU4 (Rec) | R-SU4 (F1) | Readability | Recall | Precision | Repetition |
| MQ-2 | 0.5105 | 0.3246 | 0.5161 | 0.3171 | 3.88 | 4.47 | 3.69 | 4.01 |
| MQ-3 | 0.5188 | 0.3328 | 0.5214 | 0.3241 | 3.92 | 4.42 | 3.71 | 4.02 |
| MQ-4 | 0.5155 | 0.3311 | 0.5188 | 0.3233 | 3.91 | 4.43 | 3.65 | 4.02 |
| GNN | 0.2192 | 0.1987 | 0.2151 | 0.1889 | 3.33 | 3.12 | 3.42 | 4.57 |
| zmodel2 | 0.1882 | 0.1675 | 0.1855 | 0.1573 | 3.25 | 3.07 | 3.32 | 4.54 |
| NCU-IISR_1 | 0.1634 | 0.1793 | 0.1552 | 0.1692 | 3.64 | 3.14 | 3.55 | 4.71 |
| NCU-IISR_2 | 0.3027 | 0.2842 | 0.3019 | 0.2760 | 4.22 | 4.12 | 4.27 | 4.93 |
| auth-qa-1 | - | - | - | - | - | - | - | - |
| auth-qa-5 | - | - | - | - | - | - | - | - |
| UoT_baseline | - | - | - | - | - | - | - | - |
| UoT_multitask_learn | - | - | - | - | - | - | - | - |
| UoT_allquestions | - | - | - | - | - | - | - | - |
| Best factoid | - | - | - | - | - | - | - | - |
| Best yesno | - | - | - | - | - | - | - | - |
| bio-answerfinder | 0.4057 | 0.2971 | 0.4021 | 0.2892 | 3.99 | 4.10 | 3.82 | 4.59 |
| MQ-1 | 0.5050 | 0.3154 | 0.5129 | 0.3074 | 3.92 | 4.48 | 3.71 | 4.01 |
| MQ-5 | 0.4069 | 0.3094 | 0.4151 | 0.3051 | 3.93 | 4.26 | 3.80 | 4.37 |
| BJUTNLP2 | 0.0373 | 0.0608 | 0.0244 | 0.0413 | 3.22 | 2.47 | 3.21 | 4.78 |
| Umass_czi_1 | - | - | - | - | - | - | - | - |
| Umass_czi_2 | - | - | - | - | - | - | - | - |
| Umass_czi_3 | - | - | - | - | - | - | - | - |
| Umass_czi_4 | - | - | - | - | - | - | - | - |
| Umass_czi_5 | - | - | - | - | - | - | - | - |
| factoid qa model | - | - | - | - | - | - | - | - |
| Parameters retrained | - | - | - | - | - | - | - | - |
| Features Fusion | - | - | - | - | - | - | - | - |
| BioFusion | - | - | - | - | - | - | - | - |
| BioLabel | - | - | - | - | - | - | - | - |
| dice-a-1.0 | - | - | - | - | - | - | - | - |
| dice-b-1.0 | - | - | - | - | - | - | - | - |
| DAIICT_lex_UMLSgraph | 0.6250 | 0.3189 | 0.6234 | 0.3059 | 3.75 | 4.76 | 3.57 | 3.73 |
| DAIICT_QSM_UMLSgraph | 0.6250 | 0.3189 | 0.6234 | 0.3059 | 3.75 | 4.76 | 3.57 | 3.73 |
| DAIICT_QSM | 0.6428 | 0.3284 | 0.6392 | 0.3138 | 3.70 | 4.83 | 3.57 | 3.68 |
| DAIICT_lex | 0.6257 | 0.3205 | 0.6239 | 0.3073 | 3.75 | 4.74 | 3.57 | 3.72 |
| system of teamdaiict | 0.6473 | 0.3245 | 0.6445 | 0.3100 | 3.67 | 4.81 | 3.59 | 3.77 |
| FudanLabZhu1 | - | - | - | - | - | - | - | - |
| FudanLabZhu2 | - | - | - | - | - | - | - | - |
| FudanLabZhu4 | - | - | - | - | - | - | - | - |
| FudanLabZhu3 | - | - | - | - | - | - | - | - |
| KoreaUniv-DMIS-1 | 0.2374 | 0.2136 | 0.2446 | 0.2158 | 4.24 | 3.43 | 3.71 | 4.65 |
| Multitask SBERT Cls | 0.4946 | 0.3181 | 0.4937 | 0.3085 | 4.01 | 4.41 | 3.84 | 4.09 |
| KoreaUniv-DMIS-2 | 0.2637 | 0.2608 | 0.2630 | 0.2565 | 4.45 | 4.11 | 4.14 | 4.81 |
| Multitask SBERT reg | 0.4860 | 0.3083 | 0.4854 | 0.2992 | 3.99 | 4.42 | 3.82 | 4.09 |
| KoreaUniv-DMIS-3 | 0.2244 | 0.2123 | 0.2282 | 0.2115 | 4.28 | 3.56 | 3.65 | 4.67 |
| KoreaUniv-DMIS-4 | 0.2341 | 0.2230 | 0.2396 | 0.2260 | 4.43 | 4.01 | 3.94 | 4.74 |
| KoreaUniv-DMIS-5 | 0.2369 | 0.2416 | 0.2350 | 0.2378 | 4.41 | 4.14 | 4.05 | 4.74 |
| sbert cls | 0.4948 | 0.3172 | 0.4926 | 0.3070 | 4.01 | 4.43 | 3.84 | 4.06 |
| sbert 1 epoch cls | 0.4967 | 0.3195 | 0.4962 | 0.3098 | 4.01 | 4.33 | 3.78 | 4.11 |
| sbert reg | 0.4896 | 0.3136 | 0.4886 | 0.3041 | 4.01 | 4.42 | 3.83 | 4.09 |
| FudanLabZhu5 | - | - | - | - | - | - | - | - |
| dice-c-1.0 | - | - | - | - | - | - | - | - |
| dice-d-1.0 | - | - | - | - | - | - | - | - |
| NCU-IISR_3 | 0.3698 | 0.3538 | 0.3603 | 0.3396 | 4.33 | 4.12 | 4.21 | 4.95 |
| dice-e-1.0 | - | - | - | - | - | - | - | - |
| pa | 0.3673 | 0.2963 | 0.3634 | 0.2860 | 3.97 | 3.75 | 3.87 | 4.52 |
| GIAO | - | - | - | - | - | - | - | - |
| BioASQ_Baseline | - | - | - | - | - | - | - | - |