BioASQ Participants Area
Task 8b: Test Results of Phase B
The test results are presented in separate tables for each type of annotation (exact answers and ideal answers). Systems are identified by the name given in their "System Description".
The evaluation measures used in Task B are presented here.
Warning: For ideal answers, good ROUGE results do not always imply good manual scores.
Test batch 1
Exact Answers
System | Yes/No: Accuracy | F1 Yes | F1 No | Macro F1 | Factoid: Strict Acc. | Lenient Acc. | MRR | List: Mean Prec. | Recall | F-Measure
pa-base | 0.6800 | 0.8095 | - | 0.4048 | 0.2188 | 0.5000 | 0.3542 | 0.2750 | 0.2250 | 0.2305
pa | 0.6800 | 0.8095 | - | 0.4048 | 0.2188 | 0.5000 | 0.3542 | 0.3884 | 0.5629 | 0.4315
AUEB-System1 | 0.6800 | 0.8095 | - | 0.4048 | 0.0313 | 0.0625 | 0.0469 | 0.0333 | 0.0417 | 0.0367
AUEB-System3 | 0.6800 | 0.8095 | - | 0.4048 | 0.0000 | 0.0313 | 0.0078 | 0.0200 | 0.0333 | 0.0250
AUEB-System4 | 0.6800 | 0.8095 | - | 0.4048 | 0.0313 | 0.0625 | 0.0469 | 0.0500 | 0.0750 | 0.0597
AUEB-System5 | 0.6800 | 0.8095 | - | 0.4048 | 0.0313 | 0.1563 | 0.0714 | 0.1008 | 0.1308 | 0.1095
UoT_baseline | 0.6800 | 0.8095 | - | 0.4048 | 0.3125 | 0.5938 | 0.4266 | 0.3461 | 0.2933 | 0.2918
UoT_allquestions | 0.6800 | 0.8095 | - | 0.4048 | 0.2813 | 0.5938 | 0.4099 | 0.3842 | 0.3200 | 0.3262
BJUTNLPGroup | 0.6800 | 0.8095 | - | 0.4048 | 0.2813 | 0.4063 | 0.3307 | 0.1800 | 0.5458 | 0.2652
MQ-5 | 0.6800 | 0.8095 | - | 0.4048 | - | - | - | - | - | -
MQ-1 | 0.6800 | 0.8095 | - | 0.4048 | - | - | - | - | - | -
MQ-2 | 0.6800 | 0.8095 | - | 0.4048 | - | - | - | - | - | -
MQ-3 | 0.6800 | 0.8095 | - | 0.4048 | - | - | - | - | - | -
bio-answerfinder | 0.6800 | 0.8095 | - | 0.4048 | 0.0625 | 0.2188 | 0.1406 | 0.1511 | 0.3671 | 0.1673
Umass_czi_1 | 0.6800 | 0.7778 | 0.4286 | 0.6032 | 0.3750 | 0.5938 | 0.4688 | - | - | -
MQ-4 | 0.6800 | 0.8095 | - | 0.4048 | - | - | - | - | - | -
Umass_czi_2 | 0.5600 | 0.6857 | 0.2667 | 0.4762 | 0.2500 | 0.3438 | 0.2891 | 0.4875 | 0.2983 | 0.3448
Umass_czi_3 | 0.6800 | 0.7778 | 0.4286 | 0.6032 | 0.2500 | 0.3438 | 0.2891 | 0.4875 | 0.2983 | 0.3448
Umass_czi_4 | 0.6400 | 0.7273 | 0.4706 | 0.5989 | 0.2188 | 0.4375 | 0.3005 | 0.4875 | 0.2983 | 0.3448
NCU-IISR_1 | 0.7600 | 0.8235 | 0.6250 | 0.7243 | - | - | - | - | - | -
Umass_czi_5 | 0.6400 | 0.7429 | 0.4000 | 0.5714 | 0.2188 | 0.4375 | 0.3005 | 0.1350 | 0.0767 | 0.0967
FudanLabZhu1 | 0.6000 | 0.7368 | 0.1667 | 0.4518 | 0.3750 | 0.5938 | 0.4557 | 0.4250 | 0.3067 | 0.3408
FudanLabZhu4 | 0.6000 | 0.7368 | 0.1667 | 0.4518 | 0.3125 | 0.5938 | 0.4219 | 0.4250 | 0.3067 | 0.3408
auth-qa-1 | 0.6800 | 0.8095 | - | 0.4048 | 0.0625 | 0.1563 | 0.1094 | 0.0325 | 0.0583 | 0.0417
kmeans | - | - | - | - | - | - | - | - | - | -
KoreaUniv-DMIS-2 | 0.8400 | 0.8824 | 0.7500 | 0.8162 | 0.3438 | 0.6250 | 0.4583 | 0.4367 | 0.3100 | 0.3333
KoreaUniv-DMIS-3 | 0.8800 | 0.9143 | 0.8000 | 0.8571 | 0.2500 | 0.6250 | 0.3979 | 0.3908 | 0.3100 | 0.3152
simple truncation | - | - | - | - | - | - | - | - | - | -
KoreaUniv-DMIS-1 | 0.7600 | 0.8235 | 0.6250 | 0.7243 | 0.1250 | 0.4375 | 0.2615 | 0.4333 | 0.3425 | 0.3597
KoreaUniv-DMIS-4 | 0.7600 | 0.8235 | 0.6250 | 0.7243 | 0.3125 | 0.6250 | 0.4344 | 0.4450 | 0.3325 | 0.3516
KoreaUniv-DMIS-5 | 0.8800 | 0.9091 | 0.8235 | 0.8663 | 0.3438 | 0.5938 | 0.4438 | 0.4733 | 0.3633 | 0.3718
BioASQ_Baseline | 0.4000 | 0.2857 | 0.4828 | 0.3842 | 0.1563 | 0.2813 | 0.2016 | 0.1322 | 0.3483 | 0.1839
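The Yes/No columns rest on standard classification measures: accuracy, F1 computed separately for the "yes" and "no" classes, and their unweighted mean (Macro F1). Many rows in each batch share identical Yes/No figures (0.6800, 0.8095, -, 0.4048 in batch 1). The sketch below assumes the textbook definitions of these measures and shows that those figures coincide with what an answerer that always replies "yes" would score, with the dash in F1 No behaving as 0 in the macro average. The per-batch counts in `BATCHES` (e.g. 17 "yes" questions out of 25 in batch 1) are hypothetical, back-calculated from the reported accuracies rather than taken from this page.

```python
from typing import Dict, Sequence


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 for one class; 0.0 when there are no true positives (covers 0/0 cases)."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def yesno_scores(gold: Sequence[str], pred: Sequence[str]) -> Dict[str, float]:
    """Accuracy, per-class F1 and macro F1 over "yes"/"no" answers."""
    pairs = list(zip(gold, pred))
    scores = {"accuracy": sum(g == p for g, p in pairs) / len(pairs)}
    for label in ("yes", "no"):
        tp = sum(g == label and p == label for g, p in pairs)
        fp = sum(g != label and p == label for g, p in pairs)
        fn = sum(g == label and p != label for g, p in pairs)
        scores[f"f1_{label}"] = f1(tp, fp, fn)
    scores["macro_f1"] = (scores["f1_yes"] + scores["f1_no"]) / 2
    return scores


# Hypothetical (yes questions, total questions) per batch, back-calculated
# from the reported accuracies; not stated on the results page.
BATCHES = {1: (17, 25), 2: (27, 36), 3: (18, 31), 4: (14, 26), 5: (19, 34)}

if __name__ == "__main__":
    for batch, (n_yes, n_total) in BATCHES.items():
        gold = ["yes"] * n_yes + ["no"] * (n_total - n_yes)
        always_yes = ["yes"] * n_total  # a system that answers "yes" to everything
        s = yesno_scores(gold, always_yes)
        print(f"batch {batch}: " + " | ".join(f"{v:.4f}" for v in s.values()))
```

For batch 1 this prints 0.6800 | 0.8095 | 0.0000 | 0.4048, matching the repeated row except that the page shows the undefined F1 No as a dash.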
Ideal Answers
System | Automatic (ROUGE): R-2 (Rec) | R-2 (F1) | R-SU4 (Rec) | R-SU4 (F1) | Manual: Readability | Recall | Precision | Repetition
pa-base | 0.1040 | 0.1061 | 0.1036 | 0.1058 | 0.94 | 1.04 | 1.07 | 1.10
pa | 0.1040 | 0.1061 | 0.1036 | 0.1058 | 0.94 | 1.04 | 1.07 | 1.10
AUEB-System1 | 0.0944 | 0.0495 | 0.0978 | 0.0496 | 0.64 | 1.00 | 0.66 | 0.84
AUEB-System3 | 0.0563 | 0.0278 | 0.0636 | 0.0312 | 0.62 | 0.78 | 0.65 | 0.94
AUEB-System4 | 0.0163 | 0.0056 | 0.0246 | 0.0099 | 0.52 | 0.45 | 0.35 | 0.83
AUEB-System5 | 0.0595 | 0.0282 | 0.0659 | 0.0317 | 0.57 | 0.77 | 0.58 | 0.84
UoT_baseline | - | - | - | - | - | - | - | -
UoT_allquestions | - | - | - | - | - | - | - | -
BJUTNLPGroup | 0.0794 | 0.0976 | 0.0633 | 0.0731 | 2.46 | 2.51 | 3.08 | 3.68
MQ-5 | 0.4616 | 0.3331 | 0.4740 | 0.3295 | 3.85 | 4.23 | 3.88 | 4.20
MQ-1 | 0.5322 | 0.3339 | 0.5412 | 0.3276 | 3.63 | 4.37 | 3.83 | 3.95
MQ-2 | 0.5671 | 0.3487 | 0.5732 | 0.3417 | 3.62 | 4.33 | 3.78 | 3.94
MQ-3 | 0.5789 | 0.3568 | 0.5834 | 0.3481 | 3.63 | 4.33 | 3.83 | 3.92
bio-answerfinder | 0.4515 | 0.3289 | 0.4543 | 0.3236 | 3.74 | 3.88 | 3.69 | 4.24
Umass_czi_1 | - | - | - | - | - | - | - | -
MQ-4 | 0.5401 | 0.3365 | 0.5492 | 0.3301 | 3.62 | 4.33 | 3.81 | 3.90
Umass_czi_2 | - | - | - | - | - | - | - | -
Umass_czi_3 | - | - | - | - | - | - | - | -
Umass_czi_4 | - | - | - | - | - | - | - | -
NCU-IISR_1 | 0.1699 | 0.1923 | 0.1735 | 0.1935 | 3.52 | 3.26 | 3.59 | 4.46
Umass_czi_5 | - | - | - | - | - | - | - | -
FudanLabZhu1 | - | - | - | - | - | - | - | -
FudanLabZhu4 | - | - | - | - | - | - | - | -
auth-qa-1 | - | - | - | - | - | - | - | -
kmeans | 0.5566 | 0.3208 | 0.5609 | 0.3129 | 3.33 | 4.35 | 3.70 | 3.92
KoreaUniv-DMIS-2 | - | - | - | - | - | - | - | -
KoreaUniv-DMIS-3 | - | - | - | - | - | - | - | -
simple truncation | 0.3841 | 0.3385 | 0.3896 | 0.3339 | 3.93 | 4.07 | 4.13 | 4.62
KoreaUniv-DMIS-1 | 0.2326 | 0.1576 | 0.2509 | 0.1651 | 3.79 | 3.50 | 3.49 | 4.37
KoreaUniv-DMIS-4 | 0.1957 | 0.1887 | 0.2021 | 0.1898 | 4.09 | 3.40 | 3.60 | 4.54
KoreaUniv-DMIS-5 | 0.1960 | 0.1895 | 0.2019 | 0.1902 | 4.07 | 3.37 | 3.61 | 4.53
BioASQ_Baseline | - | - | - | - | - | - | - | -
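The automatic columns report ROUGE-2, the overlap of word bigrams between a system's ideal answer and the gold ideal answer, as recall and F1. A minimal sketch follows, assuming a single reference, lower-cased whitespace tokenization and no stemming; the exact toolkit settings behind the scores above are not given on this page, so it will not reproduce them exactly. Because it counts surface bigrams only, it says nothing about readability or repetition, which the manual columns assess.

```python
from collections import Counter
from typing import List, Tuple


def bigram_counts(tokens: List[str]) -> Counter:
    """Multiset of adjacent word pairs."""
    return Counter(zip(tokens, tokens[1:]))


def rouge_2(candidate: str, reference: str) -> Tuple[float, float]:
    """Return (recall, F1) of ROUGE-2 for one candidate against one reference.

    Simplified: lower-cased whitespace tokenization, no stemming or stopword removal.
    """
    cand = bigram_counts(candidate.lower().split())
    ref = bigram_counts(reference.lower().split())
    overlap = sum((cand & ref).values())  # matches clipped to the smaller count
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    if overlap == 0:
        return recall, 0.0
    return recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    r, f = rouge_2("the cat sat on the mat", "the cat is on the mat")
    print(f"R-2 recall={r:.4f} F1={f:.4f}")
```

The example prints a recall and F1 of 0.6000: three of the five reference bigrams ("the cat", "on the", "the mat") appear in the candidate.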
Test batch 2
Exact Answers
System | Yes/No: Accuracy | F1 Yes | F1 No | Macro F1 | Factoid: Strict Acc. | Lenient Acc. | MRR | List: Mean Prec. | Recall | F-Measure
BJUTNLPGroup | 0.7500 | 0.8571 | - | 0.4286 | 0.0800 | 0.3200 | 0.1747 | 0.1429 | 0.4619 | 0.2027
AUEB-System1 | 0.7500 | 0.8571 | - | 0.4286 | - | - | - | 0.0179 | 0.0179 | 0.0179
AUEB-System2 | 0.7500 | 0.8571 | - | 0.4286 | - | - | - | 0.0143 | 0.0238 | 0.0179
AUEB-System3 | 0.7500 | 0.8571 | - | 0.4286 | 0.0000 | 0.0400 | 0.0100 | - | - | -
AUEB-System4 | 0.7500 | 0.8571 | - | 0.4286 | 0.0000 | 0.0800 | 0.0280 | 0.0429 | 0.0524 | 0.0429
AUEB-System5 | 0.7500 | 0.8571 | - | 0.4286 | - | - | - | 0.0667 | 0.0821 | 0.0672
auth-qa-1 | 0.7778 | 0.8621 | 0.4286 | 0.6453 | 0.0000 | 0.0800 | 0.0233 | 0.0845 | 0.1238 | 0.0944
bio-answerfinder | 0.7778 | 0.8621 | 0.4286 | 0.6453 | 0.2000 | 0.2400 | 0.2080 | 0.3693 | 0.4714 | 0.3803
UoT_baseline | 0.7500 | 0.8571 | - | 0.4286 | 0.1600 | 0.4400 | 0.2580 | 0.5361 | 0.4476 | 0.4306
Best yesno | 0.7500 | 0.8571 | - | 0.4286 | 0.1600 | 0.4400 | 0.2580 | 0.5361 | 0.4476 | 0.4306
UoT_multitask_learn | 0.8333 | 0.9000 | 0.5000 | 0.7000 | 0.2000 | 0.4000 | 0.2800 | 0.4643 | 0.4214 | 0.4108
UoT_allquestions | 0.7778 | 0.8710 | 0.2000 | 0.5355 | 0.1600 | 0.4800 | 0.2540 | 0.4107 | 0.3738 | 0.3712
MQ-1 | - | - | - | - | - | - | - | - | - | -
Best factoid | 0.7500 | 0.8571 | - | 0.4286 | 0.1200 | 0.4400 | 0.2413 | 0.4827 | 0.4071 | 0.3950
Umass_czi_2 | 0.7778 | 0.8519 | 0.5556 | 0.7037 | 0.0800 | 0.2400 | 0.1333 | - | - | -
Umass_czi_1 | 0.7222 | 0.8148 | 0.4444 | 0.6296 | 0.1200 | 0.2000 | 0.1480 | - | - | -
NCU-IISR_1 | 0.7778 | 0.8519 | 0.5556 | 0.7037 | 0.2800 | 0.4400 | 0.3293 | 0.3214 | 0.2381 | 0.2667
MQ-2 | 0.7500 | 0.8571 | - | 0.4286 | - | - | - | - | - | -
MQ-3 | 0.7500 | 0.8571 | - | 0.4286 | - | - | - | - | - | -
MQ-4 | 0.7500 | 0.8571 | - | 0.4286 | - | - | - | - | - | -
simple truncation | - | - | - | - | - | - | - | - | - | -
MQ-5 | 0.7500 | 0.8571 | - | 0.4286 | - | - | - | - | - | -
Multitask SBERT Cls | 0.7500 | 0.8571 | - | 0.4286 | - | - | - | - | - | -
Multitask SBERT reg | 0.7500 | 0.8571 | - | 0.4286 | - | - | - | - | - | -
sbert cls | 0.7500 | 0.8571 | - | 0.4286 | - | - | - | - | - | -
sbert reg | 0.7500 | 0.8571 | - | 0.4286 | - | - | - | - | - | -
pa-base | 0.2500 | - | 0.4000 | 0.2000 | 0.1200 | 0.4000 | 0.2300 | 0.1012 | 0.0619 | 0.0702
FudanLabZhu1 | 0.6944 | 0.7843 | 0.4762 | 0.6303 | 0.2800 | 0.4000 | 0.3200 | 0.3631 | 0.2976 | 0.3112
FudanLabZhu3 | 0.6944 | 0.7843 | 0.4762 | 0.6303 | 0.2000 | 0.3200 | 0.2413 | 0.5417 | 0.5024 | 0.4678
kmeans | - | - | - | - | - | - | - | - | - | -
FudanLabZhu4 | 0.6944 | 0.7843 | 0.4762 | 0.6303 | 0.2400 | 0.3600 | 0.2900 | 0.5417 | 0.5024 | 0.4678
KoreaUniv-DMIS-1 | 0.9444 | 0.9630 | 0.8889 | 0.9259 | 0.1200 | 0.3200 | 0.1967 | 0.5643 | 0.4643 | 0.4735
KoreaUniv-DMIS-2 | 0.9167 | 0.9434 | 0.8421 | 0.8928 | 0.1600 | 0.3600 | 0.2367 | 0.5060 | 0.3667 | 0.4029
KoreaUniv-DMIS-5 | 0.8889 | 0.9259 | 0.7778 | 0.8519 | 0.1600 | 0.3600 | 0.2193 | 0.4821 | 0.3667 | 0.3985
KoreaUniv-DMIS-3 | 0.9444 | 0.9630 | 0.8889 | 0.9259 | 0.1200 | 0.3200 | 0.2000 | 0.5500 | 0.4214 | 0.4544
KoreaUniv-DMIS-4 | 0.9167 | 0.9434 | 0.8421 | 0.8928 | 0.2800 | 0.4400 | 0.3533 | 0.4881 | 0.3381 | 0.3798
pa | 0.2500 | - | 0.4000 | 0.2000 | 0.1200 | 0.4400 | 0.2360 | 0.3418 | 0.4881 | 0.3441
BioASQ_Baseline | 0.3611 | 0.3784 | 0.3429 | 0.3606 | 0.0800 | 0.1200 | 0.1000 | 0.2196 | 0.5929 | 0.2858
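In the Factoid columns, Strict Acc. credits a question only when the top-ranked candidate is correct, Lenient Acc. credits it when any returned candidate is correct, and MRR averages the reciprocal rank of the first correct candidate (0 when none is correct). The sketch below assumes up to five ranked candidates per question (the limit we understand the BioASQ guidelines to set) and simplifies matching to case-insensitive equality with any gold synonym; the example questions and answers are made up.

```python
from typing import List, Sequence, Set, Tuple

MAX_CANDIDATES = 5  # assumed cap on ranked candidates per factoid question


def first_correct_rank(candidates: Sequence[str], gold: Set[str]) -> int:
    """1-based rank of the first candidate matching a gold synonym, or 0 if none does."""
    gold_lower = {g.lower() for g in gold}
    for rank, cand in enumerate(candidates[:MAX_CANDIDATES], start=1):
        if cand.lower() in gold_lower:
            return rank
    return 0


def factoid_scores(predictions: List[Sequence[str]], golds: List[Set[str]]) -> Tuple[float, float, float]:
    """Return (strict accuracy, lenient accuracy, MRR) averaged over questions."""
    ranks = [first_correct_rank(p, g) for p, g in zip(predictions, golds)]
    n = len(ranks)
    strict = sum(r == 1 for r in ranks) / n
    lenient = sum(r > 0 for r in ranks) / n
    mrr = sum(1 / r for r in ranks if r > 0) / n
    return strict, lenient, mrr


if __name__ == "__main__":
    # Made-up questions: first correct answer at rank 1, rank 2, nowhere, and rank 4.
    predictions = [["BRCA1", "TP53"], ["insulin", "Glucagon"], ["aspirin"], ["a", "b", "c", "EGFR", "d"]]
    golds = [{"brca1"}, {"glucagon"}, {"ibuprofen"}, {"egfr", "ErbB1"}]
    print(factoid_scores(predictions, golds))
```

The example prints (0.25, 0.75, 0.4375). Under these definitions MRR can never fall below strict accuracy or exceed lenient accuracy, since each question contributes 1, 1/rank, or 0 to it.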
Ideal Answers
System | Automatic (ROUGE): R-2 (Rec) | R-2 (F1) | R-SU4 (Rec) | R-SU4 (F1) | Manual: Readability | Recall | Precision | Repetition
BJUTNLPGroup | 0.0325 | 0.0555 | 0.0223 | 0.0389 | 3.07 | 2.37 | 3.32 | 4.53
AUEB-System1 | 0.0551 | 0.0331 | 0.0590 | 0.0338 | 0.75 | 0.98 | 0.76 | 0.82
AUEB-System2 | 0.0491 | 0.0289 | 0.0550 | 0.0309 | 0.72 | 0.99 | 0.75 | 0.84
AUEB-System3 | 0.0551 | 0.0331 | 0.0590 | 0.0338 | 0.75 | 0.98 | 0.76 | 0.82
AUEB-System4 | 0.2125 | 0.1151 | 0.2113 | 0.1128 | 0.86 | 1.22 | 0.92 | 0.84
AUEB-System5 | 0.2125 | 0.1151 | 0.2113 | 0.1128 | 0.86 | 1.22 | 0.92 | 0.84
auth-qa-1 | - | - | - | - | - | - | - | -
bio-answerfinder | 0.4150 | 0.2997 | 0.4231 | 0.2981 | 3.88 | 4.20 | 3.94 | 4.45
UoT_baseline | - | - | - | - | - | - | - | -
Best yesno | - | - | - | - | - | - | - | -
UoT_multitask_learn | - | - | - | - | - | - | - | -
UoT_allquestions | - | - | - | - | - | - | - | -
MQ-1 | 0.5205 | 0.3040 | 0.5243 | 0.2939 | 3.76 | 4.42 | 3.67 | 3.76
Best factoid | - | - | - | - | - | - | - | -
Umass_czi_2 | - | - | - | - | - | - | - | -
Umass_czi_1 | - | - | - | - | - | - | - | -
NCU-IISR_1 | 0.1560 | 0.1841 | 0.1547 | 0.1783 | 3.93 | 3.55 | 4.01 | 4.76
MQ-2 | 0.5244 | 0.3254 | 0.5339 | 0.3206 | 3.71 | 4.49 | 3.78 | 3.92
MQ-3 | 0.5232 | 0.3206 | 0.5314 | 0.3153 | 3.77 | 4.51 | 3.84 | 3.92
MQ-4 | 0.4782 | 0.3012 | 0.4885 | 0.2975 | 3.73 | 4.40 | 3.75 | 3.90
simple truncation | 0.5201 | 0.3011 | 0.5228 | 0.2906 | 3.73 | 4.50 | 3.67 | 3.74
MQ-5 | 0.3794 | 0.2683 | 0.3885 | 0.2651 | 3.88 | 4.31 | 3.84 | 4.15
Multitask SBERT Cls | 0.4701 | 0.2992 | 0.4813 | 0.2947 | 3.72 | 4.56 | 3.80 | 3.98
Multitask SBERT reg | 0.4649 | 0.2938 | 0.4747 | 0.2898 | 3.77 | 4.50 | 3.80 | 4.01
sbert cls | 0.4806 | 0.2993 | 0.4915 | 0.2954 | 3.72 | 4.52 | 3.80 | 3.95
sbert reg | 0.4593 | 0.2904 | 0.4784 | 0.2918 | 3.75 | 4.39 | 3.78 | 4.00
pa-base | 0.0592 | 0.0653 | 0.0590 | 0.0642 | 1.08 | 0.98 | 1.03 | 1.19
FudanLabZhu1 | - | - | - | - | - | - | - | -
FudanLabZhu3 | - | - | - | - | - | - | - | -
kmeans | 0.5138 | 0.2891 | 0.5281 | 0.2845 | 3.69 | 4.41 | 3.63 | 3.63
FudanLabZhu4 | - | - | - | - | - | - | - | -
KoreaUniv-DMIS-1 | 0.2387 | 0.2287 | 0.2486 | 0.2311 | 4.50 | 3.81 | 4.06 | 4.71
KoreaUniv-DMIS-2 | 0.2546 | 0.2442 | 0.2567 | 0.2413 | 4.51 | 4.11 | 4.27 | 4.77
KoreaUniv-DMIS-5 | 0.2402 | 0.2318 | 0.2486 | 0.2335 | 4.52 | 3.78 | 4.03 | 4.71
KoreaUniv-DMIS-3 | 0.2662 | 0.2553 | 0.2687 | 0.2527 | 4.53 | 4.16 | 4.27 | 4.76
KoreaUniv-DMIS-4 | 0.2363 | 0.2272 | 0.2494 | 0.2331 | 4.53 | 3.83 | 4.02 | 4.76
pa | 0.0592 | 0.0653 | 0.0590 | 0.0642 | 1.08 | 0.98 | 1.03 | 1.19
BioASQ_Baseline | - | - | - | - | - | - | - | -
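R-SU4 extends ROUGE-2 by also counting unigrams and skip-bigrams, i.e. ordered word pairs with up to four words between them, so it rewards answers that keep words in the right order even when they are not adjacent. The sketch below is a simplified single-reference version under the same assumptions as a plain ROUGE-2 sketch (lower-cased whitespace tokens, no stemming); exactly how the official scorer counts the skip distance and combines several gold answers is an assumption here.

```python
from collections import Counter
from typing import List, Tuple


def su_units(tokens: List[str], max_gap: int = 4) -> Counter:
    """Unigrams plus ordered skip-bigrams with at most `max_gap` words between the pair."""
    units = Counter((tok,) for tok in tokens)
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + max_gap + 2, len(tokens))):
            units[(tokens[i], tokens[j])] += 1
    return units


def rouge_su4(candidate: str, reference: str) -> Tuple[float, float]:
    """Return (recall, F1) of ROUGE-SU4 for one candidate against one reference."""
    cand = su_units(candidate.lower().split())
    ref = su_units(reference.lower().split())
    overlap = sum((cand & ref).values())  # matches clipped to the smaller count
    if overlap == 0:
        return 0.0, 0.0
    recall = overlap / sum(ref.values())
    precision = overlap / sum(cand.values())
    return recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(rouge_su4("the protein binds the receptor", "the protein strongly binds the receptor"))
```

Unlike ROUGE-2, the inserted word "strongly" in the reference does not break the match on ("protein", "binds"), because that pair is still within the four-word gap.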
Test batch 3
Exact Answers
System | Yes/No: Accuracy | F1 Yes | F1 No | Macro F1 | Factoid: Strict Acc. | Lenient Acc. | MRR | List: Mean Prec. | Recall | F-Measure
bio-answerfinder | 0.8710 | 0.8947 | 0.8333 | 0.8640 | 0.3214 | 0.4286 | 0.3494 | 0.3884 | 0.4972 | 0.3983
auth-qa-1 | 0.7742 | 0.8205 | 0.6957 | 0.7581 | 0.2143 | 0.2857 | 0.2411 | 0.2500 | 0.3917 | 0.2834
auth-qa-2 | 0.7742 | 0.8205 | 0.6957 | 0.7581 | 0.2143 | 0.2857 | 0.2440 | 0.1667 | 0.4722 | 0.2337
auth-qa-3 | 0.7742 | 0.8205 | 0.6957 | 0.7581 | 0.2143 | 0.2857 | 0.2440 | 0.2500 | 0.3917 | 0.2834
auth-qa-4 | 0.7742 | 0.8205 | 0.6957 | 0.7581 | 0.2143 | 0.2857 | 0.2411 | 0.1667 | 0.4722 | 0.2337
MQ-1 | 0.5806 | 0.7347 | - | 0.3673 | - | - | - | - | - | -
MQ-2 | 0.5806 | 0.7347 | - | 0.3673 | - | - | - | - | - | -
MQ-3 | 0.5806 | 0.7347 | - | 0.3673 | - | - | - | - | - | -
MQ-4 | 0.5806 | 0.7347 | - | 0.3673 | - | - | - | - | - | -
MQ-5 | 0.5806 | 0.7347 | - | 0.3673 | - | - | - | - | - | -
Umass_czi_1 | 0.8065 | 0.8235 | 0.7857 | 0.8046 | 0.2500 | 0.3571 | 0.2869 | 0.6806 | 0.4111 | 0.4474
Umass_czi_2 | 0.8387 | 0.8649 | 0.8000 | 0.8324 | 0.2500 | 0.3571 | 0.2869 | 0.6806 | 0.4111 | 0.4474
Umass_czi_4 | 0.9032 | 0.9143 | 0.8889 | 0.9016 | 0.3214 | 0.4643 | 0.3810 | 0.6111 | 0.4028 | 0.4314
Umass_czi_5 | 0.9032 | 0.9189 | 0.8800 | 0.8995 | 0.2500 | 0.4286 | 0.3030 | 0.7361 | 0.4500 | 0.5020
UoT_baseline | 0.5806 | 0.7347 | - | 0.3673 | 0.3214 | 0.3929 | 0.3512 | 0.4861 | 0.3833 | 0.4024
UoT_allquestions | 0.5806 | 0.7347 | - | 0.3673 | 0.3214 | 0.3929 | 0.3423 | 0.5972 | 0.3889 | 0.4151
UoT_multitask_learn | 0.5161 | 0.6809 | - | 0.3404 | 0.3214 | 0.4286 | 0.3643 | 0.5139 | 0.3333 | 0.3530
Best factoid | 0.5806 | 0.7111 | 0.2353 | 0.4732 | 0.2857 | 0.3929 | 0.3333 | 0.5208 | 0.3833 | 0.3917
Best yesno | 0.5161 | 0.6809 | - | 0.3404 | 0.3214 | 0.4286 | 0.3643 | 0.5139 | 0.3333 | 0.3530
BJUTNLPGroup | 0.5806 | 0.7347 | - | 0.3673 | 0.2500 | 0.3571 | 0.3036 | 0.1583 | 0.4222 | 0.2170
Multitask SBERT Cls | 0.5806 | 0.7347 | - | 0.3673 | - | - | - | - | - | -
Multitask SBERT reg | 0.5806 | 0.7347 | - | 0.3673 | - | - | - | - | - | -
sbert cls | 0.5806 | 0.7347 | - | 0.3673 | - | - | - | - | - | -
sbert 1 epoch cls | 0.5806 | 0.7347 | - | 0.3673 | - | - | - | - | - | -
GNN | 0.5806 | 0.7347 | - | 0.3673 | - | - | - | - | - | -
factoid qa model | 0.4194 | - | 0.5909 | 0.2955 | 0.3214 | 0.4643 | 0.3750 | - | - | -
simple truncation | 0.5806 | 0.7347 | - | 0.3673 | - | - | - | - | - | -
kmeans | 0.5806 | 0.7347 | - | 0.3673 | - | - | - | - | - | -
similarity measures | 0.5806 | 0.7347 | - | 0.3673 | - | - | - | - | - | -
abstractive | 0.5806 | 0.7347 | - | 0.3673 | - | - | - | - | - | -
extractive | 0.5806 | 0.7347 | - | 0.3673 | - | - | - | - | - | -
pa-base | 0.9032 | 0.9189 | 0.8800 | 0.8995 | 0.2500 | 0.4643 | 0.3137 | 0.5278 | 0.4444 | 0.4377
pa | 0.9032 | 0.9189 | 0.8800 | 0.8995 | 0.2500 | 0.4643 | 0.3137 | 0.5278 | 0.4444 | 0.4377
KoreaUniv-DMIS-1 | 0.9032 | 0.9091 | 0.8966 | 0.9028 | 0.3214 | 0.4286 | 0.3601 | 0.6583 | 0.4111 | 0.4312
KoreaUniv-DMIS-4 | 0.8387 | 0.8571 | 0.8148 | 0.8360 | 0.2857 | 0.4286 | 0.3357 | 0.6167 | 0.4111 | 0.4282
KoreaUniv-DMIS-2 | 0.8710 | 0.8824 | 0.8571 | 0.8697 | 0.3214 | 0.4286 | 0.3446 | 0.6028 | 0.4111 | 0.4259
KoreaUniv-DMIS-3 | 0.8387 | 0.8571 | 0.8148 | 0.8360 | 0.2500 | 0.4643 | 0.3357 | 0.6111 | 0.4111 | 0.4222
KoreaUniv-DMIS-5 | 0.9032 | 0.9091 | 0.8966 | 0.9028 | 0.3214 | 0.4643 | 0.3565 | 0.6167 | 0.4111 | 0.4282
sbert reg | 0.5806 | 0.7347 | - | 0.3673 | - | - | - | - | - | -
FudanLabZhu1 | 0.7419 | 0.8182 | 0.5556 | 0.6869 | 0.2500 | 0.3929 | 0.2976 | 0.4819 | 0.2806 | 0.3112
FudanLabZhu2 | 0.7419 | 0.8182 | 0.5556 | 0.6869 | 0.3214 | 0.5357 | 0.3970 | 0.5694 | 0.3361 | 0.3849
FudanLabZhu3 | 0.7419 | 0.8182 | 0.5556 | 0.6869 | 0.3214 | 0.4643 | 0.3655 | 0.5583 | 0.3361 | 0.3708
FudanLabZhu4 | 0.7419 | 0.8182 | 0.5556 | 0.6869 | 0.2857 | 0.5714 | 0.3821 | 0.5583 | 0.3361 | 0.3708
FudanLabZhu5 | 0.7419 | 0.8182 | 0.5556 | 0.6869 | 0.3214 | 0.4286 | 0.3690 | 0.5583 | 0.3361 | 0.3708
BioASQ_Baseline | 0.5161 | 0.4444 | 0.5714 | 0.5079 | 0.0714 | 0.2143 | 0.1220 | 0.2052 | 0.4611 | 0.2456
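For list questions, each returned list is scored against the gold list by precision (share of returned entries that are correct), recall (share of gold entries returned) and their F1, and each column then averages these per-question values. The reported F-Measure is not the harmonic mean of the two columns beside it (for pa-base in batch 1 that would be 0.2475, not 0.2305), which is consistent with averaging per-question F1 scores, as this sketch assumes. Matching is simplified to case-insensitive string equality, and the two example questions are made up.

```python
from typing import List, Sequence, Tuple


def list_scores(predictions: List[Sequence[str]], golds: List[Sequence[str]]) -> Tuple[float, float, float]:
    """Return (mean precision, mean recall, mean F1) over list questions."""
    precisions, recalls, f1s = [], [], []
    for pred, gold in zip(predictions, golds):
        pred_set = {p.lower() for p in pred}
        gold_set = {g.lower() for g in gold}
        correct = len(pred_set & gold_set)
        p = correct / len(pred_set) if pred_set else 0.0
        r = correct / len(gold_set) if gold_set else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(2 * p * r / (p + r) if correct else 0.0)  # per-question F1
    n = len(f1s)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


if __name__ == "__main__":
    # Question 1: 2 of 4 gold entries, nothing wrong; question 2: the single gold entry among 4 returned.
    preds = [["a", "b"], ["x", "y", "z", "w"]]
    golds = [["a", "b", "c", "d"], ["x"]]
    mean_p, mean_r, mean_f = list_scores(preds, golds)
    harmonic = 2 * mean_p * mean_r / (mean_p + mean_r)
    print(f"mean P={mean_p:.4f} mean R={mean_r:.4f} mean F={mean_f:.4f} (harmonic of means={harmonic:.4f})")
```

It prints mean P=0.6250 mean R=0.7500 mean F=0.5333 (harmonic of means=0.6818), the same kind of gap seen between the List columns above.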
Ideal Answers
System | Automatic (ROUGE): R-2 (Rec) | R-2 (F1) | R-SU4 (Rec) | R-SU4 (F1) | Manual: Readability | Recall | Precision | Repetition
bio-answerfinder | 0.4557 | 0.3471 | 0.4642 | 0.3450 | 4.16 | 4.25 | 3.93 | 4.53
auth-qa-1 | - | - | - | - | - | - | - | -
auth-qa-2 | - | - | - | - | - | - | - | -
auth-qa-3 | - | - | - | - | - | - | - | -
auth-qa-4 | - | - | - | - | - | - | - | -
MQ-1 | 0.5222 | 0.3632 | 0.5336 | 0.3597 | 4.03 | 4.48 | 3.86 | 4.13
MQ-2 | 0.5481 | 0.3719 | 0.5580 | 0.3673 | 4.04 | 4.53 | 3.87 | 4.14
MQ-3 | 0.5394 | 0.3673 | 0.5491 | 0.3633 | 4.06 | 4.54 | 3.89 | 4.17
MQ-4 | 0.5202 | 0.3597 | 0.5336 | 0.3574 | 4.06 | 4.52 | 3.86 | 4.11
MQ-5 | 0.3997 | 0.3055 | 0.4112 | 0.3060 | 4.07 | 4.25 | 3.91 | 4.32
Umass_czi_1 | - | - | - | - | - | - | - | -
Umass_czi_2 | - | - | - | - | - | - | - | -
Umass_czi_4 | - | - | - | - | - | - | - | -
Umass_czi_5 | - | - | - | - | - | - | - | -
UoT_baseline | - | - | - | - | - | - | - | -
UoT_allquestions | - | - | - | - | - | - | - | -
UoT_multitask_learn | - | - | - | - | - | - | - | -
Best factoid | - | - | - | - | - | - | - | -
Best yesno | - | - | - | - | - | - | - | -
BJUTNLPGroup | 0.0293 | 0.0501 | 0.0196 | 0.0337 | 2.99 | 2.39 | 3.20 | 4.35
Multitask SBERT Cls | 0.4812 | 0.3292 | 0.5007 | 0.3305 | 3.91 | 4.24 | 3.73 | 3.98
Multitask SBERT reg | 0.4669 | 0.3166 | 0.4834 | 0.3178 | 3.85 | 4.10 | 3.69 | 4.05
sbert cls | 0.4769 | 0.3177 | 0.4940 | 0.3189 | 3.87 | 4.17 | 3.70 | 4.07
sbert 1 epoch cls | 0.4820 | 0.3225 | 0.4977 | 0.3232 | 3.84 | 4.11 | 3.69 | 4.06
GNN | 0.1538 | 0.1453 | 0.1600 | 0.1439 | 3.31 | 2.70 | 2.98 | 4.20
factoid qa model | - | - | - | - | - | - | - | -
simple truncation | 0.4154 | 0.3229 | 0.4256 | 0.3221 | 4.10 | 4.41 | 3.94 | 4.42
kmeans | 0.4154 | 0.3229 | 0.4256 | 0.3221 | 4.10 | 4.41 | 3.94 | 4.42
similarity measures | 0.3961 | 0.2941 | 0.4093 | 0.2959 | 3.98 | 4.09 | 3.65 | 4.31
abstractive | 0.1959 | 0.2186 | 0.1943 | 0.2169 | 3.54 | 3.33 | 3.75 | 4.66
extractive | 0.4519 | 0.3197 | 0.4629 | 0.3155 | 3.90 | 4.43 | 3.84 | 4.18
pa-base | 0.3202 | 0.2896 | 0.3261 | 0.2921 | 4.12 | 4.15 | 3.99 | 4.58
pa | 0.5088 | 0.3229 | 0.5179 | 0.3182 | 4.00 | 4.42 | 3.96 | 4.26
KoreaUniv-DMIS-1 | 0.3213 | 0.2832 | 0.3193 | 0.2787 | 4.55 | 3.95 | 3.99 | 4.70
KoreaUniv-DMIS-4 | 0.2704 | 0.2539 | 0.2701 | 0.2521 | 4.56 | 3.88 | 4.04 | 4.80
KoreaUniv-DMIS-2 | 0.2992 | 0.2809 | 0.2983 | 0.2795 | 4.52 | 4.11 | 4.10 | 4.76
KoreaUniv-DMIS-3 | 0.2854 | 0.2660 | 0.2852 | 0.2634 | 4.52 | 3.97 | 4.09 | 4.70
KoreaUniv-DMIS-5 | 0.2997 | 0.2899 | 0.3043 | 0.2909 | 4.60 | 4.01 | 4.17 | 4.79
sbert reg | 0.4508 | 0.3110 | 0.4647 | 0.3121 | 3.94 | 4.09 | 3.72 | 4.08
FudanLabZhu1 | - | - | - | - | - | - | - | -
FudanLabZhu2 | - | - | - | - | - | - | - | -
FudanLabZhu3 | - | - | - | - | - | - | - | -
FudanLabZhu4 | - | - | - | - | - | - | - | -
FudanLabZhu5 | - | - | - | - | - | - | - | -
BioASQ_Baseline | - | - | - | - | - | - | - | -
Test batch 4
Exact Answers
System | Yes/No: Accuracy | F1 Yes | F1 No | Macro F1 | Factoid: Strict Acc. | Lenient Acc. | MRR | List: Mean Prec. | Recall | F-Measure
MQ-2 | 0.5385 | 0.7000 | - | 0.3500 | - | - | - | - | - | -
MQ-3 | 0.5385 | 0.7000 | - | 0.3500 | - | - | - | - | - | -
MQ-4 | 0.5385 | 0.7000 | - | 0.3500 | - | - | - | - | - | -
bio-answerfinder | 0.6538 | 0.7568 | 0.4000 | 0.5784 | 0.4706 | 0.5882 | 0.5245 | 0.2791 | 0.3636 | 0.2646
auth-qa-1 | 0.6154 | 0.7059 | 0.4444 | 0.5752 | 0.2941 | 0.5294 | 0.3794 | 0.3059 | 0.4229 | 0.3406
auth-qa-2 | 0.6154 | 0.7059 | 0.4444 | 0.5752 | 0.2941 | 0.5000 | 0.3647 | 0.1882 | 0.4871 | 0.2561
auth-qa-3 | 0.6154 | 0.7059 | 0.4444 | 0.5752 | 0.2941 | 0.5000 | 0.3647 | 0.3137 | 0.4283 | 0.3459
auth-qa-4 | 0.6154 | 0.7059 | 0.4444 | 0.5752 | 0.2941 | 0.5294 | 0.3794 | 0.1882 | 0.4871 | 0.2561
auth-qa-5 | 0.6154 | 0.6667 | 0.5455 | 0.6061 | 0.2941 | 0.5294 | 0.3794 | 0.1882 | 0.4871 | 0.2561
MQ-1 | 0.5385 | 0.7000 | - | 0.3500 | - | - | - | - | - | -
MQ-5 | 0.5385 | 0.7000 | - | 0.3500 | - | - | - | - | - | -
BJUTNLPGroup | 0.5385 | 0.7000 | - | 0.3500 | 0.3529 | 0.5000 | 0.4167 | 0.1824 | 0.5597 | 0.2620
UoT_baseline | 0.5000 | 0.6667 | - | 0.3333 | 0.4118 | 0.7059 | 0.5270 | 0.3296 | 0.3810 | 0.3161
UoT_allquestions | 0.5000 | 0.6667 | - | 0.3333 | 0.4412 | 0.7059 | 0.5564 | 0.4045 | 0.4623 | 0.3886
Best factoid | 0.5000 | 0.6667 | - | 0.3333 | 0.4706 | 0.7059 | 0.5564 | 0.4582 | 0.4153 | 0.4005
UoT_multitask_learn | 0.4615 | 0.6316 | - | 0.3158 | 0.4706 | 0.6765 | 0.5637 | 0.3843 | 0.3226 | 0.2991
Best yesno | 0.5000 | 0.6667 | - | 0.3333 | 0.4412 | 0.7059 | 0.5564 | 0.4045 | 0.4623 | 0.3886
GNN | 0.5385 | 0.7000 | - | 0.3500 | 0.0294 | 0.2353 | 0.1039 | - | - | -
Umass_czi_1 | 0.8077 | 0.8387 | 0.7619 | 0.8003 | 0.3529 | 0.5294 | 0.4186 | 0.5164 | 0.3888 | 0.3774
Umass_czi_2 | 0.6923 | 0.7333 | 0.6364 | 0.6848 | 0.3235 | 0.5000 | 0.3946 | 0.5164 | 0.3888 | 0.3774
Umass_czi_3 | 0.7692 | 0.8125 | 0.7000 | 0.7563 | 0.3235 | 0.5000 | 0.3931 | 0.5753 | 0.4182 | 0.4146
Umass_czi_4 | 0.6538 | 0.7097 | 0.5714 | 0.6406 | 0.2941 | 0.5588 | 0.3946 | 0.5753 | 0.4182 | 0.4146
Umass_czi_5 | 0.6538 | 0.7097 | 0.5714 | 0.6406 | 0.5000 | 0.7059 | 0.5637 | 0.5753 | 0.4182 | 0.4146
factoid qa model | 0.7308 | 0.7879 | 0.6316 | 0.7097 | 0.4412 | 0.6471 | 0.5206 | 0.4039 | 0.1696 | 0.2117
Parameters retrained | 0.7308 | 0.7879 | 0.6316 | 0.7097 | 0.4412 | 0.6765 | 0.5216 | 0.4679 | 0.3443 | 0.3341
Features Fusion | 0.7308 | 0.7879 | 0.6316 | 0.7097 | 0.5000 | 0.6765 | 0.5745 | 0.5428 | 0.3541 | 0.3625
Multitask SBERT Cls | 0.5385 | 0.7000 | - | 0.3500 | - | - | - | - | - | -
Multitask SBERT reg | 0.5385 | 0.7000 | - | 0.3500 | - | - | - | - | - | -
sbert cls | 0.5385 | 0.7000 | - | 0.3500 | - | - | - | - | - | -
sbert reg | 0.5385 | 0.7000 | - | 0.3500 | - | - | - | - | - | -
sbert 1 epoch cls | 0.5385 | 0.7000 | - | 0.3500 | - | - | - | - | - | -
NCU-IISR_1 | 0.7308 | 0.7742 | 0.6667 | 0.7204 | 0.5000 | 0.6765 | 0.5735 | 0.5539 | 0.3786 | 0.3905
dice-a-1.0 | 0.7308 | 0.7879 | 0.6316 | 0.7097 | 0.3824 | 0.6471 | 0.4926 | 0.3627 | 0.1965 | 0.2188
FudanLabZhu1 | 0.5769 | 0.7179 | 0.1538 | 0.4359 | 0.4118 | 0.5588 | 0.4804 | 0.4055 | 0.2970 | 0.3004
FudanLabZhu2 | 0.5769 | 0.7027 | 0.2667 | 0.4847 | 0.5294 | 0.6765 | 0.5980 | 0.5784 | 0.3541 | 0.3902
FudanLabZhu4 | 0.5769 | 0.7027 | 0.2667 | 0.4847 | 0.5000 | 0.6765 | 0.5686 | 0.5375 | 0.5089 | 0.4571
FudanLabZhu3 | 0.5769 | 0.7179 | 0.1538 | 0.4359 | 0.4118 | 0.5882 | 0.4755 | 0.5375 | 0.5089 | 0.4571
GIAO | 0.6538 | 0.7273 | 0.5263 | 0.6268 | 0.5000 | 0.6765 | 0.5784 | 0.6520 | 0.3585 | 0.4101
KoreaUniv-DMIS-1 | 0.7692 | 0.8000 | 0.7273 | 0.7636 | 0.5294 | 0.7059 | 0.6078 | 0.3577 | 0.5539 | 0.4037
KoreaUniv-DMIS-2 | 0.7692 | 0.8000 | 0.7273 | 0.7636 | 0.5294 | 0.6765 | 0.5882 | 0.2760 | 0.4926 | 0.3122
dice-b-1.0 | 0.7308 | 0.7879 | 0.6316 | 0.7097 | 0.3824 | 0.6471 | 0.4926 | 0.3775 | 0.2259 | 0.2384
simple truncation | 0.5385 | 0.7000 | - | 0.3500 | - | - | - | - | - | -
kmeans | 0.5385 | 0.7000 | - | 0.3500 | - | - | - | - | - | -
similarity measures | 0.5385 | 0.7000 | - | 0.3500 | - | - | - | - | - | -
KoreaUniv-DMIS-3 | 0.7692 | 0.7857 | 0.7500 | 0.7679 | 0.5000 | 0.6765 | 0.5706 | 0.3049 | 0.4461 | 0.3346
KoreaUniv-DMIS-4 | 0.8077 | 0.8276 | 0.7826 | 0.8051 | 0.5000 | 0.6471 | 0.5613 | 0.3002 | 0.4926 | 0.3318
extractive | 0.5385 | 0.7000 | - | 0.3500 | - | - | - | - | - | -
abstractive | 0.5385 | 0.7000 | - | 0.3500 | - | - | - | - | - | -
FudanLabZhu5 | 0.5769 | 0.7179 | 0.1538 | 0.4359 | 0.5588 | 0.7353 | 0.6284 | 0.5375 | 0.5089 | 0.4571
KoreaUniv-DMIS-5 | 0.8462 | 0.8571 | 0.8333 | 0.8452 | 0.4706 | 0.7059 | 0.5686 | 0.3630 | 0.4329 | 0.3355
pa | 0.7308 | 0.7879 | 0.6316 | 0.7097 | 0.4706 | 0.5588 | 0.5098 | 0.3571 | 0.3661 | 0.3030
pa-base | 0.7308 | 0.7879 | 0.6316 | 0.7097 | 0.4706 | 0.5588 | 0.5098 | 0.3571 | 0.3661 | 0.3030
BioASQ_Baseline | 0.5385 | 0.4000 | 0.6250 | 0.5125 | 0.0588 | 0.2059 | 0.1078 | 0.1554 | 0.4519 | 0.2122
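Since Macro F1 is the unweighted mean of F1 Yes and F1 No, the Yes/No columns of a flattened or re-typed table can be sanity-checked row by row. The sketch below parses a few rows copied from the batch 4 table above (system, Accuracy, F1 Yes, F1 No, Macro F1), reads "-" as 0, and allows for the four-decimal rounding of the published figures; the helper names `cell` and `macro_f1_mismatches` are our own.

```python
from typing import List

# Yes/No columns of a few rows from the batch 4 table: system | Accuracy | F1 Yes | F1 No | Macro F1
ROWS = """\
MQ-2 | 0.5385 | 0.7000 | - | 0.3500
auth-qa-5 | 0.6154 | 0.6667 | 0.5455 | 0.6061
Umass_czi_1 | 0.8077 | 0.8387 | 0.7619 | 0.8003
NCU-IISR_1 | 0.7308 | 0.7742 | 0.6667 | 0.7204
FudanLabZhu1 | 0.5769 | 0.7179 | 0.1538 | 0.4359
KoreaUniv-DMIS-5 | 0.8462 | 0.8571 | 0.8333 | 0.8452
BioASQ_Baseline | 0.5385 | 0.4000 | 0.6250 | 0.5125
"""


def cell(text: str) -> float:
    """Parse one score cell; a dash is read as 0."""
    text = text.strip()
    return 0.0 if text == "-" else float(text)


def macro_f1_mismatches(rows: str, tol: float = 1.1e-4) -> List[str]:
    """Names of rows whose Macro F1 differs from mean(F1 Yes, F1 No) beyond rounding.

    Each published value is rounded to 4 decimals, so the recomputed mean and the
    reported Macro F1 may legitimately differ by up to about 1e-4.
    """
    bad = []
    for line in rows.strip().splitlines():
        name, _accuracy, f1_yes, f1_no, macro = (c.strip() for c in line.split("|"))
        if abs((cell(f1_yes) + cell(f1_no)) / 2 - cell(macro)) > tol:
            bad.append(name)
    return bad


if __name__ == "__main__":
    print(macro_f1_mismatches(ROWS))
```

For these rows it prints an empty list; a row re-typed with a wrong digit in one of the three F1 cells would show up by name.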
Ideal Answers
System | Automatic (ROUGE): R-2 (Rec) | R-2 (F1) | R-SU4 (Rec) | R-SU4 (F1) | Manual: Readability | Recall | Precision | Repetition
MQ-2 | 0.5054 | 0.2949 | 0.5122 | 0.2878 | 3.96 | 4.53 | 3.69 | 4.09
MQ-3 | 0.5162 | 0.2964 | 0.5220 | 0.2891 | 3.96 | 4.47 | 3.71 | 4.11
MQ-4 | 0.4915 | 0.2886 | 0.5015 | 0.2832 | 4.00 | 4.42 | 3.69 | 4.11
bio-answerfinder | 0.4025 | 0.2993 | 0.4001 | 0.2938 | 4.19 | 4.18 | 3.97 | 4.71
auth-qa-1 | - | - | - | - | - | - | - | -
auth-qa-2 | - | - | - | - | - | - | - | -
auth-qa-3 | - | - | - | - | - | - | - | -
auth-qa-4 | - | - | - | - | - | - | - | -
auth-qa-5 | - | - | - | - | - | - | - | -
MQ-1 | 0.4971 | 0.3088 | 0.5074 | 0.3017 | 3.82 | 4.54 | 3.81 | 4.19
MQ-5 | 0.3896 | 0.2674 | 0.4005 | 0.2638 | 4.02 | 4.15 | 3.86 | 4.36
BJUTNLPGroup | 0.0288 | 0.0488 | 0.0184 | 0.0316 | 3.00 | 2.69 | 3.37 | 4.58
UoT_baseline | - | - | - | - | - | - | - | -
UoT_allquestions | - | - | - | - | - | - | - | -
Best factoid | - | - | - | - | - | - | - | -
UoT_multitask_learn | - | - | - | - | - | - | - | -
Best yesno | - | - | - | - | - | - | - | -
GNN | 0.1543 | 0.1385 | 0.1538 | 0.1330 | 3.33 | 2.86 | 3.28 | 4.41
Umass_czi_1 | - | - | - | - | - | - | - | -
Umass_czi_2 | - | - | - | - | - | - | - | -
Umass_czi_3 | - | - | - | - | - | - | - | -
Umass_czi_4 | - | - | - | - | - | - | - | -
Umass_czi_5 | - | - | - | - | - | - | - | -
factoid qa model | - | - | - | - | - | - | - | -
Parameters retrained | - | - | - | - | - | - | - | -
Features Fusion | - | - | - | - | - | - | - | -
Multitask SBERT Cls | 0.4195 | 0.2492 | 0.4282 | 0.2453 | 3.92 | 4.46 | 3.78 | 4.14
Multitask SBERT reg | 0.4301 | 0.2582 | 0.4414 | 0.2538 | 3.89 | 4.32 | 3.68 | 4.10
sbert cls | 0.4294 | 0.2575 | 0.4382 | 0.2534 | 3.91 | 4.44 | 3.77 | 4.13
sbert reg | 0.4294 | 0.2575 | 0.4382 | 0.2534 | 3.91 | 4.44 | 3.77 | 4.13
sbert 1 epoch cls | 0.3996 | 0.2502 | 0.4147 | 0.2482 | 3.91 | 4.37 | 3.74 | 4.11
NCU-IISR_1 | 0.1616 | 0.1845 | 0.1587 | 0.1796 | 3.92 | 3.33 | 3.79 | 4.70
dice-a-1.0 | - | - | - | - | - | - | - | -
FudanLabZhu1 | - | - | - | - | - | - | - | -
FudanLabZhu2 | - | - | - | - | - | - | - | -
FudanLabZhu4 | - | - | - | - | - | - | - | -
FudanLabZhu3 | - | - | - | - | - | - | - | -
GIAO | - | - | - | - | - | - | - | -
KoreaUniv-DMIS-1 | 0.2390 | 0.2136 | 0.2436 | 0.2141 | 4.50 | 3.67 | 3.86 | 4.76
KoreaUniv-DMIS-2 | 0.2423 | 0.2274 | 0.2456 | 0.2287 | 4.55 | 3.86 | 4.05 | 4.79
dice-b-1.0 | - | - | - | - | - | - | - | -
simple truncation | 0.4793 | 0.2828 | 0.4929 | 0.2782 | 3.99 | 4.50 | 3.78 | 4.03
kmeans | 0.4793 | 0.2828 | 0.4929 | 0.2782 | 3.99 | 4.50 | 3.78 | 4.03
similarity measures | 0.3862 | 0.2446 | 0.3971 | 0.2394 | 3.97 | 4.19 | 3.82 | 4.27
KoreaUniv-DMIS-3 | 0.2407 | 0.2246 | 0.2424 | 0.2251 | 4.66 | 3.51 | 3.90 | 4.86
KoreaUniv-DMIS-4 | 0.2268 | 0.2061 | 0.2291 | 0.2066 | 4.56 | 3.84 | 3.84 | 4.75
extractive | 0.4668 | 0.2870 | 0.4729 | 0.2796 | 3.91 | 4.46 | 3.72 | 4.15
abstractive | 0.1938 | 0.2080 | 0.1948 | 0.2050 | 3.46 | 3.09 | 3.62 | 4.51
FudanLabZhu5 | - | - | - | - | - | - | - | -
KoreaUniv-DMIS-5 | 0.2302 | 0.2126 | 0.2384 | 0.2181 | 4.54 | 3.87 | 3.98 | 4.81
pa | 0.3135 | 0.2904 | 0.3138 | 0.2874 | 3.79 | 3.70 | 3.86 | 4.69
pa-base | 0.5291 | 0.2923 | 0.5321 | 0.2856 | 3.91 | 4.40 | 3.75 | 4.02
BioASQ_Baseline | - | - | - | - | - | - | - | -
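The warning at the top of this page, that good ROUGE results do not always imply good manual scores, can be probed on any ideal-answer table with a rank correlation. The sketch below computes Spearman's rho (the Pearson correlation of ranks, with tied values given their average rank) between R-2 (Rec) and manual Readability for a subset of the batch 4 rows above. The choice of columns and rows is ours, and a single batch of about a dozen systems is too small a sample for firm conclusions.

```python
from typing import List, Sequence

# (system, R-2 (Rec), manual Readability) copied from the batch 4 ideal-answer table.
BATCH4 = [
    ("MQ-2", 0.5054, 3.96),
    ("MQ-3", 0.5162, 3.96),
    ("MQ-4", 0.4915, 4.00),
    ("bio-answerfinder", 0.4025, 4.19),
    ("MQ-1", 0.4971, 3.82),
    ("MQ-5", 0.3896, 4.02),
    ("BJUTNLPGroup", 0.0288, 3.00),
    ("GNN", 0.1543, 3.33),
    ("NCU-IISR_1", 0.1616, 3.92),
    ("KoreaUniv-DMIS-1", 0.2390, 4.50),
    ("KoreaUniv-DMIS-3", 0.2407, 4.66),
    ("abstractive", 0.1938, 3.46),
    ("pa-base", 0.5291, 3.91),
]


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho computed as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


if __name__ == "__main__":
    rouge = [r for _, r, _ in BATCH4]
    readability = [m for _, _, m in BATCH4]
    print(f"Spearman rho (R-2 Rec vs Readability, batch 4 subset): {spearman(rouge, readability):.3f}")
```

A rho near zero or below would mean that, within this subset, ranking systems by ROUGE recall says little about how readable assessors found their answers.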
Test batch 5
Exact Answers
|
Yes/No |
Factoid |
List |
System |
Accuracy |
F1 Yes |
F1 No |
Macro F1 |
Strict Acc. |
Lenient Acc. |
MRR |
Mean Prec. |
Recall |
F-Measure |
MQ-2 |
0.5588 |
0.7170 |
- |
0.3585 |
- | - | - |
- | - | - |
MQ-3 |
0.5588 |
0.7170 |
- |
0.3585 |
- | - | - |
- | - | - |
MQ-4 |
0.5588 |
0.7170 |
- |
0.3585 |
- | - | - |
- | - | - |
GNN |
0.5588 |
0.7170 |
- |
0.3585 |
0.0000 |
0.2813 |
0.0677 |
- | - | - |
zmodel2 |
0.5588 |
0.7170 |
- |
0.3585 |
0.0000 |
0.3125 |
0.0995 |
- | - | - |
NCU-IISR_1 |
0.7353 |
0.7429 |
0.7273 |
0.7351 |
0.4688 |
0.7188 |
0.5859 |
0.4514 |
0.2659 |
0.3140 |
NCU-IISR_2 |
0.5588 |
0.7170 |
- |
0.3585 |
- | - | - |
- | - | - |
auth-qa-1 |
0.6765 |
0.6857 |
0.6667 |
0.6762 |
0.2500 |
0.3750 |
0.2995 |
0.1750 |
0.4821 |
0.2386 |
auth-qa-5 |
0.6471 |
0.7143 |
0.5385 |
0.6264 |
0.2500 |
0.3750 |
0.2995 |
0.1750 |
0.4821 |
0.2386 |
UoT_baseline |
0.6176 |
0.5185 |
0.6829 |
0.6007 |
0.5000 |
0.6875 |
0.5844 |
0.2242 |
0.1577 |
0.1732 |
UoT_multitask_learn |
0.5000 |
0.6667 |
- |
0.3333 |
0.4063 |
0.7188 |
0.5365 |
0.5938 |
0.3700 |
0.4296 |
UoT_allquestions |
0.5588 |
0.7170 |
- |
0.3585 |
0.4063 |
0.6875 |
0.5063 |
0.3854 |
0.2798 |
0.3082 |
Best factoid |
0.5588 |
0.7170 |
- |
0.3585 |
0.4063 |
0.6563 |
0.5026 |
0.5174 |
0.3631 |
0.4002 |
Best yesno |
0.5588 |
0.7170 |
- |
0.3585 |
0.4063 |
0.6875 |
0.5063 |
0.3854 |
0.2798 |
0.3082 |
bio-answerfinder |
0.7353 |
0.7568 |
0.7097 |
0.7332 |
0.5000 |
0.5313 |
0.5156 |
0.4745 |
0.4325 |
0.4163 |
MQ-1 |
0.5588 |
0.7170 |
- |
0.3585 |
- | - | - |
- | - | - |
MQ-5 |
0.5588 |
0.7170 |
- |
0.3585 |
- | - | - |
- | - | - |
BJUTNLP2 |
0.5588 |
0.7170 |
- |
0.3585 |
0.4688 |
0.5625 |
0.5026 |
0.1250 |
0.3790 |
0.1731 |
Umass_czi_1 |
0.6471 |
0.6471 |
0.6471 |
0.6471 |
0.4688 |
0.7188 |
0.5677 |
0.5139 |
0.2808 |
0.3353 |
Umass_czi_2 |
0.6176 |
0.6286 |
0.6061 |
0.6173 |
0.5000 |
0.6250 |
0.5417 |
0.5139 |
0.2808 |
0.3353 |
Umass_czi_3 |
0.7941 |
0.8205 |
0.7586 |
0.7896 |
0.5625 |
0.7188 |
0.6354 |
0.1528 |
0.1230 |
0.1310 |
Umass_czi_4 | 0.7353 | 0.7692 | 0.6897 | 0.7294 | 0.5313 | 0.6563 | 0.5833 | 0.3750 | 0.1756 | 0.2166 |
Umass_czi_5 | 0.5882 | 0.6818 | 0.4167 | 0.5492 | 0.4688 | 0.7188 | 0.5604 | 0.5972 | 0.3224 | 0.3909 |
factoid qa model | 0.7647 | 0.8095 | 0.6923 | 0.7509 | 0.4688 | 0.6563 | 0.5401 | 0.3333 | 0.0923 | 0.1387 |
Parameters retrained | 0.7647 | 0.8095 | 0.6923 | 0.7509 | 0.4688 | 0.7813 | 0.5938 | 0.5139 | 0.2946 | 0.3492 |
Features Fusion | 0.7647 | 0.8095 | 0.6923 | 0.7509 | 0.5313 | 0.7500 | 0.6115 | 0.5035 | 0.2808 | 0.3298 |
BioFusion | 0.7647 | 0.8095 | 0.6923 | 0.7509 | 0.5000 | 0.7188 | 0.5818 | 0.3854 | 0.2589 | 0.2907 |
BioLabel | 0.7647 | 0.8095 | 0.6923 | 0.7509 | 0.4688 | 0.7188 | 0.5573 | 0.4271 | 0.2798 | 0.3185 |
dice-a-1.0 | 0.6765 | 0.7179 | 0.6207 | 0.6693 | 0.4375 | 0.6250 | 0.5156 | 0.3750 | 0.1696 | 0.2159 |
dice-b-1.0 | 0.7941 | 0.8108 | 0.7742 | 0.7925 | 0.5313 | 0.6875 | 0.5885 | 0.4028 | 0.1845 | 0.2313 |
DAIICT_lex_UMLSgraph | 0.5588 | 0.7170 | - | 0.3585 | - | - | - | - | - | - |
DAIICT_QSM_UMLSgraph | 0.5588 | 0.7170 | - | 0.3585 | - | - | - | - | - | - |
DAIICT_QSM | 0.5588 | 0.7170 | - | 0.3585 | - | - | - | - | - | - |
DAIICT_lex | 0.5588 | 0.7170 | - | 0.3585 | - | - | - | - | - | - |
system of teamdaiict | 0.5588 | 0.7170 | - | 0.3585 | - | - | - | - | - | - |
FudanLabZhu1 | 0.5000 | 0.5405 | 0.4516 | 0.4961 | 0.4375 | 0.6250 | 0.5036 | 0.4694 | 0.3304 | 0.3641 |
FudanLabZhu2 | 0.6176 | 0.6977 | 0.4800 | 0.5888 | 0.5000 | 0.7188 | 0.5818 | 0.6458 | 0.4028 | 0.4703 |
FudanLabZhu4 | 0.6176 | 0.6977 | 0.4800 | 0.5888 | 0.1250 | 0.2813 | 0.1901 | 0.6458 | 0.4028 | 0.4703 |
FudanLabZhu3 | 0.6176 | 0.6977 | 0.4800 | 0.5888 | 0.4688 | 0.6250 | 0.5313 | 0.5125 | 0.3869 | 0.4092 |
KoreaUniv-DMIS-1 | 0.8235 | 0.8333 | 0.8125 | 0.8229 | 0.4688 | 0.6250 | 0.5208 | 0.5799 | 0.4812 | 0.5050 |
Multitask SBERT Cls | 0.5588 | 0.7170 | - | 0.3585 | - | - | - | - | - | - |
KoreaUniv-DMIS-2 | 0.8235 | 0.8333 | 0.8125 | 0.8229 | 0.5000 | 0.7188 | 0.5833 | 0.5799 | 0.4812 | 0.5050 |
Multitask SBERT reg | 0.5588 | 0.7170 | - | 0.3585 | - | - | - | - | - | - |
KoreaUniv-DMIS-3 | 0.8235 | 0.8333 | 0.8125 | 0.8229 | 0.4688 | 0.7188 | 0.5661 | 0.5694 | 0.5437 | 0.5222 |
KoreaUniv-DMIS-4 | 0.7941 | 0.8000 | 0.7879 | 0.7939 | 0.5313 | 0.7188 | 0.6120 | 0.5724 | 0.5437 | 0.5247 |
KoreaUniv-DMIS-5 | 0.8529 | 0.8649 | 0.8387 | 0.8518 | 0.5000 | 0.6875 | 0.5677 | 0.5465 | 0.5645 | 0.5243 |
sbert cls | 0.5588 | 0.7170 | - | 0.3585 | - | - | - | - | - | - |
sbert 1 epoch cls | 0.5588 | 0.7170 | - | 0.3585 | - | - | - | - | - | - |
sbert reg | 0.5588 | 0.7170 | - | 0.3585 | - | - | - | - | - | - |
FudanLabZhu5 | 0.6176 | 0.6977 | 0.4800 | 0.5888 | 0.5000 | 0.6563 | 0.5677 | 0.5317 | 0.3879 | 0.4239 |
dice-c-1.0 | 0.8529 | 0.8571 | 0.8485 | 0.8528 | 0.3125 | 0.5625 | 0.4151 | 0.1250 | 0.0536 | 0.0741 |
dice-d-1.0 | 0.8529 | 0.8571 | 0.8485 | 0.8528 | 0.3125 | 0.5625 | 0.4151 | 0.1250 | 0.0536 | 0.0741 |
NCU-IISR_3 | 0.5588 | 0.7170 | - | 0.3585 | - | - | - | - | - | - |
dice-e-1.0 | 0.8529 | 0.8571 | 0.8485 | 0.8528 | 0.3125 | 0.5625 | 0.4151 | 0.1250 | 0.0536 | 0.0741 |
pa | 0.8235 | 0.8333 | 0.8125 | 0.8229 | 0.4375 | 0.6250 | 0.5260 | 0.3284 | 0.2679 | 0.2761 |
GIAO | 0.5588 | 0.7170 | - | 0.3585 | 0.5000 | 0.6563 | 0.5677 | - | - | - |
BioASQ_Baseline | 0.6176 | 0.5185 | 0.6829 | 0.6007 | 0.1563 | 0.3438 | 0.2266 | 0.2573 | 0.3641 | 0.2581 |
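The yes/no columns can be cross-checked against each other. In every row, Macro F1 equals the mean of F1 Yes and F1 No, with "-" counting as 0. For example, Umass_czi_4 has (0.7692 + 0.6897) / 2 ≈ 0.7294. The Python sketch below computes the four yes/no scores for a system that answers "yes" to every question. Two things in it are assumptions rather than official details:

- The helper names `f1_for_class` and `yesno_scores` are hypothetical.
- The gold split of 19 "yes" and 15 "no" questions is inferred from the recurring 0.5588 / 0.7170 / - / 0.3585 rows. It is not an official count.

The BioASQ evaluation-measures page remains the authoritative definition.

```python
def f1_for_class(gold, pred, label):
    """F1 for one class; 0.0 when the class is never correctly predicted (shown as "-" in the table)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    if tp == 0:
        return 0.0
    precision = tp / sum(1 for p in pred if p == label)
    recall = tp / sum(1 for g in gold if g == label)
    return 2 * precision * recall / (precision + recall)


def yesno_scores(gold, pred):
    """Return (accuracy, F1 yes, F1 no, macro F1), macro F1 being the unweighted mean of the two class F1s."""
    accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    f1_yes = f1_for_class(gold, pred, "yes")
    f1_no = f1_for_class(gold, pred, "no")
    return accuracy, f1_yes, f1_no, (f1_yes + f1_no) / 2


# Assumed gold answers: 34 questions, 19 "yes" and 15 "no" (inferred from the table, not official).
gold = ["yes"] * 19 + ["no"] * 15
# Hypothetical system that answers "yes" to every question.
pred = ["yes"] * len(gold)

acc, f1_yes, f1_no, macro = yesno_scores(gold, pred)
print(f"{acc:.4f} {f1_yes:.4f} {f1_no:.4f} {macro:.4f}")  # prints "0.5588 0.7170 0.0000 0.3585"
```

Under this assumption, an all-"yes" run reproduces the 0.5588 / 0.7170 / - / 0.3585 rows exactly. The same arithmetic, with 17 "yes" answers out of 25 questions, matches the 0.6800 / 0.8095 / - / 0.4048 rows of test batch 1.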
Ideal Answers
| Automatic scores (Rouge - R) | Manual scores |
System | R-2 (Rec) | R-2 (F1) | R-SU4 (Rec) | R-SU4 (F1) | Readability | Recall | Precision | Repetition |
MQ-2 | 0.5105 | 0.3246 | 0.5161 | 0.3171 | 3.88 | 4.47 | 3.69 | 4.01 |
MQ-3 | 0.5188 | 0.3328 | 0.5214 | 0.3241 | 3.92 | 4.42 | 3.71 | 4.02 |
MQ-4 | 0.5155 | 0.3311 | 0.5188 | 0.3233 | 3.91 | 4.43 | 3.65 | 4.02 |
GNN | 0.2192 | 0.1987 | 0.2151 | 0.1889 | 3.33 | 3.12 | 3.42 | 4.57 |
zmodel2 | 0.1882 | 0.1675 | 0.1855 | 0.1573 | 3.25 | 3.07 | 3.32 | 4.54 |
NCU-IISR_1 | 0.1634 | 0.1793 | 0.1552 | 0.1692 | 3.64 | 3.14 | 3.55 | 4.71 |
NCU-IISR_2 | 0.3027 | 0.2842 | 0.3019 | 0.2760 | 4.22 | 4.12 | 4.27 | 4.93 |
auth-qa-1 | - | - | - | - | - | - | - | - |
auth-qa-5 | - | - | - | - | - | - | - | - |
UoT_baseline | - | - | - | - | - | - | - | - |
UoT_multitask_learn | - | - | - | - | - | - | - | - |
UoT_allquestions | - | - | - | - | - | - | - | - |
Best factoid | - | - | - | - | - | - | - | - |
Best yesno | - | - | - | - | - | - | - | - |
bio-answerfinder | 0.4057 | 0.2971 | 0.4021 | 0.2892 | 3.99 | 4.10 | 3.82 | 4.59 |
MQ-1 | 0.5050 | 0.3154 | 0.5129 | 0.3074 | 3.92 | 4.48 | 3.71 | 4.01 |
MQ-5 | 0.4069 | 0.3094 | 0.4151 | 0.3051 | 3.93 | 4.26 | 3.80 | 4.37 |
BJUTNLP2 | 0.0373 | 0.0608 | 0.0244 | 0.0413 | 3.22 | 2.47 | 3.21 | 4.78 |
Umass_czi_1 | - | - | - | - | - | - | - | - |
Umass_czi_2 | - | - | - | - | - | - | - | - |
Umass_czi_3 | - | - | - | - | - | - | - | - |
Umass_czi_4 | - | - | - | - | - | - | - | - |
Umass_czi_5 | - | - | - | - | - | - | - | - |
factoid qa model | - | - | - | - | - | - | - | - |
Parameters retrained | - | - | - | - | - | - | - | - |
Features Fusion | - | - | - | - | - | - | - | - |
BioFusion | - | - | - | - | - | - | - | - |
BioLabel | - | - | - | - | - | - | - | - |
dice-a-1.0 | - | - | - | - | - | - | - | - |
dice-b-1.0 | - | - | - | - | - | - | - | - |
DAIICT_lex_UMLSgraph | 0.6250 | 0.3189 | 0.6234 | 0.3059 | 3.75 | 4.76 | 3.57 | 3.73 |
DAIICT_QSM_UMLSgraph | 0.6250 | 0.3189 | 0.6234 | 0.3059 | 3.75 | 4.76 | 3.57 | 3.73 |
DAIICT_QSM | 0.6428 | 0.3284 | 0.6392 | 0.3138 | 3.70 | 4.83 | 3.57 | 3.68 |
DAIICT_lex | 0.6257 | 0.3205 | 0.6239 | 0.3073 | 3.75 | 4.74 | 3.57 | 3.72 |
system of teamdaiict | 0.6473 | 0.3245 | 0.6445 | 0.3100 | 3.67 | 4.81 | 3.59 | 3.77 |
FudanLabZhu1 | - | - | - | - | - | - | - | - |
FudanLabZhu2 | - | - | - | - | - | - | - | - |
FudanLabZhu4 | - | - | - | - | - | - | - | - |
FudanLabZhu3 | - | - | - | - | - | - | - | - |
KoreaUniv-DMIS-1 | 0.2374 | 0.2136 | 0.2446 | 0.2158 | 4.24 | 3.43 | 3.71 | 4.65 |
Multitask SBERT Cls | 0.4946 | 0.3181 | 0.4937 | 0.3085 | 4.01 | 4.41 | 3.84 | 4.09 |
KoreaUniv-DMIS-2 | 0.2637 | 0.2608 | 0.2630 | 0.2565 | 4.45 | 4.11 | 4.14 | 4.81 |
Multitask SBERT reg | 0.4860 | 0.3083 | 0.4854 | 0.2992 | 3.99 | 4.42 | 3.82 | 4.09 |
KoreaUniv-DMIS-3 | 0.2244 | 0.2123 | 0.2282 | 0.2115 | 4.28 | 3.56 | 3.65 | 4.67 |
KoreaUniv-DMIS-4 | 0.2341 | 0.2230 | 0.2396 | 0.2260 | 4.43 | 4.01 | 3.94 | 4.74 |
KoreaUniv-DMIS-5 | 0.2369 | 0.2416 | 0.2350 | 0.2378 | 4.41 | 4.14 | 4.05 | 4.74 |
sbert cls | 0.4948 | 0.3172 | 0.4926 | 0.3070 | 4.01 | 4.43 | 3.84 | 4.06 |
sbert 1 epoch cls | 0.4967 | 0.3195 | 0.4962 | 0.3098 | 4.01 | 4.33 | 3.78 | 4.11 |
sbert reg | 0.4896 | 0.3136 | 0.4886 | 0.3041 | 4.01 | 4.42 | 3.83 | 4.09 |
FudanLabZhu5 | - | - | - | - | - | - | - | - |
dice-c-1.0 | - | - | - | - | - | - | - | - |
dice-d-1.0 | - | - | - | - | - | - | - | - |
NCU-IISR_3 | 0.3698 | 0.3538 | 0.3603 | 0.3396 | 4.33 | 4.12 | 4.21 | 4.95 |
dice-e-1.0 | - | - | - | - | - | - | - | - |
pa | 0.3673 | 0.2963 | 0.3634 | 0.2860 | 3.97 | 3.75 | 3.87 | 4.52 |
GIAO | - | - | - | - | - | - | - | - |
BioASQ_Baseline | - | - | - | - | - | - | - | - |