BioASQ Participants Area
Task 1b: Test Results of Phase B
The test results are presented in separate tables for each type of annotation. The "System Description" of each system is used.
The evaluation measures that are used in Task 1b are presented here.
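As a rough illustration of how the exact-answer columns in the tables can be read, the sketch below computes strict accuracy, lenient accuracy and MRR for factoid questions, and mean precision, recall and F-measure for list questions. It follows the commonly used definitions of these measures and is not the official evaluation code. The function names and the exact string matching of answers are assumptions made for this example; the page linked above remains the authoritative description.

```python
from typing import Dict, Sequence, Set


def factoid_scores(predictions: Sequence[Sequence[str]],
                   gold: Sequence[Set[str]]) -> Dict[str, float]:
    """Strict accuracy, lenient accuracy and MRR over factoid questions.

    predictions[i] is a ranked list of candidate answers for question i;
    gold[i] is the set of acceptable answers (exact match is an assumption).
    """
    n = len(gold)
    strict = lenient = rr_sum = 0.0
    for ranked, answers in zip(predictions, gold):
        # 1-based rank of the first correct candidate, or None if absent.
        rank = next((i for i, a in enumerate(ranked, start=1) if a in answers), None)
        if rank is not None:
            lenient += 1          # a correct answer appears anywhere in the list
            rr_sum += 1.0 / rank  # reciprocal rank of the first correct answer
            if rank == 1:
                strict += 1       # the top-ranked candidate is correct
    return {"strict_acc": strict / n, "lenient_acc": lenient / n, "mrr": rr_sum / n}


def list_scores(predictions: Sequence[Sequence[str]],
                gold: Sequence[Set[str]]) -> Dict[str, float]:
    """Precision, recall and F-measure per list question, averaged over questions."""
    n = len(gold)
    p_sum = r_sum = f_sum = 0.0
    for predicted, answers in zip(predictions, gold):
        returned = set(predicted)
        true_pos = len(returned & answers)
        p = true_pos / len(returned) if returned else 0.0
        r = true_pos / len(answers) if answers else 0.0
        f = 2 * p * r / (p + r) if (p + r) > 0 else 0.0
        p_sum, r_sum, f_sum = p_sum + p, r_sum + r, f_sum + f
    return {"mean_precision": p_sum / n, "recall": r_sum / n, "f_measure": f_sum / n}


if __name__ == "__main__":
    # Toy data, not taken from the challenge.
    print(factoid_scores([["a", "b"], ["x", "y"], ["q"]], [{"a"}, {"y"}, {"z"}]))
    print(list_scores([["a", "b"]], [{"a", "c", "d"}]))
```

Because the F-measure here is averaged per question, it generally differs from the harmonic mean of the averaged precision and recall, which is consistent with the figures reported in the tables.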
Test batch 1
Exact Answers
| System Name | Yes/No: Accuracy | Factoid: Strict Acc. | Factoid: Lenient Acc. | Factoid: MRR | List: Mean precision | List: Recall | List: F-Measure |
|---|---|---|---|---|---|---|---|
| Wishart-S1 | 0.9200 | 0.2222 | 0.2778 | 0.2778 | 0.3186 | 0.2147 | 0.2290 |
| Wishart-S2 | 0.9200 | 0.2222 | 0.2778 | 0.2778 | 0.3186 | 0.2147 | 0.2290 |
| Wishart-S3 | 0.9200 | 0.2222 | 0.3333 | 0.3056 | 0.3067 | 0.2082 | 0.2207 |
| main system | 0.3200 | 0.0000 | 0.1111 | 0.0296 | 0.0037 | 0.0394 | 0.0066 |
| BioASQ_Baseline | 0.4400 | 0.0000 | 0.2222 | 0.1056 | 0.0153 | 0.0402 | 0.0204 |
| BioASQ Baseline 2 | 0.4800 | 0.0000 | 0.2222 | 0.1056 | 0.0153 | 0.0402 | 0.0204 |
Ideal Answers
| System Name | Automatic: Rouge-2 | Automatic: Rouge-SU4 | Manual: Readability | Manual: Recall | Manual: Precision | Manual: Repetition |
|---|---|---|---|---|---|---|
| Wishart-S1 | 0.2059 | 0.2202 | 3.97 | 3.71 | 3.83 | 4.27 |
| Wishart-S2 | 0.2059 | 0.2202 | 3.97 | 3.71 | 3.83 | 4.27 |
| Wishart-S3 | 0.2059 | 0.2202 | 3.97 | 3.71 | 3.83 | 4.27 |
| main system | 0.2165 | 0.2396 | 2.95 | 3.71 | 3.10 | 3.65 |
| BioASQ_Baseline | 0.2266 | 0.2636 | 2.55 | 3.15 | 2.54 | 3.21 |
| BioASQ Baseline 2 | 0.1935 | 0.2305 | 2.39 | 3.19 | 2.37 | 2.98 |
Test batch 2
Exact Answers
| System Name | Yes/No: Accuracy | Factoid: Strict Acc. | Factoid: Lenient Acc. | Factoid: MRR | List: Mean precision | List: Recall | List: F-Measure |
|---|---|---|---|---|---|---|---|
| main system | 0.4231 | 0.0000 | 0.1000 | 0.0292 | 0.0603 | 0.1040 | 0.0680 |
| Wishart-S1 | 0.9615 | 0.2500 | 0.3000 | 0.3000 | 0.4060 | 0.3127 | 0.3336 |
| system 2 | 0.4231 | 0.0000 | 0.1000 | 0.0292 | 0.0437 | 0.0445 | 0.0414 |
| system 3 | 0.4231 | 0.0000 | 0.1000 | 0.0417 | 0.0644 | 0.0440 | 0.0488 |
| system 4 | 0.4231 | 0.0000 | 0.1000 | 0.0417 | 0.0644 | 0.0440 | 0.0488 |
| BioASQ_Baseline | 0.2692 | 0.0000 | 0.2500 | 0.0725 | 0.0612 | 0.2062 | 0.0789 |
| BioASQ Baseline 2 | 0.5000 | 0.0000 | 0.2500 | 0.0725 | 0.0612 | 0.2062 | 0.0789 |
Ideal Answers
| System Name | Automatic: Rouge-2 | Automatic: Rouge-SU4 | Manual: Readability | Manual: Recall | Manual: Precision | Manual: Repetition |
|---|---|---|---|---|---|---|
| main system | 0.2122 | 0.2596 | 2.99 | 3.86 | 3.16 | 3.58 |
| Wishart-S1 | 0.2106 | 0.2387 | 4.14 | 4.14 | 4.17 | 4.48 |
| system 2 | 0.2204 | 0.2659 | 2.92 | 3.87 | 3.08 | 3.50 |
| system 3 | 0.2152 | 0.2592 | 2.96 | 3.82 | 3.06 | 3.52 |
| system 4 | 0.2152 | 0.2592 | 2.96 | 3.82 | 3.06 | 3.52 |
| BioASQ_Baseline | 0.2074 | 0.2533 | 2.88 | 3.40 | 2.65 | 3.15 |
| BioASQ Baseline 2 | 0.2052 | 0.2577 | 2.74 | 3.03 | 2.45 | 3.27 |
Test batch 3
Exact Answers
| System Name | Yes/No: Accuracy | Factoid: Strict Acc. | Factoid: Lenient Acc. | Factoid: MRR | List: Mean precision | List: Recall | List: F-Measure |
|---|---|---|---|---|---|---|---|
| main system | 0.5769 | 0.0000 | 0.1250 | 0.0281 | 0.0671 | 0.1013 | 0.0727 |
| system 2 | 0.5769 | 0.0000 | 0.1250 | 0.0281 | 0.0642 | 0.0580 | 0.0575 |
| system 3 | 0.5769 | 0.0000 | 0.1250 | 0.0281 | 0.0594 | 0.0399 | 0.0450 |
| BioASQ_Baseline | 0.6538 | 0.0000 | 0.1250 | 0.0391 | 0.0209 | 0.0635 | 0.0278 |
| BioASQ Baseline 2 | 0.6154 | 0.0000 | 0.1250 | 0.0391 | 0.0209 | 0.0635 | 0.0278 |
Ideal Answers
| System Name | Automatic: Rouge-2 | Automatic: Rouge-SU4 | Manual: Readability | Manual: Recall | Manual: Precision | Manual: Repetition |
|---|---|---|---|---|---|---|
| main system | 0.2563 | 0.2828 | 2.79 | 3.33 | 3.08 | 3.33 |
| system 2 | 0.2505 | 0.2806 | 2.79 | 3.33 | 2.95 | 3.23 |
| system 3 | 0.2555 | 0.2835 | 2.74 | 3.13 | 3.08 | 3.00 |
| BioASQ_Baseline | 0.2670 | 0.2982 | 3.06 | 3.50 | 2.81 | 3.41 |
| BioASQ Baseline 2 | 0.2547 | 0.2941 | 2.87 | 3.65 | 2.96 | 3.22 |