BioASQ Participants Area
Task 1b: Test Results of Phase B
The test results are presented in separate tables for each type of annotation. Each system is identified by its "System Description".
The evaluation measures used in Task 1b are presented here.
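As a rough illustration of how the exact-answer columns below can be read, the following sketch computes strict accuracy, lenient accuracy, and MRR for factoid questions, and precision, recall, and F-measure for a single list question. It assumes the commonly used definitions (a ranked candidate list per factoid question, set overlap for list questions); the authoritative definitions are the ones on the linked measures page. The function names and the toy data are hypothetical and are not part of the BioASQ tooling.

```python
from typing import List, Set, Tuple


def first_correct_rank(ranked: List[str], gold: Set[str]) -> int:
    """Return the 1-based rank of the first correct candidate, or 0 if none is correct."""
    for rank, candidate in enumerate(ranked, start=1):
        if candidate in gold:
            return rank
    return 0


def factoid_scores(system: List[List[str]], gold: List[Set[str]]) -> Tuple[float, float, float]:
    """Strict accuracy, lenient accuracy, and MRR over a set of factoid questions.

    Assumed definitions (not taken from this page):
      strict  = share of questions whose top-ranked candidate is correct
      lenient = share of questions with a correct candidate anywhere in the list
      MRR     = mean of 1/rank of the first correct candidate (0 when none is correct)
    """
    ranks = [first_correct_rank(r, g) for r, g in zip(system, gold)]
    n = len(ranks)
    strict = sum(1 for r in ranks if r == 1) / n
    lenient = sum(1 for r in ranks if r > 0) / n
    mrr = sum(1.0 / r for r in ranks if r > 0) / n
    return strict, lenient, mrr


def list_scores(returned: Set[str], gold: Set[str]) -> Tuple[float, float, float]:
    """Precision, recall, and F-measure for one list question (set overlap)."""
    tp = len(returned & gold)
    precision = tp / len(returned) if returned else 0.0
    recall = tp / len(gold) if gold else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


# Toy data: three factoid questions with ranked candidates and gold answers.
system_answers = [["a", "b"], ["x", "y"], ["q"]]
gold_answers = [{"a"}, {"y"}, {"z"}]
strict, lenient, mrr = factoid_scores(system_answers, gold_answers)

# Toy data: one list question.
precision, recall, f_measure = list_scores({"a", "b"}, {"a", "c", "d"})
```

Under these definitions, strict accuracy can never exceed MRR, and MRR can never exceed lenient accuracy. If the list scores are averaged per question, the mean F-measure need not equal the harmonic mean of the mean precision and mean recall.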
Test batch 1
Exact Answers
| System Name | Yes/No: Accuracy | Factoid: Strict Acc. | Factoid: Lenient Acc. | Factoid: MRR | List: Mean precision | List: Recall | List: F-Measure |
|---|---|---|---|---|---|---|---|
| Wishart-S1 | 0.9200 | 0.2222 | 0.2778 | 0.2778 | 0.3186 | 0.2147 | 0.2290 |
| Wishart-S2 | 0.9200 | 0.2222 | 0.2778 | 0.2778 | 0.3186 | 0.2147 | 0.2290 |
| Wishart-S3 | 0.9200 | 0.2222 | 0.3333 | 0.3056 | 0.3067 | 0.2082 | 0.2207 |
| main system | 0.3200 | 0.0000 | 0.1111 | 0.0296 | 0.0037 | 0.0394 | 0.0066 |
| BioASQ_Baseline | 0.4400 | 0.0000 | 0.2222 | 0.1056 | 0.0153 | 0.0402 | 0.0204 |
| BioASQ Baseline FS | 0.4800 | 0.0000 | 0.2222 | 0.1056 | 0.0153 | 0.0402 | 0.0204 |
Ideal Answers
| System Name | Automatic: Rouge-2 | Automatic: Rouge-SU4 | Manual: Readability | Manual: Recall | Manual: Precision | Manual: Repetition |
|---|---|---|---|---|---|---|
| Wishart-S1 | 0.2059 | 0.2202 | 3.97 | 3.71 | 3.83 | 4.27 |
| Wishart-S2 | 0.2059 | 0.2202 | 3.97 | 3.71 | 3.83 | 4.27 |
| Wishart-S3 | 0.2059 | 0.2202 | 3.97 | 3.71 | 3.83 | 4.27 |
| main system | 0.2165 | 0.2396 | 2.95 | 3.71 | 3.10 | 3.65 |
| BioASQ_Baseline | 0.2266 | 0.2636 | 2.55 | 3.15 | 2.54 | 3.21 |
| BioASQ Baseline FS | 0.1935 | 0.2305 | 2.39 | 3.19 | 2.37 | 2.98 |
Test batch 2
Exact Answers
| System Name | Yes/No: Accuracy | Factoid: Strict Acc. | Factoid: Lenient Acc. | Factoid: MRR | List: Mean precision | List: Recall | List: F-Measure |
|---|---|---|---|---|---|---|---|
| main system | 0.4231 | 0.0000 | 0.1000 | 0.0292 | 0.0603 | 0.1040 | 0.0680 |
| Wishart-S1 | 0.9615 | 0.2500 | 0.3000 | 0.3000 | 0.4060 | 0.3127 | 0.3336 |
| system 2 | 0.4231 | 0.0000 | 0.1000 | 0.0292 | 0.0437 | 0.0445 | 0.0414 |
| system 3 | 0.4231 | 0.0000 | 0.1000 | 0.0417 | 0.0644 | 0.0440 | 0.0488 |
| system 4 | 0.4231 | 0.0000 | 0.1000 | 0.0417 | 0.0644 | 0.0440 | 0.0488 |
| BioASQ_Baseline | 0.2692 | 0.0000 | 0.2500 | 0.0725 | 0.0612 | 0.2062 | 0.0789 |
| BioASQ Baseline FS | 0.5000 | 0.0000 | 0.2500 | 0.0725 | 0.0612 | 0.2062 | 0.0789 |
Ideal Answers
| System Name | Automatic: Rouge-2 | Automatic: Rouge-SU4 | Manual: Readability | Manual: Recall | Manual: Precision | Manual: Repetition |
|---|---|---|---|---|---|---|
| main system | 0.2122 | 0.2596 | 2.99 | 3.86 | 3.16 | 3.58 |
| Wishart-S1 | 0.2106 | 0.2387 | 4.14 | 4.14 | 4.17 | 4.48 |
| system 2 | 0.2204 | 0.2659 | 2.92 | 3.87 | 3.08 | 3.50 |
| system 3 | 0.2152 | 0.2592 | 2.96 | 3.82 | 3.06 | 3.52 |
| system 4 | 0.2152 | 0.2592 | 2.96 | 3.82 | 3.06 | 3.52 |
| BioASQ_Baseline | 0.2074 | 0.2533 | 2.88 | 3.40 | 2.65 | 3.15 |
| BioASQ Baseline FS | 0.2052 | 0.2577 | 2.74 | 3.03 | 2.45 | 3.27 |
Test batch 3
Exact Answers
| System Name | Yes/No: Accuracy | Factoid: Strict Acc. | Factoid: Lenient Acc. | Factoid: MRR | List: Mean precision | List: Recall | List: F-Measure |
|---|---|---|---|---|---|---|---|
| main system | 0.5769 | 0.0000 | 0.1250 | 0.0281 | 0.0671 | 0.1013 | 0.0727 |
| system 2 | 0.5769 | 0.0000 | 0.1250 | 0.0281 | 0.0642 | 0.0580 | 0.0575 |
| system 3 | 0.5769 | 0.0000 | 0.1250 | 0.0281 | 0.0594 | 0.0399 | 0.0450 |
| BioASQ_Baseline | 0.6538 | 0.0000 | 0.1250 | 0.0391 | 0.0209 | 0.0635 | 0.0278 |
| BioASQ Baseline FS | 0.6154 | 0.0000 | 0.1250 | 0.0391 | 0.0209 | 0.0635 | 0.0278 |
Ideal Answers
| System Name | Automatic: Rouge-2 | Automatic: Rouge-SU4 | Manual: Readability | Manual: Recall | Manual: Precision | Manual: Repetition |
|---|---|---|---|---|---|---|
| main system | 0.2563 | 0.2828 | 2.79 | 3.33 | 3.08 | 3.33 |
| system 2 | 0.2505 | 0.2806 | 2.79 | 3.33 | 2.95 | 3.23 |
| system 3 | 0.2555 | 0.2835 | 2.74 | 3.13 | 3.08 | 3.00 |
| BioASQ_Baseline | 0.2670 | 0.2982 | 3.06 | 3.50 | 2.81 | 3.41 |
| BioASQ Baseline FS | 0.2547 | 0.2941 | 2.87 | 3.65 | 2.96 | 3.22 |