BioASQ Participants Area
Task 12b: Test Results of Phase B
The test results are presented in separate tables for each type of annotation. Each system is listed under its "System Description".
The evaluation measures used in Task B are presented here.
Warning: For ideal answers, good ROUGE results do not always imply good manual scores.
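As a rough illustration of one Task B measure: the Yes/No rows listed on this page are consistent with Macro F1 being the unweighted mean of F1 Yes and F1 No, with a missing class score (shown as "-") counted as zero. The sketch below is an assumption inferred from the published rows, not the official evaluation code, and the function name `macro_f1` is hypothetical.

```python
def macro_f1(f1_yes, f1_no):
    """Unweighted mean of the two class-wise F1 scores.

    Assumption: a score of None (printed as "-" in the tables) counts as 0.
    """
    scores = [0.0 if s is None else s for s in (f1_yes, f1_no)]
    return sum(scores) / len(scores)


# Test batch 1, mibi_rag_abstract: F1 Yes 0.8276, F1 No 0.7619
print(macro_f1(0.8276, 0.7619))

# Test batch 1, LLM4SciLit: F1 Yes missing, F1 No 0.6111
print(macro_f1(None, 0.6111))
```

Both calls agree with the reported Macro F1 values (0.7947 and 0.3056) to within rounding of the four-decimal inputs.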
Test batch 1
Exact Answers
| System | Yes/No: Accuracy | Yes/No: F1 Yes | Yes/No: F1 No | Yes/No: Macro F1 | Factoid: Strict Acc. | Factoid: Lenient Acc. | Factoid: MRR | List: Mean Prec. | List: Recall | List: F-Measure |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| mibi_rag_abstract | 0.8000 | 0.8276 | 0.7619 | 0.7947 | - | - | - | 0.5048 | 0.4149 | 0.4371 |
| mibi_rag_snippet | 0.8800 | 0.8966 | 0.8571 | 0.8768 | - | - | - | 0.5190 | 0.4503 | 0.4680 |
| UR-IW-1 | 0.9600 | 0.9655 | 0.9524 | 0.9589 | 0.1905 | 0.3333 | 0.2540 | 0.4840 | 0.5069 | 0.4662 |
| UR-IW-4 | 0.9200 | 0.9286 | 0.9091 | 0.9188 | 0.1905 | 0.1905 | 0.1905 | 0.4563 | 0.3903 | 0.4015 |
| UR-IW-2 | 0.9200 | 0.9231 | 0.9167 | 0.9199 | 0.2381 | 0.2381 | 0.2381 | 0.5202 | 0.4947 | 0.4992 |
| UR-IW-5 | 0.9600 | 0.9655 | 0.9524 | 0.9589 | 0.2381 | 0.2857 | 0.2540 | 0.6054 | 0.5942 | 0.5790 |
| UR-IW-3 | 0.9200 | 0.9286 | 0.9091 | 0.9188 | 0.2381 | 0.2381 | 0.2381 | 0.6010 | 0.5799 | 0.5656 |
| Gatech competition | 0.8800 | 0.8889 | 0.8696 | 0.8792 | 0.1905 | 0.1905 | 0.1905 | 0.4871 | 0.4051 | 0.3975 |
| Mistral-7B finetune | 0.9200 | 0.9286 | 0.9091 | 0.9188 | 0.1905 | 0.1905 | 0.1905 | 0.5786 | 0.5499 | 0.5571 |
| Synthia with first | 0.8800 | 0.8889 | 0.8696 | 0.8792 | 0.2381 | 0.2381 | 0.2381 | 0.4341 | 0.4543 | 0.4067 |
| LLM4SciLit | 0.4400 | - | 0.6111 | 0.3056 | - | - | - | - | - | - |
| RMC_append_snippets | 0.9200 | 0.9286 | 0.9091 | 0.9188 | 0.1905 | 0.1905 | 0.1905 | 0.4770 | 0.5343 | 0.4754 |
| bioinfo-0 | 0.5600 | 0.7179 | - | 0.3590 | - | - | - | - | - | - |
| bioinfo-1 | 0.5600 | 0.7179 | - | 0.3590 | - | - | - | - | - | - |
| bioinfo-2 | 0.5600 | 0.7179 | - | 0.3590 | - | - | - | - | - | - |
| bioinfo-3 | 0.0800 | 0.0800 | 0.0800 | 0.0800 | 0.1429 | 0.1429 | 0.1429 | 0.3214 | 0.3103 | 0.3093 |
| bioinfo-4 | 0.0400 | - | 0.0769 | 0.0385 | 0.2381 | 0.2381 | 0.2381 | 0.3991 | 0.3587 | 0.3717 |
| Fleming-1 | 0.8000 | 0.8387 | 0.7368 | 0.7878 | 0.0476 | 0.0952 | 0.0714 | 0.5196 | 0.4639 | 0.4717 |
| dmiip2024_2 | 0.9600 | 0.9655 | 0.9524 | 0.9589 | 0.2381 | 0.2381 | 0.2381 | 0.3943 | 0.6334 | 0.4483 |
| dmiip2024_3 | 0.9200 | 0.9333 | 0.9000 | 0.9167 | 0.2857 | 0.3333 | 0.3095 | 0.5942 | 0.5201 | 0.5368 |
| dmiip2024_4 | 0.4400 | - | 0.6111 | 0.3056 | 0.2857 | 0.5238 | 0.3968 | 0.4481 | 0.5560 | 0.4687 |
| dmiip2024_1 | 0.8800 | 0.8889 | 0.8696 | 0.8792 | 0.2857 | 0.4286 | 0.3571 | 0.6647 | 0.5804 | 0.5843 |
| dmiip2024 | 0.8800 | 0.8889 | 0.8696 | 0.8792 | 0.2381 | 0.4286 | 0.3175 | 0.6603 | 0.5760 | 0.5797 |
| IISR 5th submit | 0.9200 | 0.9286 | 0.9091 | 0.9188 | 0.2381 | 0.2381 | 0.2381 | 0.5466 | 0.5309 | 0.5301 |
| RAG for medicine | 0.7600 | 0.7500 | 0.7692 | 0.7596 | 0.1429 | 0.2381 | 0.1762 | 0.5524 | 0.6009 | 0.5468 |
| IBE-LM ver1 | 0.5600 | 0.7179 | - | 0.3590 | 0.3333 | 0.4286 | 0.3730 | - | - | - |
| IBE-LM ver3 | 0.5600 | 0.7179 | - | 0.3590 | 0.4286 | 0.4286 | 0.4286 | - | - | - |
| IBE-LM ver 5 | 0.5600 | 0.7179 | - | 0.3590 | 0.4286 | 0.4286 | 0.4286 | - | - | - |
| IBE-LM ver2 | 0.5600 | 0.7179 | - | 0.3590 | 0.2857 | 0.4286 | 0.3333 | 0.1143 | 0.2083 | 0.1417 |
| IBE-LM ver4 | 0.5600 | 0.7179 | - | 0.3590 | 0.2857 | 0.4286 | 0.3333 | 0.1143 | 0.2083 | 0.1417 |
| IISR 2nd submit | 0.9200 | 0.9286 | 0.9091 | 0.9188 | 0.2381 | 0.2381 | 0.2381 | 0.5813 | 0.5066 | 0.5244 |
| IISR 3rd submit | 0.8800 | 0.8966 | 0.8571 | 0.8768 | 0.2381 | 0.2381 | 0.2381 | 0.5417 | 0.5300 | 0.5218 |
| IISR 4th submit | 0.9600 | 0.9655 | 0.9524 | 0.9589 | 0.1905 | 0.1905 | 0.1905 | 0.5449 | 0.5186 | 0.5208 |
| IISR first submit | 0.8800 | 0.8889 | 0.8696 | 0.8792 | 0.2381 | 0.2857 | 0.2540 | 0.6317 | 0.5426 | 0.5692 |
| CPS | 0.6800 | 0.7333 | 0.6000 | 0.6667 | 0.2381 | 0.2381 | 0.2381 | 0.3532 | 0.2566 | 0.2836 |
| lasige-ku | 0.6400 | 0.7568 | 0.3077 | 0.5322 | - | - | - | - | - | - |
| extractive | 0.8000 | 0.8485 | 0.7059 | 0.7772 | 0.0952 | 0.1429 | 0.1190 | 0.1733 | 0.2046 | 0.1760 |
| AUEB-System1 | 0.7600 | 0.8235 | 0.6250 | 0.7243 | 0.2857 | 0.3810 | 0.3214 | 0.4286 | 0.3595 | 0.3555 |
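The three factoid columns above (Strict Acc., Lenient Acc., MRR) can be read under the usual definitions: strict accuracy counts a question as correct only when the top-ranked answer matches a gold answer, lenient accuracy when any returned answer matches, and MRR averages the reciprocal rank of the first match. The sketch below is illustrative only. It assumes up to five ranked candidates per question and case-insensitive exact matching, and `factoid_scores` and the toy data are hypothetical, not taken from the official evaluation code.

```python
def factoid_scores(ranked_answers, gold_answers, max_candidates=5):
    """Return (strict accuracy, lenient accuracy, MRR) over a set of questions.

    ranked_answers: one ranked list of candidate strings per question.
    gold_answers:   one set of accepted answer strings per question.
    max_candidates: assumed cutoff on how many candidates are considered.
    """
    n = len(gold_answers)
    strict = lenient = reciprocal_ranks = 0.0
    for candidates, gold in zip(ranked_answers, gold_answers):
        gold_lower = {g.lower() for g in gold}
        # 1-based rank of the first candidate that matches a gold answer.
        rank = next(
            (i for i, c in enumerate(candidates[:max_candidates], start=1)
             if c.lower() in gold_lower),
            None,
        )
        if rank is None:
            continue  # no match: contributes 0 to all three measures
        lenient += 1
        reciprocal_ranks += 1.0 / rank
        if rank == 1:
            strict += 1
    return strict / n, lenient / n, reciprocal_ranks / n


# Toy data (hypothetical): a hit at rank 1, a hit at rank 2, and no hit.
ranked = [["BRCA1", "TP53"], ["insulin", "glucagon"], ["aspirin"]]
gold = [{"BRCA1"}, {"Glucagon"}, {"ibuprofen"}]
strict_acc, lenient_acc, mrr = factoid_scores(ranked, gold)
print(strict_acc, lenient_acc, mrr)
```

Under these definitions strict accuracy can never exceed MRR, and MRR can never exceed lenient accuracy, which matches the pattern in the rows above (for example UR-IW-1: 0.1905 strict, 0.2540 MRR, 0.3333 lenient).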
Ideal Answers
| System | Automatic (ROUGE): R-2 (Rec) | Automatic (ROUGE): R-2 (F1) | Automatic (ROUGE): R-SU4 (Rec) | Automatic (ROUGE): R-SU4 (F1) | Manual: Readability | Manual: Recall | Manual: Precision | Manual: Repetition |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| mibi_rag_abstract | 0.3138 | 0.1756 | 0.3297 | 0.1722 | - | - | - | - |
| mibi_rag_snippet | 0.3356 | 0.2089 | 0.3564 | 0.2019 | - | - | - | - |
| UR-IW-1 | 0.3218 | 0.2080 | 0.3396 | 0.1966 | - | - | - | - |
| UR-IW-4 | 0.3750 | 0.2388 | 0.3721 | 0.2278 | - | - | - | - |
| UR-IW-2 | 0.3354 | 0.3261 | 0.3284 | 0.3158 | - | - | - | - |
| UR-IW-5 | 0.3571 | 0.1949 | 0.3822 | 0.1904 | - | - | - | - |
| UR-IW-3 | 0.3231 | 0.3029 | 0.3312 | 0.3049 | - | - | - | - |
| Gatech competition | 0.2562 | 0.2065 | 0.2587 | 0.1995 | - | - | - | - |
| Mistral-7B finetune | 0.3288 | 0.3179 | 0.3353 | 0.3168 | - | - | - | - |
| Synthia with first | 0.2661 | 0.1965 | 0.2710 | 0.1915 | - | - | - | - |
| LLM4SciLit | 0.0535 | 0.0712 | 0.0480 | 0.0653 | - | - | - | - |
| RMC_append_snippets | 0.3562 | 0.2509 | 0.3542 | 0.2386 | - | - | - | - |
| bioinfo-0 | 0.3639 | 0.0937 | 0.3863 | 0.0896 | - | - | - | - |
| bioinfo-1 | 0.3628 | 0.0868 | 0.3871 | 0.0845 | - | - | - | - |
| bioinfo-2 | 0.3888 | 0.0984 | 0.4119 | 0.0947 | - | - | - | - |
| bioinfo-3 | 0.2722 | 0.0893 | 0.2897 | 0.0884 | - | - | - | - |
| bioinfo-4 | 0.3458 | 0.0900 | 0.3667 | 0.0865 | - | - | - | - |
| Fleming-1 | 0.2691 | 0.1159 | 0.3098 | 0.1166 | - | - | - | - |
| dmiip2024_2 | 0.2969 | 0.2539 | 0.3060 | 0.2479 | - | - | - | - |
| dmiip2024_3 | 0.2026 | 0.1951 | 0.2116 | 0.1901 | - | - | - | - |
| dmiip2024_4 | 0.2614 | 0.2507 | 0.2576 | 0.2457 | - | - | - | - |
| dmiip2024_1 | 0.2757 | 0.2659 | 0.2776 | 0.2580 | - | - | - | - |
| dmiip2024 | 0.2589 | 0.2546 | 0.2558 | 0.2439 | - | - | - | - |
| IISR 5th submit | 0.3750 | 0.1442 | 0.3924 | 0.1369 | - | - | - | - |
| RAG for medicine | 0.2800 | 0.1165 | 0.3092 | 0.1159 | - | - | - | - |
| IBE-LM ver1 | - | - | - | - | - | - | - | - |
| IBE-LM ver3 | - | - | - | - | - | - | - | - |
| IBE-LM ver 5 | - | - | - | - | - | - | - | - |
| IBE-LM ver2 | - | - | - | - | - | - | - | - |
| IBE-LM ver4 | - | - | - | - | - | - | - | - |
| IISR 2nd submit | 0.2625 | 0.1980 | 0.2660 | 0.1907 | - | - | - | - |
| IISR 3rd submit | 0.3379 | 0.1366 | 0.3400 | 0.1270 | - | - | - | - |
| IISR 4th submit | 0.3675 | 0.1289 | 0.3774 | 0.1197 | - | - | - | - |
| IISR first submit | 0.2796 | 0.2339 | 0.2826 | 0.2292 | - | - | - | - |
| CPS | 0.2873 | 0.2090 | 0.2936 | 0.2017 | - | - | - | - |
| lasige-ku | 0.0516 | 0.0331 | 0.0808 | 0.0432 | - | - | - | - |
| extractive | 0.2167 | 0.2163 | 0.2261 | 0.2220 | - | - | - | - |
| AUEB-System1 | - | - | - | - | - | - | - | - |
Test batch 2
Exact Answers
| System | Yes/No: Accuracy | Yes/No: F1 Yes | Yes/No: F1 No | Yes/No: Macro F1 | Factoid: Strict Acc. | Factoid: Lenient Acc. | Factoid: MRR | List: Mean Prec. | List: Recall | List: F-Measure |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| dmiip2024 | 0.8846 | 0.9032 | 0.8571 | 0.8802 | 0.4737 | 0.6842 | 0.5614 | 0.4648 | 0.4619 | 0.4568 |
| dmiip2024_1 | 0.8846 | 0.9032 | 0.8571 | 0.8802 | 0.5263 | 0.6842 | 0.5965 | 0.4787 | 0.4730 | 0.4683 |
| dmiip2024_2 | 0.9231 | 0.9412 | 0.8889 | 0.9150 | 0.3158 | 0.4737 | 0.3816 | 0.5141 | 0.5125 | 0.4948 |
| dmiip2024_4 | 0.3846 | - | 0.5556 | 0.2778 | 0.2632 | 0.5789 | 0.3947 | 0.5000 | 0.4995 | 0.4838 |
| dmiip2024_3 | 0.9231 | 0.9412 | 0.8889 | 0.9150 | 0.5263 | 0.5263 | 0.5263 | 0.5784 | 0.5247 | 0.5456 |
| mibi_rag_snippet | 0.9231 | 0.9412 | 0.8889 | 0.9150 | 0.1579 | 0.1579 | 0.1579 | 0.5444 | 0.4512 | 0.4843 |
| mibi_rag_abstract | 0.8846 | 0.9091 | 0.8421 | 0.8756 | 0.1053 | 0.1053 | 0.1053 | 0.3685 | 0.3513 | 0.3500 |
| Synthia with first | 0.9231 | 0.9375 | 0.9000 | 0.9188 | 0.1053 | 0.1053 | 0.1053 | 0.4352 | 0.4180 | 0.4150 |
| RMC_append_snippets | 0.9615 | 0.9697 | 0.9474 | 0.9585 | 0.5263 | 0.5263 | 0.5263 | 0.4222 | 0.3817 | 0.3900 |
| bioinfo-0 | 0.6154 | 0.7619 | - | 0.3810 | - | - | - | - | - | - |
| bioinfo-1 | 0.6154 | 0.7619 | - | 0.3810 | - | - | - | - | - | - |
| bioinfo-2 | 0.6154 | 0.7619 | - | 0.3810 | - | - | - | - | - | - |
| bioinfo-3 | 0.6154 | 0.7619 | - | 0.3810 | - | - | - | - | - | - |
| bioinfo-4 | 0.6154 | 0.7619 | - | 0.3810 | - | - | - | - | - | - |
| UR-IW-1 | 0.9615 | 0.9697 | 0.9474 | 0.9585 | 0.6316 | 0.7368 | 0.6842 | 0.5061 | 0.5246 | 0.5047 |
| UR-IW-2 | 0.9231 | 0.9375 | 0.9000 | 0.9188 | 0.6842 | 0.6842 | 0.6842 | 0.5835 | 0.5645 | 0.5698 |
| UR-IW-3 | 0.9615 | 0.9677 | 0.9524 | 0.9601 | 0.5263 | 0.5263 | 0.5263 | 0.5650 | 0.5347 | 0.5434 |
| UR-IW-4 | 0.8462 | 0.8667 | 0.8182 | 0.8424 | 0.6316 | 0.6316 | 0.6316 | 0.5863 | 0.5645 | 0.5708 |
| UR-IW-5 | 0.9231 | 0.9375 | 0.9000 | 0.9188 | 0.4211 | 0.4211 | 0.4211 | 0.5009 | 0.5347 | 0.5033 |
| Gatech competition | 0.9615 | 0.9677 | 0.9524 | 0.9601 | 0.3684 | 0.3684 | 0.3684 | 0.3843 | 0.2686 | 0.2936 |
| GTBioASQsys2 | 0.8846 | 0.9032 | 0.8571 | 0.8802 | 0.3158 | 0.3158 | 0.3158 | 0.5062 | 0.5184 | 0.4964 |
| IBE-LM ver1 | 0.6154 | 0.7619 | - | 0.3810 | 0.2105 | 0.5263 | 0.3333 | 0.1889 | 0.2810 | 0.2085 |
| IBE-LM ver2 | 0.6154 | 0.7619 | - | 0.3810 | 0.3684 | 0.5789 | 0.4491 | 0.2111 | 0.3111 | 0.2366 |
| IBE-LM ver3 | 0.6154 | 0.7619 | - | 0.3810 | 0.3158 | 0.5263 | 0.3895 | 0.1556 | 0.2352 | 0.1755 |
| IBE-LM ver4 | 0.6154 | 0.7619 | - | 0.3810 | 0.3684 | 0.5789 | 0.4342 | 0.2222 | 0.3347 | 0.2499 |
| IBE-LM ver 5 | 0.6154 | 0.7619 | - | 0.3810 | 0.3158 | 0.5789 | 0.4254 | 0.2222 | 0.3073 | 0.2381 |
| LLM4SciLit | 0.3846 | - | 0.5556 | 0.2778 | - | - | - | - | - | - |
| Fleming-3 | 0.9615 | 0.9697 | 0.9474 | 0.9585 | 0.3684 | 0.5263 | 0.4342 | 0.5475 | 0.5117 | 0.5243 |
| IISR first submit | 0.9231 | 0.9412 | 0.8889 | 0.9150 | 0.2632 | 0.2632 | 0.2632 | 0.5166 | 0.4968 | 0.5008 |
| IISR 2nd submit | 0.9231 | 0.9412 | 0.8889 | 0.9150 | 0.3684 | 0.3684 | 0.3684 | 0.6166 | 0.4915 | 0.5261 |
| IISR 3rd submit | 0.7692 | 0.8235 | 0.6667 | 0.7451 | 0.2632 | 0.2632 | 0.2632 | 0.4981 | 0.4443 | 0.4610 |
| IISR 4th submit | 0.9615 | 0.9697 | 0.9474 | 0.9585 | 0.4211 | 0.4211 | 0.4211 | 0.5436 | 0.4515 | 0.4840 |
| IISR 5th submit | 0.8846 | 0.9143 | 0.8235 | 0.8689 | 0.4211 | 0.4211 | 0.4211 | 0.5595 | 0.4947 | 0.5023 |
| CPS | 0.7308 | 0.8000 | 0.5882 | 0.6941 | 0.2105 | 0.2105 | 0.2105 | 0.4035 | 0.4055 | 0.3661 |
| CPS2 | 0.6923 | 0.7895 | 0.4286 | 0.6090 | - | - | - | - | - | - |
| Mistral-7B finetune | 0.9231 | 0.9375 | 0.9000 | 0.9188 | 0.5263 | 0.5263 | 0.5263 | 0.5777 | 0.4851 | 0.5090 |
| simple truncation | 0.9231 | 0.9375 | 0.9000 | 0.9188 | 0.1053 | 0.2105 | 0.1579 | - | - | - |
| kmeans | 0.9231 | 0.9375 | 0.9000 | 0.9188 | 0.1053 | 0.2105 | 0.1579 | - | - | - |
| similarity measures | 0.9231 | 0.9375 | 0.9000 | 0.9188 | 0.1053 | 0.2105 | 0.1579 | - | - | - |
| extractive | 0.9231 | 0.9375 | 0.9000 | 0.9188 | 0.1053 | 0.2105 | 0.1579 | - | - | - |
| abstractive | 0.9231 | 0.9375 | 0.9000 | 0.9188 | 0.1053 | 0.3158 | 0.2018 | 0.3492 | 0.3204 | 0.3255 |
| lasige-ku | 0.6154 | 0.7222 | 0.3750 | 0.5486 | - | - | - | - | - | - |
Ideal Answers
| System | Automatic (ROUGE): R-2 (Rec) | Automatic (ROUGE): R-2 (F1) | Automatic (ROUGE): R-SU4 (Rec) | Automatic (ROUGE): R-SU4 (F1) | Manual: Readability | Manual: Recall | Manual: Precision | Manual: Repetition |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| dmiip2024 | 0.2401 | 0.2365 | 0.2392 | 0.2276 | - | - | - | - |
| dmiip2024_1 | 0.2501 | 0.2443 | 0.2586 | 0.2484 | - | - | - | - |
| dmiip2024_2 | 0.2527 | 0.2441 | 0.2659 | 0.2443 | - | - | - | - |
| dmiip2024_4 | 0.2566 | 0.2476 | 0.2654 | 0.2478 | - | - | - | - |
| dmiip2024_3 | 0.1908 | 0.1742 | 0.2125 | 0.1827 | - | - | - | - |
| mibi_rag_snippet | 0.3512 | 0.2175 | 0.3762 | 0.2171 | - | - | - | - |
| mibi_rag_abstract | 0.3118 | 0.1790 | 0.3316 | 0.1781 | - | - | - | - |
| Synthia with first | 0.2569 | 0.2048 | 0.2734 | 0.2056 | - | - | - | - |
| RMC_append_snippets | 0.3170 | 0.2443 | 0.3308 | 0.2419 | - | - | - | - |
| bioinfo-0 | 0.3311 | 0.1112 | 0.3596 | 0.1104 | - | - | - | - |
| bioinfo-1 | 0.3770 | 0.1388 | 0.3984 | 0.1323 | - | - | - | - |
| bioinfo-2 | 0.1386 | 0.1236 | 0.1401 | 0.1189 | - | - | - | - |
| bioinfo-3 | 0.1843 | 0.1632 | 0.1967 | 0.1634 | - | - | - | - |
| bioinfo-4 | 0.3336 | 0.1191 | 0.3522 | 0.1134 | - | - | - | - |
| UR-IW-1 | 0.3758 | 0.2684 | 0.4021 | 0.2646 | - | - | - | - |
| UR-IW-2 | 0.3730 | 0.3595 | 0.3870 | 0.3666 | - | - | - | - |
| UR-IW-3 | 0.4333 | 0.4008 | 0.4462 | 0.4035 | - | - | - | - |
| UR-IW-4 | 0.3965 | 0.2577 | 0.4125 | 0.2528 | - | - | - | - |
| UR-IW-5 | 0.3596 | 0.2114 | 0.3789 | 0.2054 | - | - | - | - |
| Gatech competition | 0.2092 | 0.1774 | 0.2129 | 0.1696 | - | - | - | - |
| GTBioASQsys2 | 0.2047 | 0.1793 | 0.2173 | 0.1769 | - | - | - | - |
| IBE-LM ver1 | - | - | - | - | - | - | - | - |
| IBE-LM ver2 | - | - | - | - | - | - | - | - |
| IBE-LM ver3 | - | - | - | - | - | - | - | - |
| IBE-LM ver4 | - | - | - | - | - | - | - | - |
| IBE-LM ver 5 | - | - | - | - | - | - | - | - |
| LLM4SciLit | 0.0594 | 0.0853 | 0.0558 | 0.0791 | - | - | - | - |
| Fleming-3 | 0.3367 | 0.1383 | 0.3584 | 0.1327 | - | - | - | - |
| IISR first submit | 0.3439 | 0.1801 | 0.3657 | 0.1725 | - | - | - | - |
| IISR 2nd submit | 0.2365 | 0.1840 | 0.2457 | 0.1786 | - | - | - | - |
| IISR 3rd submit | 0.2338 | 0.1421 | 0.2577 | 0.1448 | - | - | - | - |
| IISR 4th submit | 0.2840 | 0.2124 | 0.2962 | 0.2065 | - | - | - | - |
| IISR 5th submit | 0.3382 | 0.1818 | 0.3548 | 0.1731 | - | - | - | - |
| CPS | 0.2337 | 0.1627 | 0.2491 | 0.1612 | - | - | - | - |
| CPS2 | 0.2413 | 0.1709 | 0.2560 | 0.1694 | - | - | - | - |
| Mistral-7B finetune | 0.3156 | 0.2964 | 0.3335 | 0.3033 | - | - | - | - |
| simple truncation | 0.1242 | 0.0794 | 0.1526 | 0.0858 | - | - | - | - |
| kmeans | 0.1362 | 0.1025 | 0.1404 | 0.0997 | - | - | - | - |
| similarity measures | 0.3055 | 0.1299 | 0.3209 | 0.1244 | - | - | - | - |
| extractive | 0.2792 | 0.1197 | 0.3076 | 0.1210 | - | - | - | - |
| abstractive | 0.1242 | 0.0794 | 0.1526 | 0.0858 | - | - | - | - |
| lasige-ku | 0.0416 | 0.0227 | 0.0783 | 0.0404 | - | - | - | - |
Test batch 3
Exact Answers
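| System | Yes/No: Accuracy | Yes/No: F1 Yes | Yes/No: F1 No | Yes/No: Macro F1 | Factoid: Strict Acc. | Factoid: Lenient Acc. | Factoid: MRR | List: Mean Prec. | List: Recall | List: F-Measure |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |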
Ideal Answers
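| System | Automatic (ROUGE): R-2 (Rec) | Automatic (ROUGE): R-2 (F1) | Automatic (ROUGE): R-SU4 (Rec) | Automatic (ROUGE): R-SU4 (F1) | Manual: Readability | Manual: Recall | Manual: Precision | Manual: Repetition |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |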
Test batch 4
Exact Answers
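| System | Yes/No: Accuracy | Yes/No: F1 Yes | Yes/No: F1 No | Yes/No: Macro F1 | Factoid: Strict Acc. | Factoid: Lenient Acc. | Factoid: MRR | List: Mean Prec. | List: Recall | List: F-Measure |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |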
Ideal Answers
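| System | Automatic (ROUGE): R-2 (Rec) | Automatic (ROUGE): R-2 (F1) | Automatic (ROUGE): R-SU4 (Rec) | Automatic (ROUGE): R-SU4 (F1) | Manual: Readability | Manual: Recall | Manual: Precision | Manual: Repetition |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |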