BioASQ Participants Area
Task 13b: Test Results of Phase A+
The test results are presented in separate tables for each type of annotation. Each system is identified by its "System Description".
The evaluation measures used in Task A+ are presented here.
Warning: For ideal answers, good ROUGE results do not always imply good manual scores.
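As a reading aid for the Yes/No columns: judging from the rows reported on this page, Macro F1 appears to be the unweighted mean of F1 Yes and F1 No, with a missing per-class score ("-") counted as 0. The sketch below illustrates that relationship under this assumption; the function name `macro_f1` is hypothetical and is not part of any BioASQ tooling.

```python
from typing import Optional


def macro_f1(f1_yes: Optional[float], f1_no: Optional[float]) -> float:
    """Unweighted mean of the two per-class F1 scores.

    Assumption: a class with no reported F1 (shown as "-" in the tables)
    counts as 0.0, which is consistent with rows such as bioinfo-0.
    """
    yes = 0.0 if f1_yes is None else f1_yes
    no = 0.0 if f1_no is None else f1_no
    return (yes + no) / 2


# Values copied from test batch 1 (UniTor_0 and bioinfo-0).
print(round(macro_f1(0.9565, 0.9091), 4))
print(round(macro_f1(0.8276, None), 4))
```

With the inputs shown, the two calls reproduce the Macro F1 values listed for UniTor_0 (0.9328) and bioinfo-0 (0.4138) in test batch 1.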
Test batch 1
Exact Answers
| System | Yes/No: Accuracy | Yes/No: F1 Yes | Yes/No: F1 No | Yes/No: Macro F1 | Factoid: Strict Acc. | Factoid: Lenient Acc. | Factoid: MRR | List: Mean Prec. | List: Recall | List: F-Measure |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| UniTor_0 | 0.9412 | 0.9565 | 0.9091 | 0.9328 | 0.3462 | 0.3462 | 0.3462 | 0.1536 | 0.2089 | 0.1720 |
| UniTor_1 | 0.9412 | 0.9565 | 0.9091 | 0.9328 | 0.3077 | 0.3077 | 0.3077 | 0.1867 | 0.2824 | 0.2140 |
| Only uses GPT-4o | 0.9412 | 0.9565 | 0.9091 | 0.9328 | 0.1538 | 0.2308 | 0.1859 | 0.0651 | 0.0883 | 0.0699 |
| NN_Persona_2 | 0.8824 | 0.9231 | 0.7500 | 0.8365 | 0.1923 | 0.2692 | 0.2308 | 0.1082 | 0.1766 | 0.1273 |
| Baseline top 20 | 0.9412 | 0.9600 | 0.8889 | 0.9244 | 0.3846 | 0.4231 | 0.4038 | 0.2105 | 0.2425 | 0.2175 |
| Using LLM alone | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.1923 | 0.2308 | 0.2115 | 0.1722 | 0.2423 | 0.1934 |
| Baseline top 10 | 0.9412 | 0.9600 | 0.8889 | 0.9244 | 0.3462 | 0.3846 | 0.3654 | 0.2015 | 0.2575 | 0.2141 |
| Using KG for list q | 0.9412 | 0.9600 | 0.8889 | 0.9244 | 0.2308 | 0.4231 | 0.3090 | 0.0631 | 0.2223 | 0.0872 |
| Main pipeline | 0.9412 | 0.9600 | 0.8889 | 0.9244 | 0.2308 | 0.4231 | 0.3090 | 0.0421 | 0.1804 | 0.0634 |
| bioinfo-0 | 0.7059 | 0.8276 | - | 0.4138 | - | - | - | - | - | - |
| bioinfo-1 | 0.7059 | 0.8276 | - | 0.4138 | - | - | - | - | - | - |
| bioinfo-2 | 0.7059 | 0.8276 | - | 0.4138 | - | - | - | - | - | - |
| bioinfo-3 | 0.7059 | 0.8276 | - | 0.4138 | - | - | - | - | - | - |
| bioinfo-4 | 0.7059 | 0.8276 | - | 0.4138 | - | - | - | - | - | - |
| UniTor_2 | 0.8824 | 0.9167 | 0.8000 | 0.8583 | 0.3077 | 0.3462 | 0.3269 | 0.1156 | 0.1693 | 0.1307 |
| UniTor_3 | 0.9412 | 0.9565 | 0.9091 | 0.9328 | 0.3077 | 0.3462 | 0.3269 | 0.1098 | 0.1732 | 0.1290 |
| NN_Persona_1 | 0.8824 | 0.9231 | 0.7500 | 0.8365 | 0.2692 | 0.3077 | 0.2885 | 0.1002 | 0.1744 | 0.1183 |
| NN_Persona_3 | 0.9412 | 0.9600 | 0.8889 | 0.9244 | 0.2692 | 0.3077 | 0.2885 | 0.0936 | 0.1647 | 0.1092 |
| deepseek32b-me | 0.2941 | - | 0.4545 | 0.2273 | 0.0769 | 0.0769 | 0.0769 | 0.2078 | 0.1751 | 0.1843 |
| GPT4O | 0.9412 | 0.9565 | 0.9091 | 0.9328 | 0.0385 | 0.0385 | 0.0385 | 0.1459 | 0.1551 | 0.1432 |
| Fleming-1 | 0.9412 | 0.9565 | 0.9091 | 0.9328 | 0.2692 | 0.4231 | 0.3186 | 0.1268 | 0.1341 | 0.1296 |
| Fleming-2 | 0.9412 | 0.9565 | 0.9091 | 0.9328 | 0.2692 | 0.4231 | 0.3186 | 0.1276 | 0.1758 | 0.1440 |
| google_serach_&_LLM | 0.7059 | 0.7368 | 0.6667 | 0.7018 | 0.1154 | 0.1154 | 0.1154 | 0.1479 | 0.1027 | 0.1179 |
| DB_vector_&_LLM | 0.4706 | 0.4000 | 0.5263 | 0.4632 | 0.1538 | 0.1538 | 0.1538 | 0.0815 | 0.0749 | 0.0777 |
| IRIS_1 | 0.2941 | - | 0.4545 | 0.2273 | - | - | - | - | - | - |
| IRIS_2 | 0.2941 | - | 0.4545 | 0.2273 | - | - | - | - | - | - |
| UR-IW-1 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.2692 | 0.3462 | 0.3077 | 0.2070 | 0.3232 | 0.2411 |
| UR-IW-3 | 0.9412 | 0.9565 | 0.9091 | 0.9328 | 0.2692 | 0.3462 | 0.3077 | 0.1685 | 0.2804 | 0.2004 |
| UR-IW-5 | 0.8235 | 0.8696 | 0.7273 | 0.7984 | 0.3462 | 0.4231 | 0.3750 | 0.2164 | 0.3003 | 0.2395 |
| bious2 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.1923 | 0.2308 | 0.2115 | 0.1310 | 0.1606 | 0.1411 |
| IRIS_3 | 0.2941 | - | 0.4545 | 0.2273 | - | - | - | - | - | - |
| bious3 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.2308 | 0.2308 | 0.2308 | 0.1377 | 0.1432 | 0.1398 |
| bious4 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.1538 | 0.1923 | 0.1731 | 0.1479 | 0.1309 | 0.1380 |
| bious5 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.1154 | 0.1154 | 0.1154 | 0.1839 | 0.1911 | 0.1787 |
| IR1 | 0.8235 | 0.8571 | 0.7692 | 0.8132 | 0.3846 | 0.3846 | 0.3846 | 0.1754 | 0.1186 | 0.1306 |
| Fleming-3 | 0.9412 | 0.9565 | 0.9091 | 0.9328 | 0.2692 | 0.4231 | 0.3186 | 0.1276 | 0.1758 | 0.1440 |
| qa | 0.2941 | - | 0.4545 | 0.2273 | 0.0385 | 0.0385 | 0.0385 | 0.0884 | 0.0797 | 0.0821 |
| bious1 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.1923 | 0.1923 | 0.1923 | 0.1482 | 0.1838 | 0.1610 |
| deepseek-r1:32b | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.0769 | 0.0769 | 0.0769 | 0.1153 | 0.1128 | 0.1067 |
| config-1 | 0.8824 | 0.9231 | 0.7500 | 0.8365 | 0.3077 | 0.3077 | 0.3077 | 0.1787 | 0.1527 | 0.1623 |
| config-2 | 0.9412 | 0.9565 | 0.9091 | 0.9328 | 0.3077 | 0.3462 | 0.3269 | 0.2200 | 0.1911 | 0.2018 |
| config-3 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.3077 | 0.3462 | 0.3205 | 0.2172 | 0.2306 | 0.2215 |
| config-4 | 0.9412 | 0.9565 | 0.9091 | 0.9328 | 0.3077 | 0.3462 | 0.3269 | 0.2145 | 0.2139 | 0.2112 |
| config-5 | 0.8824 | 0.9167 | 0.8000 | 0.8583 | 0.3077 | 0.3462 | 0.3269 | 0.2338 | 0.2413 | 0.2338 |
| mistral | 0.8824 | 0.9167 | 0.8000 | 0.8583 | 0.3077 | 0.3462 | 0.3269 | 0.1544 | 0.1471 | 0.1460 |
| UR-IW-2 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.3462 | 0.4231 | 0.3782 | 0.2290 | 0.3056 | 0.2567 |
| UR-IW-4 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.2692 | 0.3077 | 0.2885 | 0.2134 | 0.2783 | 0.2357 |
| dmiip2024 | 0.9412 | 0.9600 | 0.8889 | 0.9244 | 0.3846 | 0.3846 | 0.3846 | 0.2022 | 0.2174 | 0.2058 |
| dmiip2024_1 | 0.8235 | 0.8800 | 0.6667 | 0.7733 | 0.3846 | 0.3846 | 0.3846 | 0.2362 | 0.2370 | 0.2330 |
| deepseek-r1:14b | 0.9412 | 0.9565 | 0.9091 | 0.9328 | 0.1154 | 0.1154 | 0.1154 | 0.1254 | 0.1150 | 0.1139 |
| dmiip2024_2 | 0.9412 | 0.9600 | 0.8889 | 0.9244 | 0.4231 | 0.4615 | 0.4423 | 0.1927 | 0.2070 | 0.1911 |
| deepseek-r1:8b | 0.9412 | 0.9565 | 0.9091 | 0.9328 | 0.0769 | 0.0769 | 0.0769 | 0.1722 | 0.1833 | 0.1708 |
| gpt 01 mini | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.0769 | 0.0769 | 0.0769 | 0.1153 | 0.1128 | 0.1067 |
| dmiip2024_3 | 0.9412 | 0.9565 | 0.9091 | 0.9328 | 0.4231 | 0.5000 | 0.4551 | 0.1708 | 0.2876 | 0.2050 |
| deepseek32b-full | 0.2941 | - | 0.4545 | 0.2273 | 0.0769 | 0.0769 | 0.0769 | 0.1977 | 0.1773 | 0.1801 |
| lasigeBioTM | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.1154 | 0.1154 | 0.1154 | 0.1138 | 0.1362 | 0.1203 |
Ideal Answers
| System | Automatic: R-2 (Rec) | Automatic: R-2 (F1) | Automatic: R-SU4 (Rec) | Automatic: R-SU4 (F1) | Manual: Readability | Manual: Recall | Manual: Precision | Manual: Repetition |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| UniTor_0 | 0.2359 | 0.2088 | 0.2467 | 0.2133 | - | - | - | - |
| UniTor_1 | 0.2254 | 0.1995 | 0.2339 | 0.2026 | - | - | - | - |
| Only uses GPT-4o | 0.1828 | 0.1061 | 0.2069 | 0.1114 | - | - | - | - |
| NN_Persona_2 | 0.2239 | 0.0886 | 0.2505 | 0.0939 | - | - | - | - |
| Baseline top 20 | 0.0273 | 0.0264 | 0.0295 | 0.0288 | - | - | - | - |
| Using LLM alone | 0.0158 | 0.0161 | 0.0183 | 0.0188 | - | - | - | - |
| Baseline top 10 | 0.0286 | 0.0260 | 0.0279 | 0.0265 | - | - | - | - |
| Using KG for list q | 0.0343 | 0.0202 | 0.0369 | 0.0226 | - | - | - | - |
| Main pipeline | 0.0343 | 0.0202 | 0.0369 | 0.0226 | - | - | - | - |
| bioinfo-0 | 0.1597 | 0.1298 | 0.1697 | 0.1298 | - | - | - | - |
| bioinfo-1 | 0.3062 | 0.1243 | 0.3272 | 0.1251 | - | - | - | - |
| bioinfo-2 | 0.2443 | 0.1114 | 0.2640 | 0.1122 | - | - | - | - |
| bioinfo-3 | 0.2334 | 0.1053 | 0.2579 | 0.1071 | - | - | - | - |
| bioinfo-4 | 0.2120 | 0.0923 | 0.2333 | 0.0990 | - | - | - | - |
| UniTor_2 | 0.2087 | 0.1801 | 0.2237 | 0.1838 | - | - | - | - |
| UniTor_3 | 0.2118 | 0.1963 | 0.2247 | 0.2006 | - | - | - | - |
| NN_Persona_1 | 0.2378 | 0.1008 | 0.2762 | 0.1059 | - | - | - | - |
| NN_Persona_3 | 0.2277 | 0.0892 | 0.2530 | 0.0930 | - | - | - | - |
| deepseek32b-me | 0.1719 | 0.0875 | 0.2013 | 0.0928 | - | - | - | - |
| GPT4O | 0.2072 | 0.0798 | 0.2290 | 0.0847 | - | - | - | - |
| Fleming-1 | 0.2292 | 0.1004 | 0.2490 | 0.1027 | - | - | - | - |
| Fleming-2 | 0.2292 | 0.1004 | 0.2490 | 0.1027 | - | - | - | - |
| google_serach_&_LLM | 0.1509 | 0.0557 | 0.1839 | 0.0658 | - | - | - | - |
| DB_vector_&_LLM | 0.1279 | 0.0442 | 0.1536 | 0.0533 | - | - | - | - |
| IRIS_1 | 0.1459 | 0.1075 | 0.1633 | 0.1121 | - | - | - | - |
| IRIS_2 | 0.1546 | 0.1052 | 0.1705 | 0.1078 | - | - | - | - |
| UR-IW-1 | 0.2598 | 0.1286 | 0.2882 | 0.1329 | - | - | - | - |
| UR-IW-3 | 0.2429 | 0.1259 | 0.2692 | 0.1278 | - | - | - | - |
| UR-IW-5 | 0.2586 | 0.1549 | 0.2875 | 0.1599 | - | - | - | - |
| bious2 | 0.1699 | 0.1212 | 0.1775 | 0.1215 | - | - | - | - |
| IRIS_3 | 0.1604 | 0.0963 | 0.1706 | 0.1000 | - | - | - | - |
| bious3 | 0.1707 | 0.1252 | 0.1827 | 0.1293 | - | - | - | - |
| bious4 | 0.1720 | 0.1221 | 0.1784 | 0.1216 | - | - | - | - |
| bious5 | 0.1816 | 0.1315 | 0.1921 | 0.1339 | - | - | - | - |
| IR1 | 0.1779 | 0.1497 | 0.1832 | 0.1461 | - | - | - | - |
| Fleming-3 | 0.2735 | 0.0740 | 0.3014 | 0.0787 | - | - | - | - |
| qa | 0.1330 | 0.0792 | 0.1566 | 0.0864 | - | - | - | - |
| bious1 | 0.1911 | 0.1341 | 0.2007 | 0.1339 | - | - | - | - |
| deepseek-r1:32b | 0.1829 | 0.0818 | 0.2098 | 0.0898 | - | - | - | - |
| config-1 | 0.2551 | 0.1323 | 0.2655 | 0.1323 | - | - | - | - |
| config-2 | 0.2384 | 0.1188 | 0.2649 | 0.1249 | - | - | - | - |
| config-3 | 0.2489 | 0.1273 | 0.2771 | 0.1337 | - | - | - | - |
| config-4 | 0.2458 | 0.1203 | 0.2673 | 0.1226 | - | - | - | - |
| config-5 | 0.2450 | 0.1240 | 0.2655 | 0.1261 | - | - | - | - |
| mistral | 0.2286 | 0.1513 | 0.2413 | 0.1524 | - | - | - | - |
| UR-IW-2 | 0.1997 | 0.1073 | 0.2408 | 0.1214 | - | - | - | - |
| UR-IW-4 | 0.2117 | 0.0930 | 0.2329 | 0.0957 | - | - | - | - |
| dmiip2024 | 0.2295 | 0.1229 | 0.2584 | 0.1292 | - | - | - | - |
| dmiip2024_1 | 0.2550 | 0.1464 | 0.2684 | 0.1465 | - | - | - | - |
| deepseek-r1:14b | 0.1881 | 0.0860 | 0.2126 | 0.0932 | - | - | - | - |
| dmiip2024_2 | 0.1770 | 0.1639 | 0.1823 | 0.1667 | - | - | - | - |
| deepseek-r1:8b | 0.2062 | 0.0782 | 0.2317 | 0.0841 | - | - | - | - |
| gpt 01 mini | 0.1829 | 0.0818 | 0.2098 | 0.0898 | - | - | - | - |
| dmiip2024_3 | 0.2052 | 0.1712 | 0.2141 | 0.1754 | - | - | - | - |
| deepseek32b-full | 0.1709 | 0.0909 | 0.1888 | 0.0897 | - | - | - | - |
| lasigeBioTM | 0.1712 | 0.1030 | 0.1850 | 0.1082 | - | - | - | - |
Test batch 2
Exact Answers
| System | Yes/No: Accuracy | Yes/No: F1 Yes | Yes/No: F1 No | Yes/No: Macro F1 | Factoid: Strict Acc. | Factoid: Lenient Acc. | Factoid: MRR | List: Mean Prec. | List: Recall | List: F-Measure |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Only uses GPT-4o | 0.5882 | 0.5882 | 0.5882 | 0.5882 | 0.1852 | 0.1852 | 0.1852 | 0.1997 | 0.2261 | 0.2093 |
| NN_Persona_1 | 0.8235 | 0.8571 | 0.7692 | 0.8132 | 0.2222 | 0.2222 | 0.2222 | 0.1631 | 0.3269 | 0.1939 |
| NN_Persona_2 | 0.7647 | 0.8000 | 0.7143 | 0.7571 | 0.3333 | 0.3704 | 0.3519 | 0.2033 | 0.3182 | 0.2303 |
| NN_Persona_3 | 0.8235 | 0.8571 | 0.7692 | 0.8132 | 0.2963 | 0.4074 | 0.3519 | 0.1892 | 0.2919 | 0.2173 |
| Fleming-1 | 0.9412 | 0.9524 | 0.9231 | 0.9377 | 0.2222 | 0.3704 | 0.2790 | 0.1934 | 0.3238 | 0.2242 |
| UniTor_0 | 0.8235 | 0.8571 | 0.7692 | 0.8132 | 0.4444 | 0.4444 | 0.4444 | 0.2765 | 0.5081 | 0.3296 |
| UniTor_1 | 0.8235 | 0.8571 | 0.7692 | 0.8132 | 0.4444 | 0.4444 | 0.4444 | 0.2765 | 0.5081 | 0.3296 |
| UniTor_2 | 0.7059 | 0.7619 | 0.6154 | 0.6886 | 0.3704 | 0.3704 | 0.3704 | 0.2648 | 0.4433 | 0.3017 |
| UniTor_3 | 0.7059 | 0.7619 | 0.6154 | 0.6886 | 0.3704 | 0.3704 | 0.3704 | 0.2648 | 0.4433 | 0.3017 |
| Baseline top 20 | 0.9412 | 0.9565 | 0.9091 | 0.9328 | 0.4444 | 0.4815 | 0.4630 | 0.3785 | 0.4357 | 0.3880 |
| Main pipeline | 0.9412 | 0.9565 | 0.9091 | 0.9328 | 0.3333 | 0.4074 | 0.3642 | 0.1814 | 0.4753 | 0.2238 |
| Using LLM alone | 0.5294 | 0.5556 | 0.5000 | 0.5278 | 0.2593 | 0.2593 | 0.2593 | 0.2013 | 0.3050 | 0.2288 |
| Baseline top 10 | 0.9412 | 0.9565 | 0.9091 | 0.9328 | 0.4074 | 0.4815 | 0.4383 | 0.3609 | 0.4378 | 0.3748 |
| Using KG for list q | 0.9412 | 0.9565 | 0.9091 | 0.9328 | 0.3333 | 0.4074 | 0.3642 | 0.1937 | 0.5016 | 0.2412 |
| UR-IW-1 | 0.9412 | 0.9565 | 0.9091 | 0.9328 | 0.4074 | 0.4815 | 0.4383 | 0.2307 | 0.3536 | 0.2682 |
| UR-IW-2 | 0.8824 | 0.9091 | 0.8333 | 0.8712 | 0.4815 | 0.5185 | 0.5000 | 0.3449 | 0.4626 | 0.3805 |
| UR-IW-3 | 0.8824 | 0.9000 | 0.8571 | 0.8786 | 0.3704 | 0.4444 | 0.3920 | 0.1696 | 0.3213 | 0.2118 |
| UR-IW-4 | 0.7059 | 0.7368 | 0.6667 | 0.7018 | 0.5185 | 0.5556 | 0.5370 | 0.2859 | 0.3652 | 0.3023 |
| NN_Baseline | 0.7647 | 0.8000 | 0.7143 | 0.7571 | 0.2963 | 0.2963 | 0.2963 | 0.1964 | 0.3080 | 0.2259 |
| bioinfo-1 | 0.6471 | 0.7857 | - | 0.3929 | - | - | - | - | - | - |
| bioinfo-2 | 0.6471 | 0.7857 | - | 0.3929 | - | - | - | - | - | - |
| bioinfo-3 | 0.6471 | 0.7857 | - | 0.3929 | - | - | - | - | - | - |
| bioinfo-4 | 0.6471 | 0.7857 | - | 0.3929 | - | - | - | - | - | - |
| bious3 | 0.7647 | 0.8182 | 0.6667 | 0.7424 | 0.3333 | 0.4444 | 0.3889 | 0.1386 | 0.1463 | 0.1355 |
| Fleming-2 | 0.9412 | 0.9565 | 0.9091 | 0.9328 | 0.2222 | 0.3704 | 0.2790 | 0.1934 | 0.3238 | 0.2242 |
| UR-IW-5 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.2963 | 0.3333 | 0.3086 | 0.1796 | 0.3432 | 0.2144 |
| lasigeBioTM | 0.4706 | 0.5263 | 0.4000 | 0.4632 | 0.2593 | 0.2593 | 0.2593 | 0.3684 | 0.1058 | 0.1598 |
| dmiip2024 | 0.7647 | 0.8000 | 0.7143 | 0.7571 | 0.4815 | 0.5556 | 0.5185 | 0.2478 | 0.3686 | 0.2792 |
| dmiip2024_2 | 0.9412 | 0.9524 | 0.9231 | 0.9377 | 0.4444 | 0.4444 | 0.4444 | 0.1987 | 0.4024 | 0.2559 |
| dmiip2024_3 | 0.8824 | 0.9091 | 0.8333 | 0.8712 | 0.4074 | 0.4444 | 0.4259 | 0.3614 | 0.4018 | 0.3711 |
| dmiip2024_4 | 0.8235 | 0.8696 | 0.7273 | 0.7984 | 0.5926 | 0.5926 | 0.5926 | 0.2900 | 0.4419 | 0.3333 |
| dmiip2024_1 | 0.7647 | 0.8000 | 0.7143 | 0.7571 | 0.4815 | 0.4815 | 0.4815 | 0.2598 | 0.3598 | 0.2837 |
| IR2 | 0.7059 | 0.7368 | 0.6667 | 0.7018 | 0.4444 | 0.4444 | 0.4444 | 0.1938 | 0.2155 | 0.1974 |
| deepseek32b-me | 0.8235 | 0.8800 | 0.6667 | 0.7733 | 0.2963 | 0.2963 | 0.2963 | 0.2199 | 0.3116 | 0.2390 |
| mistral | 0.8824 | 0.9091 | 0.8333 | 0.8712 | 0.4074 | 0.5185 | 0.4475 | 0.1986 | 0.3514 | 0.2330 |
| gpt 01 mini | 0.8235 | 0.8571 | 0.7692 | 0.8132 | 0.0741 | 0.1111 | 0.0926 | 0.1781 | 0.2299 | 0.1867 |
| bious4 | 0.8824 | 0.9167 | 0.8000 | 0.8583 | 0.4444 | 0.4444 | 0.4444 | 0.2128 | 0.2759 | 0.2345 |
| phaseB-5 | 0.9412 | 0.9524 | 0.9231 | 0.9377 | 0.3333 | 0.3333 | 0.3333 | 0.2395 | 0.2984 | 0.2574 |
| phaseB-4 | 0.9412 | 0.9524 | 0.9231 | 0.9377 | 0.3333 | 0.3333 | 0.3333 | 0.2395 | 0.2984 | 0.2574 |
| deepseek32b-f | 0.8824 | 0.9091 | 0.8333 | 0.8712 | 0.2963 | 0.2963 | 0.2963 | 0.2458 | 0.3160 | 0.2675 |
| bious2 | 0.7059 | 0.7368 | 0.6667 | 0.7018 | 0.2963 | 0.3333 | 0.3148 | 0.1635 | 0.2189 | 0.1829 |
| bious1 | 0.8235 | 0.8571 | 0.7692 | 0.8132 | 0.3704 | 0.4074 | 0.3889 | 0.1868 | 0.2105 | 0.1956 |
| bious5 | 0.7647 | 0.8182 | 0.6667 | 0.7424 | 0.3333 | 0.3704 | 0.3519 | 0.1792 | 0.2560 | 0.1933 |
| deepseek-r1:14b | 0.9412 | 0.9565 | 0.9091 | 0.9328 | 0.1481 | 0.1481 | 0.1481 | 0.1649 | 0.1750 | 0.1666 |
| deepseek-r1:8b | 0.8824 | 0.9091 | 0.8333 | 0.8712 | 0.1111 | 0.1111 | 0.1111 | 0.3358 | 0.3805 | 0.3422 |
| deepseek-r1:32b | 0.7647 | 0.8000 | 0.7143 | 0.7571 | 0.1111 | 0.1111 | 0.1111 | 0.3270 | 0.4234 | 0.3563 |
| bioinfo-0 | 0.6471 | 0.7857 | - | 0.3929 | - | - | - | - | - | - |
| GPT4O | 0.8824 | 0.9091 | 0.8333 | 0.8712 | 0.1111 | 0.1111 | 0.1111 | 0.3358 | 0.3805 | 0.3422 |
| deepseek32b-full | 0.8824 | 0.9167 | 0.8000 | 0.8583 | 0.2963 | 0.2963 | 0.2963 | 0.3038 | 0.3072 | 0.2955 |
Ideal Answers
| System | Automatic: R-2 (Rec) | Automatic: R-2 (F1) | Automatic: R-SU4 (Rec) | Automatic: R-SU4 (F1) | Manual: Readability | Manual: Recall | Manual: Precision | Manual: Repetition |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Only uses GPT-4o | 0.1925 | 0.1543 | 0.1995 | 0.1511 | - | - | - | - |
| NN_Persona_1 | 0.2427 | 0.1081 | 0.2662 | 0.1092 | - | - | - | - |
| NN_Persona_2 | 0.2262 | 0.1230 | 0.2458 | 0.1270 | - | - | - | - |
| NN_Persona_3 | 0.2273 | 0.1163 | 0.2475 | 0.1186 | - | - | - | - |
| Fleming-1 | 0.2978 | 0.1013 | 0.3207 | 0.1064 | - | - | - | - |
| UniTor_0 | 0.2419 | 0.2177 | 0.2416 | 0.2100 | - | - | - | - |
| UniTor_1 | 0.2408 | 0.2170 | 0.2395 | 0.2091 | - | - | - | - |
| UniTor_2 | 0.2408 | 0.2204 | 0.2440 | 0.2117 | - | - | - | - |
| UniTor_3 | 0.2345 | 0.2153 | 0.2377 | 0.2067 | - | - | - | - |
| Baseline top 20 | 0.0400 | 0.0354 | 0.0422 | 0.0373 | - | - | - | - |
| Main pipeline | 0.0561 | 0.0372 | 0.0528 | 0.0342 | - | - | - | - |
| Using LLM alone | 0.0327 | 0.0287 | 0.0333 | 0.0293 | - | - | - | - |
| Baseline top 10 | 0.0377 | 0.0319 | 0.0387 | 0.0323 | - | - | - | - |
| Using KG for list q | 0.0561 | 0.0372 | 0.0528 | 0.0342 | - | - | - | - |
| UR-IW-1 | 0.2476 | 0.1405 | 0.2682 | 0.1425 | - | - | - | - |
| UR-IW-2 | 0.2000 | 0.1425 | 0.2229 | 0.1448 | - | - | - | - |
| UR-IW-3 | 0.2303 | 0.1439 | 0.2509 | 0.1443 | - | - | - | - |
| UR-IW-4 | 0.2063 | 0.1267 | 0.2234 | 0.1253 | - | - | - | - |
| NN_Baseline | 0.2113 | 0.0949 | 0.2413 | 0.0998 | - | - | - | - |
| bioinfo-1 | 0.1840 | 0.1163 | 0.2005 | 0.1169 | - | - | - | - |
| bioinfo-2 | 0.2138 | 0.1186 | 0.2284 | 0.1173 | - | - | - | - |
| bioinfo-3 | 0.2161 | 0.1133 | 0.2306 | 0.1129 | - | - | - | - |
| bioinfo-4 | 0.2185 | 0.1063 | 0.2328 | 0.1080 | - | - | - | - |
| bious3 | 0.1845 | 0.1635 | 0.1980 | 0.1640 | - | - | - | - |
| Fleming-2 | 0.2370 | 0.1202 | 0.2605 | 0.1263 | - | - | - | - |
| UR-IW-5 | 0.2140 | 0.1431 | 0.2242 | 0.1412 | - | - | - | - |
| lasigeBioTM | 0.1573 | 0.1311 | 0.1574 | 0.1261 | - | - | - | - |
| dmiip2024 | 0.2253 | 0.2218 | 0.2241 | 0.2177 | - | - | - | - |
| dmiip2024_2 | 0.1790 | 0.1848 | 0.1796 | 0.1802 | - | - | - | - |
| dmiip2024_3 | 0.1644 | 0.1648 | 0.1623 | 0.1551 | - | - | - | - |
| dmiip2024_4 | 0.1869 | 0.1961 | 0.1853 | 0.1871 | - | - | - | - |
| dmiip2024_1 | 0.2157 | 0.2162 | 0.2268 | 0.2150 | - | - | - | - |
| IR2 | 0.2356 | 0.2075 | 0.2390 | 0.2040 | - | - | - | - |
| deepseek32b-me | 0.1837 | 0.1071 | 0.1958 | 0.1083 | - | - | - | - |
| mistral | 0.1998 | 0.1244 | 0.2198 | 0.1275 | - | - | - | - |
| gpt 01 mini | 0.1324 | 0.1159 | 0.1462 | 0.1292 | - | - | - | - |
| bious4 | 0.2017 | 0.1711 | 0.2103 | 0.1694 | - | - | - | - |
| phaseB-5 | 0.2071 | 0.1191 | 0.2161 | 0.1194 | - | - | - | - |
| phaseB-4 | 0.2071 | 0.1191 | 0.2161 | 0.1194 | - | - | - | - |
| deepseek32b-f | 0.2005 | 0.1088 | 0.2072 | 0.1073 | - | - | - | - |
| bious2 | 0.1846 | 0.1618 | 0.1916 | 0.1578 | - | - | - | - |
| bious1 | 0.2008 | 0.1701 | 0.2102 | 0.1672 | - | - | - | - |
| bious5 | 0.1929 | 0.1682 | 0.1989 | 0.1634 | - | - | - | - |
| deepseek-r1:14b | 0.1717 | 0.0892 | 0.1850 | 0.0926 | - | - | - | - |
| deepseek-r1:8b | 0.2043 | 0.1087 | 0.2218 | 0.1114 | - | - | - | - |
| deepseek-r1:32b | 0.2272 | 0.1206 | 0.2389 | 0.1198 | - | - | - | - |
| bioinfo-0 | 0.2515 | 0.2037 | 0.2564 | 0.1981 | - | - | - | - |
| GPT4O | 0.2043 | 0.1087 | 0.2218 | 0.1114 | - | - | - | - |
| deepseek32b-full | 0.1634 | 0.1004 | 0.1668 | 0.1007 | - | - | - | - |
Test batch 3
Exact Answers
| System | Yes/No: Accuracy | Yes/No: F1 Yes | Yes/No: F1 No | Yes/No: Macro F1 | Factoid: Strict Acc. | Factoid: Lenient Acc. | Factoid: MRR | List: Mean Prec. | List: Recall | List: F-Measure |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Only uses GPT-4o | 0.7273 | 0.8125 | 0.5000 | 0.6563 | 0.3000 | 0.3500 | 0.3250 | 0.2371 | 0.2275 | 0.2216 |
| IR2 | 0.9545 | 0.9697 | 0.9091 | 0.9394 | 0.1000 | 0.1000 | 0.1000 | 0.3189 | 0.3258 | 0.3220 |
| IR3 | 0.8182 | 0.8889 | 0.5000 | 0.6944 | 0.3000 | 0.4000 | 0.3500 | 0.4231 | 0.4764 | 0.4313 |
| IR4 | 0.7273 | 0.8333 | 0.2500 | 0.5417 | 0.2500 | 0.3000 | 0.2750 | 0.3527 | 0.3851 | 0.3632 |
| NN_Persona_3 | 0.7727 | 0.8571 | 0.4444 | 0.6508 | 0.3500 | 0.4000 | 0.3750 | 0.2656 | 0.3878 | 0.2961 |
| NN_Baseline | 0.6364 | 0.7143 | 0.5000 | 0.6071 | 0.1000 | 0.2000 | 0.1500 | 0.3976 | 0.4472 | 0.4158 |
| NN_Persona_1 | 0.8636 | 0.9091 | 0.7273 | 0.8182 | 0.1500 | 0.2000 | 0.1750 | 0.2441 | 0.3465 | 0.2746 |
| NN_Persona_2 | 0.8636 | 0.9032 | 0.7692 | 0.8362 | 0.3500 | 0.4000 | 0.3750 | 0.2470 | 0.3624 | 0.2792 |
| IR1 | 0.6818 | 0.7742 | 0.4615 | 0.6179 | 0.1500 | 0.1500 | 0.1500 | 0.3538 | 0.3447 | 0.3482 |
| UR-IW-2 | 0.8182 | 0.8824 | 0.6000 | 0.7412 | 0.2500 | 0.2500 | 0.2500 | 0.3656 | 0.3969 | 0.3599 |
| UR-IW-4 | 0.8636 | 0.9091 | 0.7273 | 0.8182 | 0.2000 | 0.2000 | 0.2000 | 0.2206 | 0.2896 | 0.2371 |
| lasigeBioTM | 0.4091 | 0.3810 | 0.4348 | 0.4079 | 0.0500 | 0.0500 | 0.0500 | - | - | - |
| lasigeBioTM-onto-sm | 0.5909 | 0.6667 | 0.4706 | 0.5686 | - | - | - | 0.0455 | 0.0227 | 0.0303 |
| Fleming-1 | 0.7273 | 0.8125 | 0.5000 | 0.6563 | 0.2000 | 0.3500 | 0.2625 | 0.3182 | 0.4412 | 0.3565 |
| Baseline top 10 | 0.8182 | 0.8824 | 0.6000 | 0.7412 | 0.1500 | 0.2500 | 0.2000 | 0.3872 | 0.4143 | 0.3964 |
| Using LLM alone | 0.6818 | 0.7879 | 0.3636 | 0.5758 | 0.3000 | 0.3500 | 0.3250 | 0.2023 | 0.2790 | 0.2263 |
| Using KG for list q | 0.8182 | 0.8824 | 0.6000 | 0.7412 | 0.2500 | 0.3500 | 0.3000 | 0.2924 | 0.4704 | 0.3256 |
| Main pipeline | 0.8182 | 0.8824 | 0.6000 | 0.7412 | 0.2500 | 0.3500 | 0.3000 | 0.2735 | 0.4359 | 0.2843 |
| Baseline top 20 | 0.8636 | 0.9091 | 0.7273 | 0.8182 | 0.2000 | 0.2500 | 0.2250 | 0.3882 | 0.4355 | 0.4019 |
| UniTor_0 | 0.7727 | 0.8485 | 0.5455 | 0.6970 | 0.2500 | 0.2500 | 0.2500 | 0.2588 | 0.3131 | 0.2752 |
| UniTor_1 | 0.7727 | 0.8485 | 0.5455 | 0.6970 | 0.1500 | 0.1500 | 0.1500 | 0.2420 | 0.3056 | 0.2621 |
| UniTor_2 | 0.8182 | 0.8750 | 0.6667 | 0.7708 | 0.2500 | 0.3000 | 0.2667 | 0.2008 | 0.2500 | 0.2176 |
| UniTor_3 | 0.8182 | 0.8750 | 0.6667 | 0.7708 | 0.2000 | 0.2500 | 0.2167 | 0.1919 | 0.2595 | 0.2129 |
| IR5 | 0.6364 | 0.7143 | 0.5000 | 0.6071 | 0.1500 | 0.1500 | 0.1500 | 0.2701 | 0.2689 | 0.2686 |
| bious1 | 0.9091 | 0.9375 | 0.8333 | 0.8854 | 0.2000 | 0.2500 | 0.2250 | 0.3340 | 0.3389 | 0.3354 |
| bious2 | 0.8182 | 0.8667 | 0.7143 | 0.7905 | 0.0500 | 0.0500 | 0.0500 | 0.4009 | 0.3943 | 0.3958 |
| bious3 | 0.9091 | 0.9375 | 0.8333 | 0.8854 | 0.1500 | 0.1500 | 0.1500 | 0.3490 | 0.3699 | 0.3560 |
| bious4 | 0.8182 | 0.8667 | 0.7143 | 0.7905 | 0.1000 | 0.1500 | 0.1250 | 0.3789 | 0.3965 | 0.3852 |
| bious5 | 0.9091 | 0.9375 | 0.8333 | 0.8854 | 0.1000 | 0.1500 | 0.1250 | 0.3336 | 0.3336 | 0.3284 |
| UR-IW-1 | 0.8636 | 0.9143 | 0.6667 | 0.7905 | 0.2000 | 0.4000 | 0.2875 | 0.3271 | 0.4040 | 0.3482 |
| UR-IW-3 | 0.6818 | 0.7742 | 0.4615 | 0.6179 | 0.2000 | 0.5000 | 0.3100 | 0.3114 | 0.3777 | 0.3279 |
| UR-IW-5 | 0.9091 | 0.9412 | 0.8000 | 0.8706 | 0.1000 | 0.3000 | 0.2000 | 0.3455 | 0.4111 | 0.3618 |
| dmiip2024 | 0.8182 | 0.8824 | 0.6000 | 0.7412 | 0.2500 | 0.3500 | 0.2917 | 0.4304 | 0.4598 | 0.4384 |
| dmiip2024_1 | 0.8182 | 0.8750 | 0.6667 | 0.7708 | 0.2500 | 0.2500 | 0.2500 | 0.4387 | 0.4598 | 0.4425 |
| dmiip2024_2 | 0.8636 | 0.9091 | 0.7273 | 0.8182 | 0.1500 | 0.2000 | 0.1750 | 0.3004 | 0.4598 | 0.3488 |
| dmiip2024_3 | 0.8182 | 0.8750 | 0.6667 | 0.7708 | 0.2000 | 0.3500 | 0.2750 | 0.4674 | 0.4446 | 0.4541 |
| dmiip2024_4 | 0.9091 | 0.9444 | 0.7500 | 0.8472 | 0.2000 | 0.2000 | 0.2000 | 0.3933 | 0.4272 | 0.4060 |
| lasigeBioTM-onto-bl | 0.5909 | 0.6667 | 0.4706 | 0.5686 | 0.1000 | 0.1000 | 0.1000 | 0.0909 | 0.0341 | 0.0485 |
| IRIS_1 | 0.2273 | - | 0.3704 | 0.1852 | - | - | - | - | - | - |
| IRIS_2 | 0.2273 | - | 0.3704 | 0.1852 | - | - | - | - | - | - |
| IRIS_3 | 0.2273 | - | 0.3704 | 0.1852 | - | - | - | - | - | - |
| extractive | 0.7273 | 0.8000 | 0.5714 | 0.6857 | 0.0500 | 0.2500 | 0.1417 | 0.2499 | 0.2788 | 0.2570 |
| mistral | 0.8636 | 0.9091 | 0.7273 | 0.8182 | 0.1000 | 0.3000 | 0.1917 | 0.3186 | 0.4141 | 0.3419 |
| llama | 0.8636 | 0.9091 | 0.7273 | 0.8182 | 0.1500 | 0.4000 | 0.2750 | 0.3252 | 0.3806 | 0.3369 |
| abstractive | 0.8636 | 0.9032 | 0.7692 | 0.8362 | 0.2500 | 0.3500 | 0.3000 | 0.0303 | 0.0455 | 0.0364 |
| bioinfo-0 | 0.7727 | 0.8718 | - | 0.4359 | - | - | - | - | - | - |
| bioinfo-1 | 0.7727 | 0.8718 | - | 0.4359 | - | - | - | - | - | - |
| bioinfo-2 | 0.7727 | 0.8718 | - | 0.4359 | - | - | - | - | - | - |
| bioinfo-3 | 0.7727 | 0.8718 | - | 0.4359 | - | - | - | - | - | - |
| bioinfo-4 | 0.7727 | 0.8718 | - | 0.4359 | - | - | - | - | - | - |
| dense | 0.8636 | 0.9091 | 0.7273 | 0.8182 | 0.2000 | 0.3000 | 0.2500 | 0.3443 | 0.4255 | 0.3643 |
| sp_lasigebiotm | 0.2273 | - | 0.3704 | 0.1852 | - | - | - | 0.0455 | 0.0455 | 0.0455 |
| GPT4O | 0.7727 | 0.8571 | 0.4444 | 0.6508 | 0.1500 | 0.1500 | 0.1500 | 0.1504 | 0.1742 | 0.1580 |
| Fleming-2 | 0.8636 | 0.9091 | 0.7273 | 0.8182 | 0.2500 | 0.4000 | 0.3125 | 0.3182 | 0.4412 | 0.3565 |
| deepseek32b-me | 0.7273 | 0.8000 | 0.5714 | 0.6857 | 0.2500 | 0.2500 | 0.2500 | 0.0876 | 0.1250 | 0.1000 |
| deepseek-r1:14b | 0.8182 | 0.8750 | 0.6667 | 0.7708 | - | - | - | 0.1167 | 0.1591 | 0.1323 |
| deepseek32b-full | 0.7273 | 0.8000 | 0.5714 | 0.6857 | 0.2500 | 0.2500 | 0.2500 | 0.1239 | 0.1477 | 0.1325 |
| AQAMS | 0.8182 | 0.8750 | 0.6667 | 0.7708 | 0.1500 | 0.2000 | 0.1750 | 0.3394 | 0.3586 | 0.3478 |
Ideal Answers
| System | Automatic: R-2 (Rec) | Automatic: R-2 (F1) | Automatic: R-SU4 (Rec) | Automatic: R-SU4 (F1) | Manual: Readability | Manual: Recall | Manual: Precision | Manual: Repetition |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Only uses GPT-4o | 0.1679 | 0.1201 | 0.1987 | 0.1274 | - | - | - | - |
| IR2 | 0.2228 | 0.1859 | 0.2283 | 0.1828 | - | - | - | - |
| IR3 | 0.2511 | 0.2090 | 0.2591 | 0.2058 | - | - | - | - |
| IR4 | 0.2385 | 0.2041 | 0.2469 | 0.2013 | - | - | - | - |
| NN_Persona_3 | 0.2022 | 0.0829 | 0.2366 | 0.0876 | - | - | - | - |
| NN_Baseline | 0.1740 | 0.0780 | 0.2101 | 0.0872 | - | - | - | - |
| NN_Persona_1 | 0.1916 | 0.0891 | 0.2224 | 0.0946 | - | - | - | - |
| NN_Persona_2 | 0.1898 | 0.0944 | 0.2195 | 0.0996 | - | - | - | - |
| IR1 | 0.1674 | 0.1562 | 0.1789 | 0.1553 | - | - | - | - |
| UR-IW-2 | 0.1195 | 0.0842 | 0.1514 | 0.0959 | - | - | - | - |
| UR-IW-4 | 0.1115 | 0.0768 | 0.1418 | 0.0903 | - | - | - | - |
| lasigeBioTM | - | - | - | - | - | - | - | - |
| lasigeBioTM-onto-sm | - | - | - | - | - | - | - | - |
| Fleming-1 | 0.2421 | 0.0771 | 0.2794 | 0.0865 | - | - | - | - |
| Baseline top 10 | 0.0401 | 0.0379 | 0.0425 | 0.0406 | - | - | - | - |
| Using LLM alone | 0.0260 | 0.0257 | 0.0288 | 0.0285 | - | - | - | - |
| Using KG for list q | 0.0453 | 0.0280 | 0.0519 | 0.0312 | - | - | - | - |
| Main pipeline | 0.0453 | 0.0280 | 0.0519 | 0.0312 | - | - | - | - |
| Baseline top 20 | 0.0408 | 0.0397 | 0.0466 | 0.0450 | - | - | - | - |
| UniTor_0 | 0.1799 | 0.1901 | 0.1860 | 0.1944 | - | - | - | - |
| UniTor_1 | 0.1891 | 0.1980 | 0.1948 | 0.2026 | - | - | - | - |
| UniTor_2 | 0.1857 | 0.2011 | 0.1947 | 0.2085 | - | - | - | - |
| UniTor_3 | 0.1740 | 0.1908 | 0.1841 | 0.1970 | - | - | - | - |
| IR5 | 0.1481 | 0.1376 | 0.1629 | 0.1424 | - | - | - | - |
| bious1 | 0.2062 | 0.1547 | 0.2188 | 0.1544 | - | - | - | - |
| bious2 | 0.1858 | 0.1474 | 0.2030 | 0.1509 | - | - | - | - |
| bious3 | 0.1871 | 0.1432 | 0.2029 | 0.1469 | - | - | - | - |
| bious4 | 0.1812 | 0.1440 | 0.1943 | 0.1465 | - | - | - | - |
| bious5 | 0.1832 | 0.1403 | 0.2042 | 0.1479 | - | - | - | - |
| UR-IW-1 | 0.2172 | 0.1152 | 0.2557 | 0.1229 | - | - | - | - |
| UR-IW-3 | 0.2373 | 0.1323 | 0.2595 | 0.1353 | - | - | - | - |
| UR-IW-5 | 0.2236 | 0.1722 | 0.2406 | 0.1740 | - | - | - | - |
| dmiip2024 | 0.2118 | 0.2010 | 0.2165 | 0.2014 | - | - | - | - |
| dmiip2024_1 | 0.2006 | 0.1874 | 0.2059 | 0.1871 | - | - | - | - |
| dmiip2024_2 | 0.1982 | 0.1878 | 0.2173 | 0.1991 | - | - | - | - |
| dmiip2024_3 | 0.1594 | 0.1606 | 0.1686 | 0.1640 | - | - | - | - |
| dmiip2024_4 | 0.2067 | 0.1947 | 0.2093 | 0.1923 | - | - | - | - |
| lasigeBioTM-onto-bl | - | - | - | - | - | - | - | - |
| IRIS_1 | 0.1478 | 0.1333 | 0.1668 | 0.1398 | - | - | - | - |
| IRIS_2 | 0.1497 | 0.1180 | 0.1626 | 0.1201 | - | - | - | - |
| IRIS_3 | 0.1468 | 0.1047 | 0.1607 | 0.1105 | - | - | - | - |
| extractive | 0.0424 | 0.0236 | 0.0501 | 0.0262 | - | - | - | - |
| mistral | 0.1835 | 0.1146 | 0.2033 | 0.1174 | - | - | - | - |
| llama | 0.1768 | 0.1143 | 0.1929 | 0.1137 | - | - | - | - |
| abstractive | 0.0434 | 0.0151 | 0.0528 | 0.0179 | - | - | - | - |
| bioinfo-0 | 0.1834 | 0.1318 | 0.1923 | 0.1313 | - | - | - | - |
| bioinfo-1 | 0.2263 | 0.1428 | 0.2403 | 0.1428 | - | - | - | - |
| bioinfo-2 | 0.1858 | 0.1045 | 0.2155 | 0.1115 | - | - | - | - |
| bioinfo-3 | 0.1930 | 0.1076 | 0.2177 | 0.1125 | - | - | - | - |
| bioinfo-4 | 0.1883 | 0.0917 | 0.2114 | 0.0945 | - | - | - | - |
| dense | 0.1795 | 0.1073 | 0.2045 | 0.1134 | - | - | - | - |
| sp_lasigebiotm | 0.0847 | 0.0726 | 0.0912 | 0.0722 | - | - | - | - |
| GPT4O | 0.1058 | 0.0959 | 0.1263 | 0.1079 | - | - | - | - |
| Fleming-2 | 0.2159 | 0.0927 | 0.2554 | 0.1009 | - | - | - | - |
| deepseek32b-me | 0.1513 | 0.1160 | 0.1744 | 0.1267 | - | - | - | - |
| deepseek-r1:14b | 0.0998 | 0.0906 | 0.1192 | 0.1030 | - | - | - | - |
| deepseek32b-full | 0.1527 | 0.1160 | 0.1792 | 0.1307 | - | - | - | - |
| AQAMS | 0.2135 | 0.0690 | 0.2516 | 0.0778 | - | - | - | - |
Test batch 4
Exact Answers
| System | Yes/No: Accuracy | Yes/No: F1 Yes | Yes/No: F1 No | Yes/No: Macro F1 | Factoid: Strict Acc. | Factoid: Lenient Acc. | Factoid: MRR | List: Mean Prec. | List: Recall | List: F-Measure |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| IR1 | 0.8462 | 0.8889 | 0.7500 | 0.8194 | 0.4091 | 0.4091 | 0.4091 | 0.2781 | 0.2450 | 0.2544 |
| UniTor_0 | 0.8462 | 0.8889 | 0.7500 | 0.8194 | 0.4545 | 0.4545 | 0.4545 | 0.2409 | 0.2687 | 0.2503 |
| UniTor_1 | 0.8462 | 0.8889 | 0.7500 | 0.8194 | 0.4545 | 0.4545 | 0.4545 | 0.2356 | 0.3020 | 0.2498 |
| UniTor_2 | 0.8846 | 0.9231 | 0.7692 | 0.8462 | 0.4091 | 0.4091 | 0.4091 | 0.2231 | 0.2828 | 0.2343 |
| UniTor_3 | 0.8846 | 0.9231 | 0.7692 | 0.8462 | 0.4091 | 0.4091 | 0.4091 | 0.2433 | 0.2823 | 0.2532 |
| simple truncation | 0.8846 | 0.9231 | 0.7692 | 0.8462 | 0.2273 | 0.4091 | 0.2917 | 0.1347 | 0.2039 | 0.1552 |
| Using KG for list q | 0.8846 | 0.9189 | 0.8000 | 0.8595 | 0.3636 | 0.4545 | 0.3902 | 0.1579 | 0.3421 | 0.1987 |
| Baseline top 20 | 0.8846 | 0.9189 | 0.8000 | 0.8595 | 0.4091 | 0.4545 | 0.4318 | 0.2991 | 0.3232 | 0.2977 |
| Baseline top 10 | 0.8846 | 0.9189 | 0.8000 | 0.8595 | 0.4545 | 0.5455 | 0.5000 | 0.2764 | 0.3272 | 0.2904 |
| Using LLM alone | 0.8846 | 0.9143 | 0.8235 | 0.8689 | 0.4091 | 0.4091 | 0.4091 | 0.2601 | 0.3712 | 0.2959 |
| Main pipeline | 0.9231 | 0.9474 | 0.8571 | 0.9023 | 0.3636 | 0.4545 | 0.3902 | 0.1912 | 0.3656 | 0.2286 |
| IR5 | 0.6923 | 0.7647 | 0.5556 | 0.6601 | 0.5000 | 0.5000 | 0.5000 | 0.2303 | 0.2187 | 0.2196 |
| IR3 | 0.8846 | 0.9189 | 0.8000 | 0.8595 | 0.4091 | 0.4545 | 0.4318 | 0.3071 | 0.2890 | 0.2918 |
| IR4 | 0.8462 | 0.8889 | 0.7500 | 0.8194 | 0.3636 | 0.3636 | 0.3636 | 0.2716 | 0.2414 | 0.2492 |
| IR2 | 0.8077 | 0.8571 | 0.7059 | 0.7815 | 0.5000 | 0.5000 | 0.5000 | 0.3272 | 0.2573 | 0.2845 |
| AQAMS | 0.9231 | 0.9474 | 0.8571 | 0.9023 | 0.4091 | 0.4091 | 0.4091 | 0.2807 | 0.2897 | 0.2778 |
| kmeans | 0.6538 | 0.7097 | 0.5714 | 0.6406 | 0.0455 | 0.0455 | 0.0455 | 0.2171 | 0.2385 | 0.2082 |
| similarity measures | 0.8846 | 0.9189 | 0.8000 | 0.8595 | 0.3636 | 0.5000 | 0.4318 | 0.1892 | 0.2732 | 0.1943 |
| deepseek32b-me | 0.8077 | 0.8718 | 0.6154 | 0.7436 | 0.4545 | 0.4545 | 0.4545 | 0.3217 | 0.2929 | 0.3014 |
| deepseek32b-full | 0.7692 | 0.8421 | 0.5714 | 0.7068 | 0.4091 | 0.4091 | 0.4091 | 0.2775 | 0.2673 | 0.2670 |
| bious2 | 0.7692 | 0.8235 | 0.6667 | 0.7451 | 0.2273 | 0.3182 | 0.2727 | 0.1922 | 0.2025 | 0.1922 |
| bious3 | 0.7692 | 0.8235 | 0.6667 | 0.7451 | 0.4091 | 0.4545 | 0.4318 | 0.2704 | 0.2411 | 0.2420 |
| bious4 | 0.7308 | 0.8000 | 0.5882 | 0.6941 | 0.3182 | 0.3636 | 0.3409 | 0.2376 | 0.2503 | 0.2402 |
| bious5 | 0.7692 | 0.8235 | 0.6667 | 0.7451 | 0.3182 | 0.3182 | 0.3182 | 0.2194 | 0.2086 | 0.2008 |
| dmiip2024 | 0.8846 | 0.9231 | 0.7692 | 0.8462 | 0.5000 | 0.5455 | 0.5227 | 0.2141 | 0.2748 | 0.2352 |
| dmiip2024_1 | 0.9231 | 0.9444 | 0.8750 | 0.9097 | 0.4545 | 0.4545 | 0.4545 | 0.2316 | 0.2704 | 0.2453 |
| dmiip2024_2 | 0.8462 | 0.8947 | 0.7143 | 0.8045 | 0.2727 | 0.2727 | 0.2727 | 0.1927 | 0.3011 | 0.2281 |
| dmiip2024_4 | 0.8077 | 0.8780 | 0.5455 | 0.7118 | 0.4545 | 0.5000 | 0.4773 | 0.2737 | 0.2576 | 0.2604 |
| Fleming-2 | 0.8846 | 0.9189 | 0.8000 | 0.8595 | 0.2273 | 0.4091 | 0.2818 | 0.1202 | 0.3068 | 0.1578 |
| dmiip2024_3 | 0.8077 | 0.8718 | 0.6154 | 0.7436 | 0.5000 | 0.5909 | 0.5303 | 0.3246 | 0.2801 | 0.2919 |
| Fleming-1 | 0.8846 | 0.9189 | 0.8000 | 0.8595 | 0.2273 | 0.4091 | 0.2803 | 0.2150 | 0.3364 | 0.2425 |
| UR-IW-1 | 0.8462 | 0.8947 | 0.7143 | 0.8045 | 0.5000 | 0.5455 | 0.5152 | 0.1819 | 0.3429 | 0.2246 |
| UR-IW-2 | 0.9231 | 0.9444 | 0.8750 | 0.9097 | 0.4545 | 0.4545 | 0.4545 | 0.1846 | 0.3349 | 0.2172 |
| UR-IW-3 | 0.8462 | 0.8889 | 0.7500 | 0.8194 | 0.4545 | 0.4545 | 0.4545 | 0.2640 | 0.3114 | 0.2739 |
| UR-IW-5 | 0.8462 | 0.9000 | 0.6667 | 0.7833 | 0.5455 | 0.5909 | 0.5606 | 0.2345 | 0.3680 | 0.2742 |
| UR-IW-4 | 0.8077 | 0.8485 | 0.7368 | 0.7927 | 0.3636 | 0.4091 | 0.3788 | 0.2122 | 0.2936 | 0.2270 |
| Fleming-3 | 0.7692 | 0.8421 | 0.5714 | 0.7068 | 0.2273 | 0.4091 | 0.2818 | 0.1202 | 0.3068 | 0.1578 |
| extractive | 0.9231 | 0.9444 | 0.8750 | 0.9097 | 0.4091 | 0.4545 | 0.4318 | 0.1820 | 0.2838 | 0.1879 |
| abstractive | 0.9231 | 0.9444 | 0.8750 | 0.9097 | 0.4091 | 0.4545 | 0.4318 | 0.1408 | 0.2707 | 0.1762 |
| Only uses GPT-4o | 0.8462 | 0.8824 | 0.7778 | 0.8301 | 0.4545 | 0.4545 | 0.4545 | 0.1954 | 0.2071 | 0.1933 |
| NN_Persona_1 | 0.7692 | 0.8125 | 0.7000 | 0.7563 | 0.3182 | 0.4091 | 0.3636 | 0.2021 | 0.2824 | 0.2210 |
| NN_Persona_2 | 0.8077 | 0.8485 | 0.7368 | 0.7927 | 0.3182 | 0.3636 | 0.3409 | 0.1299 | 0.1875 | 0.1426 |
| NN_Persona_3 | 0.8077 | 0.8571 | 0.7059 | 0.7815 | 0.4091 | 0.5000 | 0.4295 | 0.1680 | 0.2399 | 0.1894 |
| NN_Baseline | 0.8462 | 0.8824 | 0.7778 | 0.8301 | 0.3636 | 0.3636 | 0.3636 | 0.2660 | 0.2845 | 0.2644 |
| GPT4O | 0.8077 | 0.8649 | 0.6667 | 0.7658 | 0.2273 | 0.3636 | 0.2955 | 0.2207 | 0.2639 | 0.2266 |
| deepseek-r1:32b | 0.9231 | 0.9474 | 0.8571 | 0.9023 | 0.2727 | 0.3636 | 0.3182 | 0.2395 | 0.2480 | 0.2264 |
| deepseek-r1:14b | 0.9231 | 0.9474 | 0.8571 | 0.9023 | 0.3182 | 0.3182 | 0.3182 | 0.2476 | 0.2401 | 0.2420 |
| deepseek-r1:8b | 0.8846 | 0.9189 | 0.8000 | 0.8595 | 0.3182 | 0.3182 | 0.3182 | 0.2566 | 0.2431 | 0.2458 |
| Fleming-4 | 0.8846 | 0.9189 | 0.8000 | 0.8595 | 0.2273 | 0.4091 | 0.2780 | 0.2158 | 0.3364 | 0.2433 |
| mistral | 0.8462 | 0.8947 | 0.7143 | 0.8045 | 0.4545 | 0.4545 | 0.4545 | 0.2015 | 0.3501 | 0.2464 |
| llama | 0.8846 | 0.9231 | 0.7692 | 0.8462 | 0.4091 | 0.4091 | 0.4091 | 0.2439 | 0.3478 | 0.2742 |
| sp_lasigebiotm | 0.8077 | 0.8485 | 0.7368 | 0.7927 | 0.3182 | 0.3182 | 0.3182 | 0.1947 | 0.1076 | 0.1347 |
| lasigeBioTM | 0.8462 | 0.8824 | 0.7778 | 0.8301 | 0.3636 | 0.3636 | 0.3636 | 0.1482 | 0.1396 | 0.1363 |
| lasigeBioTM-onto-bl | 0.8077 | 0.8571 | 0.7059 | 0.7815 | 0.3636 | 0.3636 | 0.3636 | 0.1518 | 0.1265 | 0.1310 |
| lasigeBioTM-onto-sm | 0.7692 | 0.8125 | 0.7000 | 0.7563 | 0.2273 | 0.2273 | 0.2273 | 0.1228 | 0.0782 | 0.0892 |
| bioinfo-0 | 0.7308 | 0.8444 | - | 0.4222 | - | - | - | - | - | - |
| bioinfo-1 | 0.7308 | 0.8444 | - | 0.4222 | - | - | - | - | - | - |
| deepseek32b-f | 0.8462 | 0.8889 | 0.7500 | 0.8194 | 0.3636 | 0.3636 | 0.3636 | 0.2095 | 0.2543 | 0.2165 |
| bioinfo-2 | 0.7308 | 0.8444 | - | 0.4222 | - | - | - | - | - | - |
| bioinfo-3 | 0.7308 | 0.8444 | - | 0.4222 | - | - | - | - | - | - |
bioinfo-4 |
0.7308 |
0.8444 |
- |
0.4222 |
- | - | - |
- | - | - |
bious1 |
0.8077 |
0.8485 |
0.7368 |
0.7927 |
0.4091 |
0.4091 |
0.4091 |
0.2036 |
0.2499 |
0.2150 |
Fleming-5 |
0.9231 |
0.9474 |
0.8571 |
0.9023 |
0.2273 |
0.4091 |
0.2780 |
0.2158 |
0.3364 |
0.2433 |
phaseB-5 |
0.8846 |
0.9189 |
0.8000 |
0.8595 |
0.3636 |
0.3636 |
0.3636 |
0.2263 |
0.2244 |
0.2210 |
gpt 01 mini |
0.8077 |
0.8649 |
0.6667 |
0.7658 |
0.2273 |
0.2273 |
0.2273 |
0.2178 |
0.2476 |
0.2175 |
phaseB-4 |
0.9231 |
0.9474 |
0.8571 |
0.9023 |
0.3636 |
0.3636 |
0.3636 |
0.2549 |
0.2911 |
0.2652 |
3.PhaseB_System |
0.7308 |
0.8444 |
- |
0.4222 |
0.1364 |
0.1364 |
0.1364 |
- | - | - |
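In the Yes/No columns of the Exact Answers table above, the reported Macro F1 values are consistent with the unweighted mean of F1 Yes and F1 No, with a missing F1 No ("-") counted as 0. The following is a minimal Python sketch under that assumption, checked against three rows copied from the table; the helper name `macro_f1` is hypothetical and is not part of any BioASQ tooling.

```python
from typing import Optional


def macro_f1(f1_yes: Optional[float], f1_no: Optional[float]) -> float:
    """Unweighted mean of the two class F1 scores.

    Assumption: a missing score (shown as "-" in the table) counts as 0.
    """
    return ((f1_yes or 0.0) + (f1_no or 0.0)) / 2


# (F1 Yes, F1 No, reported Macro F1), copied from rows of the table above
rows = {
    "bious2": (0.8235, 0.6667, 0.7451),
    "dmiip2024": (0.9231, 0.7692, 0.8462),
    "bioinfo-0": (0.8444, None, 0.4222),
}

for name, (f1_yes, f1_no, reported) in rows.items():
    computed = macro_f1(f1_yes, f1_no)
    # The published inputs are rounded to 4 decimals, so allow a small tolerance.
    assert abs(computed - reported) < 1e-4, name
    print(f"{name}: computed {computed:.4f}, reported {reported:.4f}")
```

Under this reading, the Macro F1 of 0.4222 for the bioinfo systems comes from halving their F1 Yes of 0.8444, since they have no F1 No score.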
Ideal Answers
| Automatic scores (Rouge - R) | Manual scores |
System | R-2 (Rec) | R-2 (F1) | R-SU4 (Rec) | R-SU4 (F1) | Readability | Recall | Precision | Repetition |
IR1 | 0.1432 | 0.1286 | 0.1491 | 0.1332 | - | - | - | - |
UniTor_0 | 0.1316 | 0.1356 | 0.1335 | 0.1412 | - | - | - | - |
UniTor_1 | 0.1346 | 0.1369 | 0.1327 | 0.1392 | - | - | - | - |
UniTor_2 | 0.1498 | 0.1503 | 0.1544 | 0.1576 | - | - | - | - |
UniTor_3 | 0.1527 | 0.1526 | 0.1582 | 0.1594 | - | - | - | - |
simple truncation | 0.0342 | 0.0208 | 0.0411 | 0.0245 | - | - | - | - |
Using KG for list q | 0.0262 | 0.0198 | 0.0327 | 0.0246 | - | - | - | - |
Baseline top 20 | 0.0145 | 0.0166 | 0.0204 | 0.0233 | - | - | - | - |
Baseline top 10 | 0.0180 | 0.0202 | 0.0230 | 0.0254 | - | - | - | - |
Using LLM alone | 0.0123 | 0.0138 | 0.0162 | 0.0192 | - | - | - | - |
Main pipeline | 0.0262 | 0.0198 | 0.0327 | 0.0246 | - | - | - | - |
IR5 | 0.1414 | 0.1209 | 0.1414 | 0.1202 | - | - | - | - |
IR3 | 0.1843 | 0.1492 | 0.1956 | 0.1553 | - | - | - | - |
IR4 | 0.1925 | 0.1598 | 0.2060 | 0.1665 | - | - | - | - |
IR2 | 0.1557 | 0.1403 | 0.1670 | 0.1467 | - | - | - | - |
AQAMS | 0.1840 | 0.0616 | 0.2273 | 0.0763 | - | - | - | - |
kmeans | - | - | - | - | - | - | - | - |
similarity measures | 0.0296 | 0.0183 | 0.0375 | 0.0221 | - | - | - | - |
deepseek32b-me | 0.1238 | 0.1359 | 0.1268 | 0.1393 | - | - | - | - |
deepseek32b-full | 0.1121 | 0.1291 | 0.1142 | 0.1336 | - | - | - | - |
bious2 | 0.1388 | 0.1183 | 0.1490 | 0.1255 | - | - | - | - |
bious3 | 0.1433 | 0.1181 | 0.1538 | 0.1250 | - | - | - | - |
bious4 | 0.1499 | 0.1247 | 0.1607 | 0.1313 | - | - | - | - |
bious5 | 0.1472 | 0.1217 | 0.1606 | 0.1280 | - | - | - | - |
dmiip2024 | 0.1411 | 0.1268 | 0.1506 | 0.1338 | - | - | - | - |
dmiip2024_1 | 0.1447 | 0.1274 | 0.1582 | 0.1370 | - | - | - | - |
dmiip2024_2 | 0.1444 | 0.1483 | 0.1512 | 0.1522 | - | - | - | - |
dmiip2024_4 | 0.1500 | 0.1581 | 0.1530 | 0.1608 | - | - | - | - |
Fleming-2 | 0.1589 | 0.0501 | 0.2014 | 0.0617 | - | - | - | - |
dmiip2024_3 | 0.1433 | 0.1523 | 0.1411 | 0.1514 | - | - | - | - |
Fleming-1 | 0.2059 | 0.0639 | 0.2368 | 0.0737 | - | - | - | - |
UR-IW-1 | 0.1914 | 0.0895 | 0.2267 | 0.1015 | - | - | - | - |
UR-IW-2 | 0.1578 | 0.0917 | 0.1845 | 0.1014 | - | - | - | - |
UR-IW-3 | 0.1862 | 0.1000 | 0.2135 | 0.1103 | - | - | - | - |
UR-IW-5 | 0.1852 | 0.1125 | 0.2105 | 0.1231 | - | - | - | - |
UR-IW-4 | 0.1363 | 0.0817 | 0.1620 | 0.0910 | - | - | - | - |
Fleming-3 | 0.1615 | 0.0612 | 0.2059 | 0.0761 | - | - | - | - |
extractive | 0.0288 | 0.0176 | 0.0372 | 0.0222 | - | - | - | - |
abstractive | 0.0309 | 0.0188 | 0.0388 | 0.0230 | - | - | - | - |
Only uses GPT-4o | 0.1921 | 0.0749 | 0.2279 | 0.0897 | - | - | - | - |
NN_Persona_1 | 0.1988 | 0.0581 | 0.2414 | 0.0698 | - | - | - | - |
NN_Persona_2 | 0.1937 | 0.0681 | 0.2332 | 0.0794 | - | - | - | - |
NN_Persona_3 | 0.2032 | 0.0581 | 0.2423 | 0.0702 | - | - | - | - |
NN_Baseline | 0.1572 | 0.0627 | 0.1937 | 0.0775 | - | - | - | - |
GPT4O | 0.0999 | 0.0945 | 0.1179 | 0.1101 | - | - | - | - |
deepseek-r1:32b | 0.0844 | 0.0790 | 0.1042 | 0.0981 | - | - | - | - |
deepseek-r1:14b | 0.1874 | 0.0937 | 0.2177 | 0.1073 | - | - | - | - |
deepseek-r1:8b | 0.1882 | 0.0944 | 0.2182 | 0.1081 | - | - | - | - |
Fleming-4 | 0.1609 | 0.0572 | 0.1934 | 0.0693 | - | - | - | - |
mistral | 0.1911 | 0.0922 | 0.2079 | 0.0965 | - | - | - | - |
llama | 0.1371 | 0.0799 | 0.1602 | 0.0887 | - | - | - | - |
sp_lasigebiotm | 0.1565 | 0.0870 | 0.1861 | 0.1019 | - | - | - | - |
lasigeBioTM | 0.1935 | 0.0867 | 0.2249 | 0.0988 | - | - | - | - |
lasigeBioTM-onto-bl | 0.1761 | 0.0848 | 0.2185 | 0.1005 | - | - | - | - |
lasigeBioTM-onto-sm | 0.1041 | 0.0808 | 0.1141 | 0.0853 | - | - | - | - |
bioinfo-0 | 0.1183 | 0.0880 | 0.1426 | 0.0983 | - | - | - | - |
bioinfo-1 | 0.1761 | 0.1691 | 0.1801 | 0.1726 | - | - | - | - |
deepseek32b-f | 0.1619 | 0.0858 | 0.1750 | 0.0851 | - | - | - | - |
bioinfo-2 | 0.1472 | 0.0697 | 0.1767 | 0.0832 | - | - | - | - |
bioinfo-3 | 0.1704 | 0.1378 | 0.1843 | 0.1468 | - | - | - | - |
bioinfo-4 | 0.1532 | 0.0720 | 0.1789 | 0.0842 | - | - | - | - |
bious1 | 0.1570 | 0.1238 | 0.1678 | 0.1266 | - | - | - | - |
Fleming-5 | 0.1609 | 0.0572 | 0.1934 | 0.0693 | - | - | - | - |
phaseB-5 | 0.1357 | 0.0784 | 0.1419 | 0.0808 | - | - | - | - |
gpt 01 mini | 0.1012 | 0.0818 | 0.1150 | 0.0931 | - | - | - | - |
phaseB-4 | 0.1624 | 0.0855 | 0.1767 | 0.0892 | - | - | - | - |
3.PhaseB_System | 0.0826 | 0.0901 | 0.0920 | 0.0986 | - | - | - | - |