BioASQ Participants Area
Task 13b: Test Results of Phase A+
The test results are presented in separate tables for each type of annotation. The "System Description" of each system is used.
The evaluation measures that are used in Task A+ are presented here.
Warning: For ideal answers, good ROUGE results do not always imply good manual scores.
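As a reading aid for the Yes/No columns in the tables: the reported Macro F1 values are consistent with the plain average of the "F1 Yes" and "F1 No" columns, with a missing class score ("-") counted as zero. The sketch below illustrates that relationship only; the function name and the handling of "-" as `None` are assumptions made for illustration, not the official BioASQ evaluation code.

```python
from typing import Optional


def macro_f1(f1_yes: Optional[float], f1_no: Optional[float]) -> float:
    """Average the two per-class F1 scores.

    A class score shown as "-" in the tables is passed as None and is
    counted as 0.0 here (an assumption that matches the published rows).
    """
    yes = 0.0 if f1_yes is None else f1_yes
    no = 0.0 if f1_no is None else f1_no
    return (yes + no) / 2.0


# Row with both class scores (F1 Yes 0.9565, F1 No 0.9091).
both_classes = macro_f1(0.9565, 0.9091)

# Row where F1 No is reported as "-" (F1 Yes 0.8276).
one_class = macro_f1(0.8276, None)
```

Because the published values are rounded to four decimals, recomputed averages can differ from the table in the last digit.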
Test batch 1
Exact Answers
System | Yes/No: Accuracy | Yes/No: F1 Yes | Yes/No: F1 No | Yes/No: Macro F1 | Factoid: Strict Acc. | Factoid: Lenient Acc. | Factoid: MRR | List: Mean Prec. | List: Recall | List: F-Measure |
UniTor_0 | 0.9412 | 0.9565 | 0.9091 | 0.9328 | 0.3077 | 0.3077 | 0.3077 | 0.2152 | 0.2031 | 0.2039 |
UniTor_1 | 0.9412 | 0.9565 | 0.9091 | 0.9328 | 0.2692 | 0.2692 | 0.2692 | 0.2506 | 0.3323 | 0.2563 |
Only uses GPT-4o | 0.9412 | 0.9565 | 0.9091 | 0.9328 | 0.1538 | 0.2308 | 0.1859 | 0.1596 | 0.1424 | 0.1415 |
NN_Persona_2 | 0.8824 | 0.9231 | 0.7500 | 0.8365 | 0.2308 | 0.3077 | 0.2692 | 0.1880 | 0.2018 | 0.1821 |
Baseline top 20 | 0.9412 | 0.9600 | 0.8889 | 0.9244 | 0.3462 | 0.3846 | 0.3654 | 0.3189 | 0.2663 | 0.2782 |
Using LLM alone | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.1538 | 0.2308 | 0.1827 | 0.2530 | 0.2724 | 0.2496 |
Baseline top 10 | 0.9412 | 0.9600 | 0.8889 | 0.9244 | 0.3077 | 0.3462 | 0.3269 | 0.3249 | 0.3374 | 0.3038 |
Using KG for list q | 0.9412 | 0.9600 | 0.8889 | 0.9244 | 0.3077 | 0.4615 | 0.3782 | 0.1249 | 0.3496 | 0.1645 |
Main pipeline | 0.9412 | 0.9600 | 0.8889 | 0.9244 | 0.3077 | 0.4615 | 0.3782 | 0.1485 | 0.2791 | 0.1698 |
bioinfo-0 | 0.7059 | 0.8276 | - | 0.4138 | - | - | - | - | - | - |
bioinfo-1 | 0.7059 | 0.8276 | - | 0.4138 | - | - | - | - | - | - |
bioinfo-2 | 0.7059 | 0.8276 | - | 0.4138 | - | - | - | - | - | - |
bioinfo-3 | 0.7059 | 0.8276 | - | 0.4138 | - | - | - | - | - | - |
bioinfo-4 | 0.7059 | 0.8276 | - | 0.4138 | - | - | - | - | - | - |
UniTor_2 | 0.8824 | 0.9167 | 0.8000 | 0.8583 | 0.3077 | 0.3462 | 0.3269 | 0.1884 | 0.2449 | 0.1848 |
UniTor_3 | 0.9412 | 0.9565 | 0.9091 | 0.9328 | 0.3077 | 0.3462 | 0.3269 | 0.1851 | 0.2414 | 0.1807 |
NN_Persona_1 | 0.8824 | 0.9231 | 0.7500 | 0.8365 | 0.2692 | 0.3462 | 0.2962 | 0.1558 | 0.1972 | 0.1491 |
NN_Persona_3 | 0.9412 | 0.9600 | 0.8889 | 0.9244 | 0.3077 | 0.3846 | 0.3365 | 0.1622 | 0.1915 | 0.1562 |
deepseek32b-me | 0.2941 | - | 0.4545 | 0.2273 | 0.0769 | 0.0769 | 0.0769 | 0.2360 | 0.1503 | 0.1708 |
GPT4O | 0.9412 | 0.9565 | 0.9091 | 0.9328 | 0.1154 | 0.1154 | 0.1154 | 0.2008 | 0.1301 | 0.1484 |
Fleming-1 | 0.9412 | 0.9565 | 0.9091 | 0.9328 | 0.3077 | 0.4615 | 0.3571 | 0.2363 | 0.1666 | 0.1799 |
Fleming-2 | 0.9412 | 0.9565 | 0.9091 | 0.9328 | 0.3077 | 0.4615 | 0.3571 | 0.2029 | 0.2029 | 0.1760 |
google_serach_&_LLM | 0.7059 | 0.7368 | 0.6667 | 0.7018 | 0.1154 | 0.1154 | 0.1154 | 0.1706 | 0.0838 | 0.1074 |
DB_vector_&_LLM | 0.4706 | 0.4000 | 0.5263 | 0.4632 | 0.1154 | 0.1154 | 0.1154 | 0.1359 | 0.0727 | 0.0930 |
IRIS_1 | 0.2941 | - | 0.4545 | 0.2273 | - | - | - | - | - | - |
IRIS_2 | 0.2941 | - | 0.4545 | 0.2273 | - | - | - | - | - | - |
UR-IW-1 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.2692 | 0.3462 | 0.3077 | 0.2556 | 0.3513 | 0.2635 |
UR-IW-3 | 0.9412 | 0.9565 | 0.9091 | 0.9328 | 0.3077 | 0.3846 | 0.3462 | 0.2535 | 0.3228 | 0.2633 |
UR-IW-5 | 0.8235 | 0.8696 | 0.7273 | 0.7984 | 0.3846 | 0.4231 | 0.4038 | 0.3151 | 0.4094 | 0.3223 |
bious2 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.2308 | 0.2692 | 0.2500 | 0.1792 | 0.1646 | 0.1626 |
IRIS_3 | 0.2941 | - | 0.4545 | 0.2273 | - | - | - | - | - | - |
bious3 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.2308 | 0.2308 | 0.2308 | 0.1946 | 0.1475 | 0.1609 |
bious4 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.1538 | 0.1923 | 0.1731 | 0.2037 | 0.1267 | 0.1509 |
bious5 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.1154 | 0.1154 | 0.1154 | 0.2673 | 0.2161 | 0.2198 |
IR1 | 0.8235 | 0.8571 | 0.7692 | 0.8132 | 0.3462 | 0.3462 | 0.3462 | 0.2333 | 0.0836 | 0.1119 |
Fleming-3 | 0.9412 | 0.9565 | 0.9091 | 0.9328 | 0.3077 | 0.4615 | 0.3571 | 0.2029 | 0.2029 | 0.1760 |
qa | 0.2941 | - | 0.4545 | 0.2273 | 0.1154 | 0.1154 | 0.1154 | 0.1783 | 0.0982 | 0.1227 |
bious1 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.1923 | 0.1923 | 0.1923 | 0.2497 | 0.2168 | 0.2233 |
deepseek-r1:32b | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.0769 | 0.0769 | 0.0769 | 0.2110 | 0.1369 | 0.1558 |
config-1 | 0.8824 | 0.9231 | 0.7500 | 0.8365 | 0.3077 | 0.3077 | 0.3077 | 0.2511 | 0.1532 | 0.1838 |
config-2 | 0.9412 | 0.9565 | 0.9091 | 0.9328 | 0.2692 | 0.3077 | 0.2885 | 0.3059 | 0.1899 | 0.2230 |
config-3 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.3462 | 0.3846 | 0.3590 | 0.3114 | 0.2520 | 0.2698 |
config-4 | 0.9412 | 0.9565 | 0.9091 | 0.9328 | 0.3462 | 0.3846 | 0.3654 | 0.3585 | 0.3141 | 0.3089 |
config-5 | 0.8824 | 0.9167 | 0.8000 | 0.8583 | 0.3846 | 0.4231 | 0.4038 | 0.3717 | 0.2932 | 0.3185 |
mistral | 0.8824 | 0.9167 | 0.8000 | 0.8583 | 0.2692 | 0.3077 | 0.2885 | 0.2555 | 0.2160 | 0.2196 |
UR-IW-2 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.3846 | 0.4615 | 0.4167 | 0.2944 | 0.3073 | 0.2907 |
UR-IW-4 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.2692 | 0.3462 | 0.3077 | 0.3283 | 0.3397 | 0.3004 |
dmiip2024 | 0.9412 | 0.9600 | 0.8889 | 0.9244 | 0.3462 | 0.3462 | 0.3462 | 0.2826 | 0.2264 | 0.2379 |
dmiip2024_1 | 0.8235 | 0.8800 | 0.6667 | 0.7733 | 0.3462 | 0.3462 | 0.3462 | 0.2986 | 0.2220 | 0.2423 |
deepseek-r1:14b | 0.9412 | 0.9565 | 0.9091 | 0.9328 | 0.1154 | 0.1154 | 0.1154 | 0.1913 | 0.1205 | 0.1383 |
dmiip2024_2 | 0.9412 | 0.9600 | 0.8889 | 0.9244 | 0.3846 | 0.4231 | 0.4038 | 0.2760 | 0.2231 | 0.2324 |
deepseek-r1:8b | 0.9412 | 0.9565 | 0.9091 | 0.9328 | 0.0769 | 0.0769 | 0.0769 | 0.2747 | 0.1846 | 0.2086 |
gpt 01 mini | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.0769 | 0.0769 | 0.0769 | 0.2110 | 0.1369 | 0.1558 |
dmiip2024_3 | 0.9412 | 0.9565 | 0.9091 | 0.9328 | 0.4231 | 0.5000 | 0.4551 | 0.2288 | 0.3018 | 0.2379 |
deepseek32b-full | 0.2941 | - | 0.4545 | 0.2273 | 0.0769 | 0.0769 | 0.0769 | 0.2234 | 0.1512 | 0.1680 |
lasigeBioTM | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.1154 | 0.1154 | 0.1154 | 0.1783 | 0.1792 | 0.1549 |
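The three Factoid columns can be read as follows: strict accuracy counts a question only when the top-ranked candidate is correct, lenient accuracy counts it when any returned candidate is correct, and MRR averages the reciprocal rank of the first correct candidate. The sketch below is a minimal illustration under stated assumptions: each system returns a ranked list of at most five candidates per question, matching is case-insensitive exact string comparison, and the function and variable names are hypothetical. It is not the official evaluation script.

```python
from typing import List, Sequence, Tuple


def factoid_scores(
    ranked_candidates: Sequence[List[str]],
    gold_answers: Sequence[List[str]],
    max_candidates: int = 5,  # assumed cap on candidates per question
) -> Tuple[float, float, float]:
    """Return (strict accuracy, lenient accuracy, MRR) over all questions."""
    strict_hits = 0
    lenient_hits = 0
    reciprocal_rank_sum = 0.0
    for candidates, gold in zip(ranked_candidates, gold_answers):
        gold_norm = {g.strip().lower() for g in gold}
        # 1-based rank of the first correct candidate, or None if absent.
        rank = next(
            (i for i, c in enumerate(candidates[:max_candidates], start=1)
             if c.strip().lower() in gold_norm),
            None,
        )
        if rank is not None:
            lenient_hits += 1
            reciprocal_rank_sum += 1.0 / rank
            if rank == 1:
                strict_hits += 1
    n = len(gold_answers)
    if n == 0:
        return 0.0, 0.0, 0.0
    return strict_hits / n, lenient_hits / n, reciprocal_rank_sum / n


# Toy example: first question answered at rank 1, second at rank 2.
strict, lenient, mrr = factoid_scores(
    [["BRCA1", "TP53"], ["aspirin", "ibuprofen"]],
    [["brca1"], ["ibuprofen"]],
)
```

Under these definitions MRR always lies between strict and lenient accuracy, which matches every complete row of the table above.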
Ideal Answers
System | Automatic: R-2 (Rec) | Automatic: R-2 (F1) | Automatic: R-SU4 (Rec) | Automatic: R-SU4 (F1) | Manual: Readability | Manual: Recall | Manual: Precision | Manual: Repetition |
UniTor_0 | 0.1325 | 0.1666 | 0.1278 | 0.1598 | 4.06 | 3.69 | 4.00 | 4.24 |
UniTor_1 | 0.1327 | 0.1694 | 0.1270 | 0.1610 | 4.00 | 3.62 | 3.88 | 4.12 |
Only uses GPT-4o | 0.2104 | 0.1989 | 0.2151 | 0.1997 | 4.52 | 3.84 | 4.32 | 4.49 |
NN_Persona_2 | 0.2419 | 0.1643 | 0.2531 | 0.1665 | 4.13 | 4.38 | 3.96 | 4.19 |
Baseline top 20 | 0.0279 | 0.0355 | 0.0272 | 0.0349 | 0.87 | 0.78 | 0.86 | 0.88 |
Using LLM alone | 0.0178 | 0.0242 | 0.0194 | 0.0261 | 0.91 | 0.80 | 0.88 | 0.92 |
Baseline top 10 | 0.0262 | 0.0328 | 0.0253 | 0.0324 | 0.85 | 0.81 | 0.81 | 0.87 |
Using KG for list q | 0.0343 | 0.0320 | 0.0361 | 0.0337 | 0.93 | 0.91 | 0.89 | 0.94 |
Main pipeline | 0.0343 | 0.0320 | 0.0361 | 0.0337 | 0.93 | 0.91 | 0.89 | 0.94 |
bioinfo-0 | 0.1434 | 0.1667 | 0.1379 | 0.1587 | 4.18 | 3.95 | 4.12 | 4.28 |
bioinfo-1 | 0.2540 | 0.1727 | 0.2588 | 0.1714 | 4.24 | 4.26 | 4.02 | 4.36 |
bioinfo-2 | 0.2270 | 0.1803 | 0.2353 | 0.1831 | 4.36 | 4.39 | 4.16 | 4.51 |
bioinfo-3 | 0.2395 | 0.1818 | 0.2454 | 0.1842 | 4.38 | 4.40 | 4.05 | 4.44 |
bioinfo-4 | 0.2183 | 0.1638 | 0.2278 | 0.1702 | 4.36 | 4.46 | 4.06 | 4.42 |
UniTor_2 | 0.1329 | 0.1590 | 0.1313 | 0.1535 | 4.00 | 3.62 | 3.92 | 4.20 |
UniTor_3 | 0.1319 | 0.1620 | 0.1289 | 0.1552 | 4.15 | 3.73 | 4.02 | 4.27 |
NN_Persona_1 | 0.2576 | 0.1807 | 0.2670 | 0.1828 | 4.25 | 4.45 | 4.08 | 4.27 |
NN_Persona_3 | 0.2405 | 0.1701 | 0.2492 | 0.1722 | 4.28 | 4.39 | 4.01 | 4.32 |
deepseek32b-me | 0.1820 | 0.1576 | 0.1850 | 0.1580 | 4.06 | 3.99 | 4.02 | 4.32 |
GPT4O | 0.2319 | 0.1582 | 0.2413 | 0.1636 | 4.53 | 3.86 | 4.15 | 4.52 |
Fleming-1 | 0.2025 | 0.1576 | 0.2103 | 0.1609 | 4.21 | 3.89 | 3.89 | 4.38 |
Fleming-2 | 0.2025 | 0.1576 | 0.2103 | 0.1609 | 4.21 | 3.89 | 3.89 | 4.38 |
google_serach_&_LLM | 0.1553 | 0.0988 | 0.1724 | 0.1092 | 3.65 | 3.06 | 3.31 | 3.76 |
DB_vector_&_LLM | 0.1351 | 0.0813 | 0.1556 | 0.0939 | 3.74 | 2.87 | 3.13 | 3.86 |
IRIS_1 | 0.1466 | 0.1614 | 0.1479 | 0.1617 | 4.14 | 3.59 | 3.98 | 4.29 |
IRIS_2 | 0.1476 | 0.1520 | 0.1474 | 0.1521 | 3.81 | 3.55 | 3.82 | 4.12 |
UR-IW-1 | 0.2444 | 0.1830 | 0.2541 | 0.1877 | 4.34 | 4.15 | 4.01 | 4.34 |
UR-IW-3 | 0.2308 | 0.1848 | 0.2380 | 0.1876 | 4.24 | 4.05 | 4.00 | 4.29 |
UR-IW-5 | 0.2140 | 0.1830 | 0.2252 | 0.1897 | 4.29 | 4.11 | 3.89 | 4.39 |
bious2 | 0.1531 | 0.1713 | 0.1529 | 0.1699 | 4.20 | 3.54 | 3.81 | 4.34 |
IRIS_3 | 0.1428 | 0.1440 | 0.1455 | 0.1461 | 3.72 | 3.48 | 3.81 | 4.20 |
bious3 | 0.1607 | 0.1812 | 0.1598 | 0.1794 | 4.09 | 3.72 | 3.87 | 4.18 |
bious4 | 0.1543 | 0.1729 | 0.1526 | 0.1684 | 4.21 | 3.58 | 3.87 | 4.25 |
bious5 | 0.1574 | 0.1783 | 0.1547 | 0.1746 | 4.27 | 3.65 | 3.82 | 4.29 |
IR1 | 0.1359 | 0.1666 | 0.1304 | 0.1612 | 3.95 | 3.36 | 3.85 | 4.20 |
Fleming-3 | 0.2345 | 0.1164 | 0.2535 | 0.1252 | 4.11 | 3.91 | 3.79 | 4.25 |
qa | 0.1524 | 0.1510 | 0.1555 | 0.1525 | 3.94 | 3.76 | 3.95 | 4.31 |
bious1 | 0.1777 | 0.1894 | 0.1735 | 0.1832 | 4.22 | 3.81 | 4.09 | 4.35 |
deepseek-r1:32b | 0.2066 | 0.1586 | 0.2197 | 0.1670 | 4.47 | 3.87 | 4.13 | 4.49 |
config-1 | 0.2287 | 0.1927 | 0.2330 | 0.1951 | 4.48 | 4.26 | 4.19 | 4.55 |
config-2 | 0.2255 | 0.1873 | 0.2323 | 0.1909 | 4.29 | 4.09 | 4.09 | 4.41 |
config-3 | 0.2284 | 0.1921 | 0.2350 | 0.1964 | 4.31 | 4.06 | 4.05 | 4.36 |
config-4 | 0.2524 | 0.2067 | 0.2579 | 0.2091 | 4.41 | 4.21 | 4.18 | 4.40 |
config-5 | 0.2418 | 0.2014 | 0.2475 | 0.2038 | 4.33 | 4.18 | 4.16 | 4.44 |
mistral | 0.1793 | 0.1833 | 0.1809 | 0.1824 | 4.31 | 3.87 | 3.96 | 4.39 |
UR-IW-2 | 0.1843 | 0.1574 | 0.1998 | 0.1672 | 4.39 | 4.13 | 4.13 | 4.45 |
UR-IW-4 | 0.1937 | 0.1496 | 0.2080 | 0.1583 | 4.49 | 4.15 | 4.20 | 4.48 |
dmiip2024 | 0.2312 | 0.2062 | 0.2344 | 0.2072 | 4.33 | 4.19 | 4.16 | 4.38 |
dmiip2024_1 | 0.2311 | 0.2171 | 0.2318 | 0.2164 | 4.32 | 4.13 | 4.14 | 4.28 |
deepseek-r1:14b | 0.2075 | 0.1646 | 0.2185 | 0.1715 | 4.46 | 3.80 | 4.06 | 4.53 |
dmiip2024_2 | 0.1424 | 0.1883 | 0.1332 | 0.1774 | 4.29 | 4.00 | 4.20 | 4.41 |
deepseek-r1:8b | 0.2262 | 0.1526 | 0.2396 | 0.1598 | 4.53 | 3.92 | 4.16 | 4.54 |
gpt 01 mini | 0.2066 | 0.1586 | 0.2197 | 0.1670 | 4.47 | 3.87 | 4.13 | 4.49 |
dmiip2024_3 | 0.1575 | 0.2016 | 0.1490 | 0.1920 | 4.04 | 3.71 | 4.04 | 4.18 |
deepseek32b-full | 0.1736 | 0.1593 | 0.1763 | 0.1583 | 3.93 | 4.00 | 4.00 | 4.22 |
lasigeBioTM | 0.1447 | 0.1455 | 0.1448 | 0.1450 | - | - | - | - |
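The R-2 (Rec) and R-2 (F1) columns can diverge sharply for the same system (for example, Fleming-3 in Test batch 1 has R-2 recall 0.2345 but R-2 F1 0.1164), which is what one would expect when an answer is long enough to cover many reference bigrams while also containing many bigrams that are not in the reference. The sketch below is a deliberately simplified bigram-overlap computation for a single candidate and a single reference, using whitespace tokenisation; it is an illustrative assumption, not the ROUGE implementation used to produce the scores on this page, and it does not cover R-SU4.

```python
from collections import Counter
from typing import Tuple


def bigram_counts(text: str) -> Counter:
    """Count lowercase word bigrams using plain whitespace tokenisation."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


def rouge2_recall_f1(candidate: str, reference: str) -> Tuple[float, float]:
    """Return (recall, F1) of bigram overlap between candidate and reference."""
    cand = bigram_counts(candidate)
    ref = bigram_counts(reference)
    # Multiset intersection keeps the minimum count of each shared bigram.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0, 0.0
    recall = overlap / sum(ref.values())
    precision = overlap / sum(cand.values())
    f1 = 2 * precision * recall / (precision + recall)
    return recall, f1


# A longer candidate covers every reference bigram (recall 1.0) but pays
# for the extra bigrams in precision, so its F1 is much lower.
recall, f1 = rouge2_recall_f1("the cat sat on the mat", "the cat sat")
```

This gap between recall and F1 is one reason to read the automatic columns alongside the manual scores, as the warning at the top of the page suggests.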
Test batch 2
Exact Answers
System | Yes/No: Accuracy | Yes/No: F1 Yes | Yes/No: F1 No | Yes/No: Macro F1 | Factoid: Strict Acc. | Factoid: Lenient Acc. | Factoid: MRR | List: Mean Prec. | List: Recall | List: F-Measure |
Only uses GPT-4o | 0.6471 | 0.6250 | 0.6667 | 0.6458 | 0.2963 | 0.2963 | 0.2963 | 0.2326 | 0.1944 | 0.2026 |
NN_Persona_1 | 0.8824 | 0.9000 | 0.8571 | 0.8786 | 0.2222 | 0.2963 | 0.2593 | 0.1805 | 0.3159 | 0.2098 |
NN_Persona_2 | 0.8235 | 0.8421 | 0.8000 | 0.8211 | 0.3333 | 0.4444 | 0.3796 | 0.2644 | 0.3590 | 0.2842 |
NN_Persona_3 | 0.7647 | 0.8000 | 0.7143 | 0.7571 | 0.2963 | 0.4074 | 0.3519 | 0.2252 | 0.3095 | 0.2477 |
Fleming-1 | 0.8824 | 0.9000 | 0.8571 | 0.8786 | 0.3333 | 0.4815 | 0.4012 | 0.2105 | 0.2757 | 0.2110 |
UniTor_0 | 0.8824 | 0.9000 | 0.8571 | 0.8786 | 0.4815 | 0.4815 | 0.4815 | 0.3282 | 0.4626 | 0.3504 |
UniTor_1 | 0.8824 | 0.9000 | 0.8571 | 0.8786 | 0.4815 | 0.4815 | 0.4815 | 0.3282 | 0.4626 | 0.3504 |
UniTor_2 | 0.7647 | 0.8000 | 0.7143 | 0.7571 | 0.4074 | 0.4074 | 0.4074 | 0.3105 | 0.4043 | 0.3062 |
UniTor_3 | 0.7647 | 0.8000 | 0.7143 | 0.7571 | 0.4074 | 0.4074 | 0.4074 | 0.3105 | 0.4043 | 0.3062 |
Baseline top 20 | 0.8824 | 0.9091 | 0.8333 | 0.8712 | 0.4815 | 0.5556 | 0.5185 | 0.4171 | 0.3688 | 0.3647 |
Main pipeline | 0.8824 | 0.9091 | 0.8333 | 0.8712 | 0.3704 | 0.4444 | 0.4012 | 0.2299 | 0.5347 | 0.2829 |
Using LLM alone | 0.5882 | 0.5882 | 0.5882 | 0.5882 | 0.3333 | 0.3333 | 0.3333 | 0.2627 | 0.3236 | 0.2828 |
Baseline top 10 | 0.8824 | 0.9091 | 0.8333 | 0.8712 | 0.4074 | 0.5185 | 0.4506 | 0.4065 | 0.3911 | 0.3758 |
Using KG for list q | 0.8824 | 0.9091 | 0.8333 | 0.8712 | 0.3704 | 0.4444 | 0.4012 | 0.2341 | 0.5151 | 0.2796 |
UR-IW-1 | 0.8824 | 0.9091 | 0.8333 | 0.8712 | 0.4074 | 0.5185 | 0.4506 | 0.2571 | 0.3234 | 0.2671 |
UR-IW-2 | 0.8235 | 0.8571 | 0.7692 | 0.8132 | 0.4815 | 0.5185 | 0.5000 | 0.3930 | 0.4097 | 0.3830 |
UR-IW-3 | 0.9412 | 0.9474 | 0.9333 | 0.9404 | 0.4074 | 0.5556 | 0.4599 | 0.2031 | 0.2961 | 0.2283 |
UR-IW-4 | 0.6471 | 0.6667 | 0.6250 | 0.6458 | 0.5185 | 0.5556 | 0.5370 | 0.3442 | 0.3496 | 0.3283 |
NN_Baseline | 0.7059 | 0.7368 | 0.6667 | 0.7018 | 0.3333 | 0.3704 | 0.3519 | 0.2710 | 0.3136 | 0.2724 |
bioinfo-1 | 0.5882 | 0.7407 | - | 0.3704 | - | - | - | - | - | - |
bioinfo-2 | 0.5882 | 0.7407 | - | 0.3704 | - | - | - | - | - | - |
bioinfo-3 | 0.5882 | 0.7407 | - | 0.3704 | - | - | - | - | - | - |
bioinfo-4 | 0.5882 | 0.7407 | - | 0.3704 | - | - | - | - | - | - |
bious3 | 0.8235 | 0.8571 | 0.7692 | 0.8132 | 0.3704 | 0.4444 | 0.4074 | 0.1900 | 0.1550 | 0.1658 |
Fleming-2 | 0.8824 | 0.9091 | 0.8333 | 0.8712 | 0.3333 | 0.4815 | 0.4012 | 0.2105 | 0.2757 | 0.2110 |
UR-IW-5 | 0.9412 | 0.9524 | 0.9231 | 0.9377 | 0.3333 | 0.3704 | 0.3457 | 0.2139 | 0.3298 | 0.2406 |
lasigeBioTM | 0.4118 | 0.4444 | 0.3750 | 0.4097 | 0.2593 | 0.2593 | 0.2593 | 0.4211 | 0.0865 | 0.1408 |
dmiip2024 | 0.8235 | 0.8421 | 0.8000 | 0.8211 | 0.5185 | 0.5926 | 0.5556 | 0.2952 | 0.3708 | 0.3037 |
dmiip2024_2 | 0.8824 | 0.9000 | 0.8571 | 0.8786 | 0.4815 | 0.4815 | 0.4815 | 0.2316 | 0.3712 | 0.2686 |
dmiip2024_3 | 0.8235 | 0.8571 | 0.7692 | 0.8132 | 0.4074 | 0.4444 | 0.4259 | 0.4233 | 0.3458 | 0.3698 |
dmiip2024_4 | 0.7647 | 0.8182 | 0.6667 | 0.7424 | 0.6296 | 0.6296 | 0.6296 | 0.3147 | 0.3807 | 0.3282 |
dmiip2024_1 | 0.8235 | 0.8421 | 0.8000 | 0.8211 | 0.4815 | 0.4815 | 0.4815 | 0.3077 | 0.3554 | 0.3015 |
IR2 | 0.6471 | 0.6667 | 0.6250 | 0.6458 | 0.4815 | 0.4815 | 0.4815 | 0.2165 | 0.1892 | 0.1972 |
deepseek32b-me | 0.7647 | 0.8333 | 0.6000 | 0.7167 | 0.3704 | 0.3704 | 0.3704 | 0.2445 | 0.3056 | 0.2437 |
mistral | 0.8235 | 0.8571 | 0.7692 | 0.8132 | 0.4444 | 0.5185 | 0.4722 | 0.2355 | 0.3071 | 0.2366 |
gpt 01 mini | 0.8824 | 0.9000 | 0.8571 | 0.8786 | 0.0741 | 0.1111 | 0.0926 | 0.2453 | 0.2599 | 0.2280 |
bious4 | 0.8235 | 0.8696 | 0.7273 | 0.7984 | 0.4815 | 0.5185 | 0.5000 | 0.2532 | 0.2270 | 0.2341 |
phaseB-5 | 0.8824 | 0.9000 | 0.8571 | 0.8786 | 0.3704 | 0.3704 | 0.3704 | 0.2526 | 0.2662 | 0.2443 |
phaseB-4 | 0.8824 | 0.9000 | 0.8571 | 0.8786 | 0.3704 | 0.3704 | 0.3704 | 0.2526 | 0.2662 | 0.2443 |
deepseek32b-f | 0.8235 | 0.8571 | 0.7692 | 0.8132 | 0.3333 | 0.3333 | 0.3333 | 0.2563 | 0.2749 | 0.2483 |
bious2 | 0.7647 | 0.7778 | 0.7500 | 0.7639 | 0.2963 | 0.3333 | 0.3148 | 0.2387 | 0.2358 | 0.2268 |
bious1 | 0.8824 | 0.9000 | 0.8571 | 0.8786 | 0.4074 | 0.4074 | 0.4074 | 0.2144 | 0.1862 | 0.1976 |
bious5 | 0.8235 | 0.8571 | 0.7692 | 0.8132 | 0.3333 | 0.3704 | 0.3519 | 0.1978 | 0.2485 | 0.2023 |
deepseek-r1:14b | 0.8824 | 0.9091 | 0.8333 | 0.8712 | 0.1481 | 0.1481 | 0.1481 | 0.1754 | 0.1737 | 0.1700 |
deepseek-r1:8b | 0.8235 | 0.8571 | 0.7692 | 0.8132 | 0.1111 | 0.1111 | 0.1111 | 0.3489 | 0.3167 | 0.3167 |
deepseek-r1:32b | 0.8235 | 0.8421 | 0.8000 | 0.8211 | 0.1111 | 0.1111 | 0.1111 | 0.3726 | 0.3495 | 0.3400 |
bioinfo-0 | 0.5882 | 0.7407 | - | 0.3704 | - | - | - | - | - | - |
GPT4O | 0.8235 | 0.8571 | 0.7692 | 0.8132 | 0.1111 | 0.1111 | 0.1111 | 0.3489 | 0.3167 | 0.3167 |
deepseek32b-full | 0.8235 | 0.8696 | 0.7273 | 0.7984 | 0.3704 | 0.3704 | 0.3704 | 0.3382 | 0.3113 | 0.3067 |
Ideal Answers
System | Automatic: R-2 (Rec) | Automatic: R-2 (F1) | Automatic: R-SU4 (Rec) | Automatic: R-SU4 (F1) | Manual: Readability | Manual: Recall | Manual: Precision | Manual: Repetition |
Only uses GPT-4o | 0.2075 | 0.1990 | 0.2090 | 0.1963 | 4.56 | 3.60 | 4.25 | 4.66 |
NN_Persona_1 | 0.2614 | 0.1502 | 0.2735 | 0.1520 | 4.25 | 4.16 | 3.93 | 4.44 |
NN_Persona_2 | 0.2281 | 0.1508 | 0.2357 | 0.1524 | 4.19 | 4.21 | 4.05 | 4.29 |
NN_Persona_3 | 0.2308 | 0.1505 | 0.2407 | 0.1502 | 4.13 | 4.19 | 3.94 | 4.32 |
Fleming-1 | 0.2875 | 0.1248 | 0.3023 | 0.1300 | 4.00 | 3.92 | 3.81 | 4.29 |
UniTor_0 | 0.1890 | 0.2020 | 0.1780 | 0.1901 | 4.32 | 3.58 | 4.05 | 4.54 |
UniTor_1 | 0.1898 | 0.2038 | 0.1782 | 0.1912 | 4.32 | 3.60 | 4.05 | 4.54 |
UniTor_2 | 0.1949 | 0.2124 | 0.1840 | 0.2003 | 4.42 | 3.72 | 4.16 | 4.61 |
UniTor_3 | 0.1929 | 0.2106 | 0.1823 | 0.1988 | 4.42 | 3.73 | 4.15 | 4.61 |
Baseline top 20 | 0.0368 | 0.0400 | 0.0367 | 0.0405 | 1.15 | 1.08 | 1.14 | 1.20 |
Main pipeline | 0.0560 | 0.0469 | 0.0561 | 0.0468 | 1.20 | 1.21 | 1.13 | 1.20 |
Using LLM alone | 0.0359 | 0.0372 | 0.0354 | 0.0376 | 1.24 | 0.98 | 1.08 | 1.25 |
Baseline top 10 | 0.0351 | 0.0374 | 0.0355 | 0.0382 | 1.19 | 1.09 | 1.19 | 1.20 |
Using KG for list q | 0.0560 | 0.0469 | 0.0561 | 0.0468 | 1.20 | 1.21 | 1.13 | 1.20 |
UR-IW-1 | 0.2354 | 0.1680 | 0.2468 | 0.1721 | 4.38 | 3.95 | 4.00 | 4.46 |
UR-IW-2 | 0.1931 | 0.1675 | 0.1989 | 0.1674 | 4.49 | 4.01 | 4.24 | 4.60 |
UR-IW-3 | 0.2341 | 0.1882 | 0.2435 | 0.1910 | 4.38 | 4.11 | 4.11 | 4.56 |
UR-IW-4 | 0.1927 | 0.1603 | 0.2042 | 0.1629 | 4.49 | 4.00 | 4.26 | 4.69 |
NN_Baseline | 0.2205 | 0.1330 | 0.2375 | 0.1396 | 4.39 | 4.31 | 4.21 | 4.59 |
bioinfo-1 | 0.1965 | 0.1646 | 0.1953 | 0.1626 | 4.34 | 4.14 | 4.31 | 4.53 |
bioinfo-2 | 0.2328 | 0.1725 | 0.2357 | 0.1721 | 4.49 | 4.48 | 4.41 | 4.72 |
bioinfo-3 | 0.2167 | 0.1569 | 0.2216 | 0.1587 | 4.42 | 4.26 | 4.29 | 4.56 |
bioinfo-4 | 0.2277 | 0.1553 | 0.2288 | 0.1573 | 4.36 | 4.29 | 4.24 | 4.53 |
bious3 | 0.1881 | 0.1965 | 0.1877 | 0.1937 | 4.49 | 3.55 | 4.04 | 4.60 |
Fleming-2 | 0.2477 | 0.1563 | 0.2626 | 0.1613 | 4.31 | 4.00 | 4.02 | 4.52 |
UR-IW-5 | 0.1983 | 0.1660 | 0.2089 | 0.1698 | 4.32 | 3.69 | 3.98 | 4.39 |
lasigeBioTM | 0.1364 | 0.1374 | 0.1371 | 0.1370 | 3.40 | 3.42 | 3.80 | 4.24 |
dmiip2024 | 0.1926 | 0.2261 | 0.1868 | 0.2186 | 4.44 | 3.94 | 4.36 | 4.56 |
dmiip2024_2 | 0.1711 | 0.2074 | 0.1656 | 0.2003 | 4.29 | 3.76 | 4.20 | 4.55 |
dmiip2024_3 | 0.1616 | 0.1953 | 0.1513 | 0.1831 | 4.59 | 3.88 | 4.46 | 4.65 |
dmiip2024_4 | 0.1870 | 0.2233 | 0.1765 | 0.2109 | 4.45 | 3.96 | 4.34 | 4.59 |
dmiip2024_1 | 0.1960 | 0.2309 | 0.1915 | 0.2234 | 4.45 | 3.88 | 4.28 | 4.55 |
IR2 | 0.2211 | 0.2400 | 0.2132 | 0.2306 | 4.58 | 3.99 | 4.56 | 4.74 |
deepseek32b-me | 0.1915 | 0.1456 | 0.1964 | 0.1445 | 4.09 | 4.07 | 4.05 | 4.61 |
mistral | 0.2139 | 0.1679 | 0.2158 | 0.1677 | 4.33 | 3.94 | 3.91 | 4.49 |
gpt 01 mini | 0.1249 | 0.1235 | 0.1313 | 0.1295 | 3.85 | 2.93 | 3.61 | 4.36 |
bious4 | 0.2098 | 0.2139 | 0.2052 | 0.2068 | 4.54 | 4.05 | 4.32 | 4.64 |
phaseB-5 | 0.2056 | 0.1565 | 0.2126 | 0.1583 | 4.01 | 4.13 | 4.18 | 4.58 |
phaseB-4 | 0.2056 | 0.1565 | 0.2126 | 0.1583 | 4.01 | 4.13 | 4.18 | 4.58 |
deepseek32b-f | 0.2072 | 0.1504 | 0.2098 | 0.1491 | 3.94 | 4.24 | 4.00 | 4.60 |
bious2 | 0.1931 | 0.1972 | 0.1898 | 0.1919 | 4.44 | 3.81 | 4.21 | 4.60 |
bious1 | 0.2105 | 0.2121 | 0.2057 | 0.2051 | 4.45 | 4.05 | 4.32 | 4.64 |
bious5 | 0.2028 | 0.2107 | 0.1979 | 0.2040 | 4.49 | 3.85 | 4.21 | 4.55 |
deepseek-r1:14b | 0.1847 | 0.1243 | 0.1886 | 0.1277 | 4.18 | 3.60 | 3.76 | 4.49 |
deepseek-r1:8b | 0.2075 | 0.1418 | 0.2176 | 0.1461 | 4.41 | 3.68 | 4.14 | 4.56 |
deepseek-r1:32b | 0.2277 | 0.1548 | 0.2353 | 0.1579 | 4.46 | 3.72 | 4.14 | 4.62 |
bioinfo-0 | 0.2308 | 0.2272 | 0.2247 | 0.2162 | 4.46 | 4.20 | 4.39 | 4.56 |
GPT4O | 0.2075 | 0.1418 | 0.2176 | 0.1461 | 4.41 | 3.68 | 4.14 | 4.56 |
deepseek32b-full | 0.1804 | 0.1488 | 0.1771 | 0.1461 | 4.01 | 3.84 | 4.00 | 4.41 |
Test batch 3
Exact Answers
System | Yes/No: Accuracy | Yes/No: F1 Yes | Yes/No: F1 No | Yes/No: Macro F1 | Factoid: Strict Acc. | Factoid: Lenient Acc. | Factoid: MRR | List: Mean Prec. | List: Recall | List: F-Measure |
Only uses GPT-4o | 0.8182 | 0.8667 | 0.7143 | 0.7905 | 0.3500 | 0.4000 | 0.3750 | 0.2644 | 0.2377 | 0.2346 |
IR2 | 0.9545 | 0.9677 | 0.9231 | 0.9454 | 0.2000 | 0.2000 | 0.2000 | 0.3280 | 0.3152 | 0.3209 |
IR3 | 0.8182 | 0.8824 | 0.6000 | 0.7412 | 0.3500 | 0.4500 | 0.4000 | 0.4556 | 0.4401 | 0.4265 |
IR4 | 0.7273 | 0.8235 | 0.4000 | 0.6118 | 0.3000 | 0.4500 | 0.3667 | 0.3754 | 0.3722 | 0.3645 |
NN_Persona_3 | 0.7727 | 0.8485 | 0.5455 | 0.6970 | 0.3500 | 0.4000 | 0.3750 | 0.2825 | 0.3983 | 0.3107 |
NN_Baseline | 0.7273 | 0.7692 | 0.6667 | 0.7179 | 0.1500 | 0.2500 | 0.2000 | 0.4149 | 0.4220 | 0.4110 |
NN_Persona_1 | 0.8636 | 0.9032 | 0.7692 | 0.8362 | 0.2000 | 0.3000 | 0.2417 | 0.2532 | 0.3212 | 0.2744 |
NN_Persona_2 | 0.9545 | 0.9655 | 0.9333 | 0.9494 | 0.4000 | 0.4500 | 0.4250 | 0.2773 | 0.3974 | 0.3108 |
IR1 | 0.6818 | 0.7586 | 0.5333 | 0.6460 | 0.2000 | 0.2000 | 0.2000 | 0.3629 | 0.3136 | 0.3321 |
UR-IW-2 | 0.8182 | 0.8750 | 0.6667 | 0.7708 | 0.2500 | 0.3500 | 0.3000 | 0.3902 | 0.3877 | 0.3685 |
UR-IW-4 | 0.8636 | 0.9032 | 0.7692 | 0.8362 | 0.2500 | 0.3000 | 0.2750 | 0.2423 | 0.2712 | 0.2419 |
lasigeBioTM | 0.4091 | 0.3158 | 0.4800 | 0.3979 | 0.0500 | 0.0500 | 0.0500 | 0.0455 | 0.0076 | 0.0130 |
lasigeBioTM-onto-sm | 0.5909 | 0.6400 | 0.5263 | 0.5832 | - | - | - | 0.0455 | 0.0227 | 0.0303 |
Fleming-1 | 0.7273 | 0.8000 | 0.5714 | 0.6857 | 0.2500 | 0.4000 | 0.3125 | 0.3293 | 0.4159 | 0.3592 |
Baseline top 10 | 0.8182 | 0.8750 | 0.6667 | 0.7708 | 0.2000 | 0.3000 | 0.2500 | 0.4421 | 0.4219 | 0.4238 |
Using LLM alone | 0.7727 | 0.8387 | 0.6154 | 0.7270 | 0.3000 | 0.4000 | 0.3500 | 0.2159 | 0.2739 | 0.2344 |
Using KG for list q | 0.8182 | 0.8750 | 0.6667 | 0.7708 | 0.3000 | 0.4500 | 0.3667 | 0.3279 | 0.4877 | 0.3642 |
Main pipeline | 0.8182 | 0.8750 | 0.6667 | 0.7708 | 0.3000 | 0.4500 | 0.3667 | 0.3086 | 0.4506 | 0.3195 |
Baseline top 20 | 0.8636 | 0.9032 | 0.7692 | 0.8362 | 0.2500 | 0.3000 | 0.2750 | 0.4381 | 0.4377 | 0.4259 |
UniTor_0 | 0.8636 | 0.9032 | 0.7692 | 0.8362 | 0.3000 | 0.3500 | 0.3250 | 0.2702 | 0.2904 | 0.2735 |
UniTor_1 | 0.8636 | 0.9032 | 0.7692 | 0.8362 | 0.2000 | 0.2000 | 0.2000 | 0.2550 | 0.2768 | 0.2626 |
UniTor_2 | 0.9091 | 0.9333 | 0.8571 | 0.8952 | 0.3000 | 0.3500 | 0.3167 | 0.2008 | 0.2273 | 0.2025 |
UniTor_3 | 0.9091 | 0.9333 | 0.8571 | 0.8952 | 0.2000 | 0.2500 | 0.2167 | 0.1950 | 0.2514 | 0.2119 |
IR5 | 0.7273 | 0.7692 | 0.6667 | 0.7179 | 0.1500 | 0.1500 | 0.1500 | 0.3221 | 0.2576 | 0.2722 |
bious1 | 0.9091 | 0.9333 | 0.8571 | 0.8952 | 0.2000 | 0.2500 | 0.2250 | 0.3766 | 0.3735 | 0.3719 |
bious2 | 0.8182 | 0.8571 | 0.7500 | 0.8036 | 0.2000 | 0.2000 | 0.2000 | 0.4206 | 0.3943 | 0.4032 |
bious3 | 0.9091 | 0.9333 | 0.8571 | 0.8952 | 0.1500 | 0.1500 | 0.1500 | 0.3490 | 0.3343 | 0.3358 |
bious4 | 0.8182 | 0.8571 | 0.7500 | 0.8036 | 0.1500 | 0.2000 | 0.1750 | 0.3940 | 0.3646 | 0.3710 |
bious5 | 0.9091 | 0.9333 | 0.8571 | 0.8952 | 0.2000 | 0.2500 | 0.2250 | 0.3553 | 0.3282 | 0.3356 |
UR-IW-1 | 0.8636 | 0.9091 | 0.7273 | 0.8182 | 0.2500 | 0.4500 | 0.3375 | 0.3495 | 0.4037 | 0.3587 |
UR-IW-3 | 0.6818 | 0.7586 | 0.5333 | 0.6460 | 0.2500 | 0.5500 | 0.3600 | 0.3400 | 0.3630 | 0.3337 |
UR-IW-5 | 0.9091 | 0.9375 | 0.8333 | 0.8854 | 0.1000 | 0.3500 | 0.2250 | 0.3855 | 0.4082 | 0.3776 |
dmiip2024 | 0.8182 | 0.8750 | 0.6667 | 0.7708 | 0.3500 | 0.3500 | 0.3500 | 0.4520 | 0.4337 | 0.4323 |
dmiip2024_1 | 0.8182 | 0.8667 | 0.7143 | 0.7905 | 0.2500 | 0.2500 | 0.2500 | 0.4680 | 0.4337 | 0.4366 |
dmiip2024_2 | 0.8636 | 0.9032 | 0.7692 | 0.8362 | 0.1500 | 0.2000 | 0.1750 | 0.3174 | 0.4413 | 0.3482 |
dmiip2024_3 | 0.8182 | 0.8667 | 0.7143 | 0.7905 | 0.2000 | 0.4000 | 0.2917 | 0.5068 | 0.4246 | 0.4546 |
dmiip2024_4 | 0.8182 | 0.8824 | 0.6000 | 0.7412 | 0.2000 | 0.2500 | 0.2250 | 0.4501 | 0.4204 | 0.4238 |
lasigeBioTM-onto-bl | 0.5909 | 0.6400 | 0.5263 | 0.5832 | 0.1000 | 0.1000 | 0.1000 | 0.0909 | 0.0318 | 0.0455 |
IRIS_1 | 0.3182 | - | 0.4828 | 0.2414 | - | - | - | - | - | - |
IRIS_2 | 0.3182 | - | 0.4828 | 0.2414 | - | - | - | - | - | - |
IRIS_3 | 0.3182 | - | 0.4828 | 0.2414 | - | - | - | - | - | - |
extractive | 0.7273 | 0.7857 | 0.6250 | 0.7054 | 0.0500 | 0.2500 | 0.1417 | 0.2878 | 0.3136 | 0.2931 |
mistral | 0.8636 | 0.9032 | 0.7692 | 0.8362 | 0.1000 | 0.3500 | 0.2167 | 0.3527 | 0.3995 | 0.3436 |
llama | 0.8636 | 0.9032 | 0.7692 | 0.8362 | 0.2500 | 0.4500 | 0.3500 | 0.3317 | 0.3633 | 0.3393 |
abstractive | 0.8636 | 0.8966 | 0.8000 | 0.8483 | 0.2500 | 0.3500 | 0.3000 | 0.0364 | 0.0606 | 0.0450 |
bioinfo-0 | 0.6818 | 0.8108 | - | 0.4054 | - | - | - | - | - | - |
bioinfo-1 | 0.6818 | 0.8108 | - | 0.4054 | - | - | - | - | - | - |
bioinfo-2 | 0.6818 | 0.8108 | - | 0.4054 | - | - | - | - | - | - |
bioinfo-3 | 0.6818 | 0.8108 | - | 0.4054 | - | - | - | - | - | - |
bioinfo-4 | 0.6818 | 0.8108 | - | 0.4054 | - | - | - | - | - | - |
dense | 0.8636 | 0.9032 | 0.7692 | 0.8362 | 0.2000 | 0.3000 | 0.2500 | 0.3602 | 0.4054 | 0.3641 |
sp_lasigebiotm | 0.3182 | - | 0.4828 | 0.2414 | - | - | - | 0.0455 | 0.0455 | 0.0455 |
GPT4O | 0.7727 | 0.8485 | 0.5455 | 0.6970 | 0.1500 | 0.1500 | 0.1500 | 0.1569 | 0.1712 | 0.1623 |
Fleming-2 | 0.8636 | 0.9032 | 0.7692 | 0.8362 | 0.3000 | 0.4500 | 0.3625 | 0.3293 | 0.4159 | 0.3592 |
deepseek32b-me | 0.7273 | 0.7857 | 0.6250 | 0.7054 | 0.3000 | 0.3000 | 0.3000 | 0.0876 | 0.1167 | 0.0971 |
deepseek-r1:14b | 0.8182 | 0.8667 | 0.7143 | 0.7905 | - | - | - | 0.1167 | 0.1545 | 0.1303 |
deepseek32b-full | 0.7273 | 0.7857 | 0.6250 | 0.7054 | 0.3000 | 0.3000 | 0.3000 | 0.1239 | 0.1394 | 0.1296 |
AQAMS | 0.8182 | 0.8667 | 0.7143 | 0.7905 | 0.1500 | 0.2000 | 0.1750 | 0.3666 | 0.3513 | 0.3526 |
Ideal Answers
System | Automatic: R-2 (Rec) | Automatic: R-2 (F1) | Automatic: R-SU4 (Rec) | Automatic: R-SU4 (F1) | Manual: Readability | Manual: Recall | Manual: Precision | Manual: Repetition |
Only uses GPT-4o | 0.1878 | 0.1790 | 0.1974 | 0.1795 | 4.54 | 3.65 | 4.08 | 4.59 |
IR2 | 0.2197 | 0.2243 | 0.2168 | 0.2194 | 4.40 | 4.02 | 4.32 | 4.56 |
IR3 | 0.2364 | 0.2380 | 0.2362 | 0.2342 | 4.42 | 3.99 | 4.35 | 4.49 |
IR4 | 0.2353 | 0.2384 | 0.2339 | 0.2335 | 4.44 | 3.94 | 4.41 | 4.48 |
NN_Persona_3 | 0.2090 | 0.1165 | 0.2328 | 0.1231 | 4.15 | 4.18 | 3.78 | 4.31 |
NN_Baseline | 0.1846 | 0.1154 | 0.2087 | 0.1261 | 4.29 | 4.29 | 4.04 | 4.49 |
NN_Persona_1 | 0.1954 | 0.1285 | 0.2134 | 0.1316 | 4.13 | 4.21 | 3.85 | 4.28 |
NN_Persona_2 | 0.2027 | 0.1330 | 0.2191 | 0.1366 | 4.19 | 4.22 | 3.87 | 4.27 |
IR1 | 0.1495 | 0.1708 | 0.1500 | 0.1679 | 4.13 | 3.58 | 4.16 | 4.39 |
UR-IW-2 | 0.1335 | 0.1166 | 0.1546 | 0.1299 | 4.41 | 4.02 | 4.32 | 4.59 |
UR-IW-4 | 0.1128 | 0.1006 | 0.1310 | 0.1140 | 4.52 | 3.89 | 4.25 | 4.61 |
lasigeBioTM | - | - | - | - | - | - | - | - |
lasigeBioTM-onto-sm | - | - | - | - | - | - | - | - |
Fleming-1 | 0.2479 | 0.1105 | 0.2703 | 0.1200 | 3.99 | 3.80 | 3.69 | 4.04 |
Baseline top 10 | 0.0391 | 0.0460 | 0.0397 | 0.0470 | 1.09 | 1.05 | 1.12 | 1.12 |
Using LLM alone | 0.0298 | 0.0353 | 0.0306 | 0.0358 | 1.14 | 0.95 | 1.02 | 1.16 |
Using KG for list q | 0.0504 | 0.0411 | 0.0545 | 0.0438 | 1.13 | 1.13 | 1.06 | 1.13 |
Main pipeline | 0.0504 | 0.0411 | 0.0545 | 0.0438 | 1.13 | 1.13 | 1.06 | 1.13 |
Baseline top 20 | 0.0417 | 0.0494 | 0.0445 | 0.0517 | 1.13 | 1.04 | 1.14 | 1.14 |
UniTor_0 | 0.1472 | 0.1873 | 0.1422 | 0.1805 | 4.38 | 3.45 | 3.95 | 4.41 |
UniTor_1 | 0.1519 | 0.1875 | 0.1489 | 0.1831 | 4.29 | 3.51 | 3.86 | 4.42 |
UniTor_2 | 0.1549 | 0.1955 | 0.1493 | 0.1886 | 4.39 | 3.64 | 4.12 | 4.46 |
UniTor_3 | 0.1517 | 0.1877 | 0.1469 | 0.1800 | 4.28 | 3.52 | 4.00 | 4.42 |
IR5 | 0.1473 | 0.1681 | 0.1528 | 0.1670 | 4.08 | 3.47 | 4.04 | 4.33 |
bious1 | 0.2114 | 0.1931 | 0.2116 | 0.1894 | 4.28 | 3.98 | 4.15 | 4.46 |
bious2 | 0.1841 | 0.1850 | 0.1900 | 0.1861 | 4.40 | 3.89 | 4.12 | 4.51 |
bious3 | 0.1867 | 0.1825 | 0.1947 | 0.1846 | 4.40 | 3.80 | 4.12 | 4.46 |
bious4 | 0.1755 | 0.1782 | 0.1814 | 0.1788 | 4.36 | 3.69 | 3.99 | 4.47 |
bious5 | 0.1852 | 0.1781 | 0.1915 | 0.1788 | 4.36 | 3.87 | 4.14 | 4.48 |
UR-IW-1 | 0.2143 | 0.1433 | 0.2352 | 0.1497 | 4.27 | 4.01 | 3.95 | 4.36 |
UR-IW-3 | 0.2285 | 0.1576 | 0.2397 | 0.1605 | 4.32 | 4.04 | 3.93 | 4.41 |
UR-IW-5 | 0.2087 | 0.1871 | 0.2110 | 0.1850 | 4.49 | 4.06 | 4.11 | 4.51 |
dmiip2024 | 0.1935 | 0.2230 | 0.1890 | 0.2157 | 4.39 | 3.95 | 4.32 | 4.53 |
dmiip2024_1 | 0.2000 | 0.2245 | 0.1972 | 0.2186 | 4.35 | 3.99 | 4.25 | 4.49 |
dmiip2024_2 | 0.1909 | 0.2181 | 0.1903 | 0.2160 | 4.25 | 3.84 | 4.24 | 4.41 |
dmiip2024_3 | 0.1619 | 0.1986 | 0.1574 | 0.1911 | 4.41 | 3.89 | 4.40 | 4.51 |
dmiip2024_4 | 0.1863 | 0.2158 | 0.1795 | 0.2069 | 4.33 | 3.87 | 4.25 | 4.51 |
lasigeBioTM-onto-bl | - | - | - | - | - | - | - | - |
IRIS_1 | 0.1514 | 0.1646 | 0.1572 | 0.1652 | 4.29 | 3.40 | 4.01 | 4.49 |
IRIS_2 | 0.1541 | 0.1565 | 0.1558 | 0.1538 | 4.08 | 3.65 | 3.96 | 4.40 |
IRIS_3 | 0.1633 | 0.1499 | 0.1676 | 0.1515 | 4.02 | 3.42 | 3.79 | 4.34 |
extractive | 0.0456 | 0.0306 | 0.0504 | 0.0330 | 1.01 | 1.00 | 0.91 | 1.11 |
mistral | 0.1927 | 0.1548 | 0.1989 | 0.1545 | 4.26 | 3.93 | 3.99 | 4.38 |
llama | 0.1849 | 0.1545 | 0.1927 | 0.1533 | 4.26 | 3.96 | 3.98 | 4.33 |
abstractive | 0.0488 | 0.0233 | 0.0550 | 0.0262 | 1.02 | 1.09 | 0.94 | 1.08 |
bioinfo-0 | 0.1841 | 0.1735 | 0.1825 | 0.1697 | 4.38 | 4.12 | 4.35 | 4.55 |
bioinfo-1 | 0.2225 | 0.1798 | 0.2226 | 0.1756 | 4.15 | 4.09 | 4.13 | 4.40 |
bioinfo-2 | 0.1958 | 0.1463 | 0.2098 | 0.1529 | 4.32 | 4.29 | 4.28 | 4.47 |
bioinfo-3 | 0.1943 | 0.1483 | 0.2064 | 0.1531 | 4.34 | 4.26 | 4.11 | 4.36 |
bioinfo-4 | 0.2027 | 0.1429 | 0.2175 | 0.1465 | 4.22 | 4.34 | 4.04 | 4.42 |
dense | 0.1861 | 0.1483 | 0.1982 | 0.1513 | 4.26 | 3.98 | 4.02 | 4.36 |
sp_lasigebiotm | 0.0852 | 0.0878 | 0.0861 | 0.0871 | 3.11 | 2.04 | 2.45 | 3.76 |
GPT4O | 0.0892 | 0.0962 | 0.0977 | 0.1043 | 3.78 | 2.91 | 3.44 | 4.18 |
Fleming-2 | 0.2230 | 0.1349 | 0.2478 | 0.1424 | 4.18 | 4.06 | 3.98 | 4.38 |
deepseek32b-me | 0.1408 | 0.1300 | 0.1531 | 0.1370 | 4.24 | 3.40 | 3.86 | 4.46 |
deepseek-r1:14b | 0.0845 | 0.0940 | 0.0962 | 0.1034 | 3.85 | 2.84 | 3.65 | 4.34 |
deepseek32b-full | 0.1502 | 0.1397 | 0.1635 | 0.1497 | 4.18 | 3.47 | 3.81 | 4.48 |
AQAMS | 0.2275 | 0.1069 | 0.2514 | 0.1164 | 4.13 | 3.28 | 3.61 | 4.34 |
Test batch 4
Exact Answers
System | Yes/No: Accuracy | Yes/No: F1 Yes | Yes/No: F1 No | Yes/No: Macro F1 | Factoid: Strict Acc. | Factoid: Lenient Acc. | Factoid: MRR | List: Mean Prec. | List: Recall | List: F-Measure |
IR1 | 0.8462 | 0.8824 | 0.7778 | 0.8301 | 0.4545 | 0.4545 | 0.4545 | 0.3386 | 0.2235 | 0.2572 |
UniTor_0 | 0.8462 | 0.8824 | 0.7778 | 0.8301 | 0.4545 | 0.4545 | 0.4545 | 0.2691 | 0.2563 | 0.2503 |
UniTor_1 | 0.8462 | 0.8824 | 0.7778 | 0.8301 | 0.4545 | 0.4545 | 0.4545 | 0.2539 | 0.2751 | 0.2483 |
UniTor_2 | 0.8846 | 0.9189 | 0.8000 | 0.8595 | 0.4091 | 0.4091 | 0.4091 | 0.2710 | 0.2845 | 0.2465 |
UniTor_3 | 0.8846 | 0.9189 | 0.8000 | 0.8595 | 0.4091 | 0.4091 | 0.4091 | 0.2746 | 0.2735 | 0.2591 |
simple truncation | 0.8077 | 0.8649 | 0.6667 | 0.7658 | 0.2727 | 0.4545 | 0.3371 | 0.1775 | 0.2496 | 0.1985 |
Using KG for list q | 0.9615 | 0.9714 | 0.9412 | 0.9563 | 0.4091 | 0.5000 | 0.4356 | 0.2215 | 0.3765 | 0.2533 |
Baseline top 20 | 0.9615 | 0.9714 | 0.9412 | 0.9563 | 0.4091 | 0.4545 | 0.4318 | 0.4157 | 0.3339 | 0.3578 |
Baseline top 10 | 0.9615 | 0.9714 | 0.9412 | 0.9563 | 0.4545 | 0.5455 | 0.5000 | 0.3721 | 0.3248 | 0.3307 |
Using LLM alone | 0.8846 | 0.9091 | 0.8421 | 0.8756 | 0.4091 | 0.4091 | 0.4091 | 0.2785 | 0.3186 | 0.2837 |
Main pipeline | 0.9231 | 0.9444 | 0.8750 | 0.9097 | 0.4091 | 0.5000 | 0.4356 | 0.2362 | 0.3901 | 0.2703 |
IR5 | 0.6923 | 0.7500 | 0.6000 | 0.6750 | 0.5000 | 0.5000 | 0.5000 | 0.2522 | 0.2257 | 0.2313 |
IR3 | 0.9615 | 0.9714 | 0.9412 | 0.9563 | 0.4091 | 0.4545 | 0.4318 | 0.4132 | 0.2353 | 0.2693 |
IR4 | 0.9231 | 0.9412 | 0.8889 | 0.9150 | 0.3636 | 0.3636 | 0.3636 | 0.3242 | 0.2051 | 0.2329 |
IR2 | 0.8846 | 0.9091 | 0.8421 | 0.8756 | 0.5000 | 0.5000 | 0.5000 | 0.3772 | 0.2254 | 0.2685 |
AQAMS | 0.8462 | 0.8889 | 0.7500 | 0.8194 | 0.4091 | 0.4091 | 0.4091 | 0.3531 | 0.2891 | 0.3033 |
kmeans | 0.6538 | 0.6897 | 0.6087 | 0.6492 | 0.0909 | 0.0909 | 0.0909 | 0.2383 | 0.2386 | 0.2097 |
similarity measures | 0.8846 | 0.9143 | 0.8235 | 0.8689 | 0.4091 | 0.5455 | 0.4773 | 0.2253 | 0.3185 | 0.2202 |
deepseek32b-me | 0.8077 | 0.8649 | 0.6667 | 0.7658 | 0.4545 | 0.4545 | 0.4545 | 0.3520 | 0.2364 | 0.2675 |
deepseek32b-full | 0.7692 | 0.8333 | 0.6250 | 0.7292 | 0.4091 | 0.4091 | 0.4091 | 0.3222 | 0.2369 | 0.2529 |
bious2 | 0.7692 | 0.8125 | 0.7000 | 0.7563 | 0.2727 | 0.3636 | 0.3182 | 0.2454 | 0.1819 | 0.1966 |
bious3 | 0.8462 | 0.8750 | 0.8000 | 0.8375 | 0.4091 | 0.4545 | 0.4318 | 0.2999 | 0.2081 | 0.2283 |
bious4 | 0.8077 | 0.8485 | 0.7368 | 0.7927 | 0.3636 | 0.4091 | 0.3864 | 0.2904 | 0.2350 | 0.2496 |
bious5 | 0.8462 | 0.8750 | 0.8000 | 0.8375 | 0.3636 | 0.3636 | 0.3636 | 0.2624 | 0.1881 | 0.2027 |
dmiip2024 | 0.8846 | 0.9189 | 0.8000 | 0.8595 | 0.5000 | 0.5455 | 0.5227 | 0.2799 | 0.2798 | 0.2695 |
dmiip2024_1 | 0.9231 | 0.9412 | 0.8889 | 0.9150 | 0.4545 | 0.4545 | 0.4545 | 0.3102 | 0.2758 | 0.2848 |
dmiip2024_2 | 0.8462 | 0.8889 | 0.7500 | 0.8194 | 0.2727 | 0.2727 | 0.2727 | 0.2669 | 0.3014 | 0.2707 |
dmiip2024_4 | 0.8077 | 0.8718 | 0.6154 | 0.7436 | 0.4545 | 0.5000 | 0.4773 | 0.3289 | 0.2361 | 0.2647 |
Fleming-2 | 0.8846 | 0.9143 | 0.8235 | 0.8689 | 0.2727 | 0.4545 | 0.3273 | 0.1469 | 0.3075 | 0.1858 |
dmiip2024_3 | 0.8846 | 0.9189 | 0.8000 | 0.8595 | 0.5000 | 0.5909 | 0.5303 | 0.4053 | 0.2513 | 0.2911 |
Fleming-1 | 0.8846 | 0.9143 | 0.8235 | 0.8689 | 0.2727 | 0.4091 | 0.3144 | 0.2471 | 0.3185 | 0.2632 |
UR-IW-1 | 0.8462 | 0.8889 | 0.7500 | 0.8194 | 0.5000 | 0.5455 | 0.5152 | 0.2307 | 0.3274 | 0.2379 |
UR-IW-2 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.4545 | 0.4545 | 0.4545 | 0.2378 | 0.3622 | 0.2644 |
UR-IW-3 | 0.8462 | 0.8824 | 0.7778 | 0.8301 | 0.4545 | 0.4545 | 0.4545 | 0.3150 | 0.3261 | 0.3050 |
UR-IW-5 | 0.8462 | 0.8947 | 0.7143 | 0.8045 | 0.5455 | 0.5909 | 0.5606 | 0.2984 | 0.3626 | 0.3057 |
UR-IW-4 | 0.8846 | 0.9032 | 0.8571 | 0.8802 | 0.4091 | 0.4545 | 0.4242 | 0.2952 | 0.3048 | 0.2824 |
Fleming-3 | 0.8462 | 0.8889 | 0.7500 | 0.8194 | 0.2727 | 0.4545 | 0.3273 | 0.1469 | 0.3075 | 0.1858 |
extractive | 0.9231 | 0.9412 | 0.8889 | 0.9150 | 0.4545 | 0.5455 | 0.5000 | 0.2150 | 0.3385 | 0.2128 |
abstractive | 0.9231 | 0.9412 | 0.8889 | 0.9150 | 0.4545 | 0.5455 | 0.5000 | 0.1848 | 0.3152 | 0.2235 |
Only uses GPT-4o | 0.9231 | 0.9375 | 0.9000 | 0.9188 | 0.4545 | 0.4545 | 0.4545 | 0.2377 | 0.2098 | 0.2132 |
NN_Persona_1 | 0.8462 | 0.8667 | 0.8182 |
0.8424 |
0.3636 |
0.4545 |
0.4091 |
0.2385 |
0.2875 |
0.2476 |
NN_Persona_2 |
0.8077 |
0.8387 |
0.7619 |
0.8003 |
0.3636 |
0.4091 |
0.3864 |
0.1585 |
0.1871 |
0.1571 |
NN_Persona_3 |
0.8077 |
0.8485 |
0.7368 |
0.7927 |
0.4545 |
0.5455 |
0.4750 |
0.1972 |
0.2568 |
0.2113 |
NN_Baseline |
0.9231 |
0.9375 |
0.9000 |
0.9188 |
0.4091 |
0.4091 |
0.4091 |
0.2945 |
0.2412 |
0.2459 |
GPT4O |
0.8077 |
0.8571 |
0.7059 |
0.7815 |
0.2727 |
0.3636 |
0.3182 |
0.2567 |
0.2443 |
0.2256 |
deepseek-r1:32b |
0.8462 |
0.8889 |
0.7500 |
0.8194 |
0.2727 |
0.3636 |
0.3182 |
0.3132 |
0.2360 |
0.2324 |
deepseek-r1:14b |
0.9231 |
0.9444 |
0.8750 |
0.9097 |
0.3636 |
0.3636 |
0.3636 |
0.2946 |
0.2166 |
0.2357 |
deepseek-r1:8b |
0.9615 |
0.9714 |
0.9412 |
0.9563 |
0.3636 |
0.3636 |
0.3636 |
0.2996 |
0.2175 |
0.2410 |
Fleming-4 |
0.8846 |
0.9143 |
0.8235 |
0.8689 |
0.2727 |
0.4091 |
0.3121 |
0.2485 |
0.3185 |
0.2643 |
mistral |
0.8462 |
0.8889 |
0.7500 |
0.8194 |
0.4545 |
0.4545 |
0.4545 |
0.2508 |
0.3153 |
0.2629 |
llama |
0.8846 |
0.9189 |
0.8000 |
0.8595 |
0.4091 |
0.4091 |
0.4091 |
0.2854 |
0.3087 |
0.2735 |
sp_lasigebiotm |
0.8077 |
0.8387 |
0.7619 |
0.8003 |
0.3182 |
0.3182 |
0.3182 |
0.2189 |
0.0995 |
0.1309 |
lasigeBioTM |
0.8462 |
0.8750 |
0.8000 |
0.8375 |
0.3636 |
0.3636 |
0.3636 |
0.1751 |
0.1373 |
0.1511 |
lasigeBioTM-onto-bl |
0.8077 |
0.8485 |
0.7368 |
0.7927 |
0.3636 |
0.3636 |
0.3636 |
0.1998 |
0.1387 |
0.1553 |
lasigeBioTM-onto-sm |
0.7692 |
0.8000 |
0.7273 |
0.7636 |
0.2273 |
0.2273 |
0.2273 |
0.1228 |
0.0697 |
0.0794 |
bioinfo-0 |
0.6538 |
0.7907 |
- |
0.3953 |
- | - | - |
- | - | - |
bioinfo-1 |
0.6538 |
0.7907 |
- |
0.3953 |
- | - | - |
- | - | - |
deepseek32b-f |
0.9231 |
0.9412 |
0.8889 |
0.9150 |
0.3636 |
0.3636 |
0.3636 |
0.2519 |
0.2170 |
0.2020 |
bioinfo-2 |
0.6538 |
0.7907 |
- |
0.3953 |
- | - | - |
- | - | - |
bioinfo-3 |
0.6538 |
0.7907 |
- |
0.3953 |
- | - | - |
- | - | - |
bioinfo-4 |
0.6538 |
0.7907 |
- |
0.3953 |
- | - | - |
- | - | - |
bious1 |
0.8846 |
0.9032 |
0.8571 |
0.8802 |
0.4545 |
0.4545 |
0.4545 |
0.2340 |
0.2290 |
0.2107 |
Fleming-5 |
0.9231 |
0.9444 |
0.8750 |
0.9097 |
0.2727 |
0.4091 |
0.3121 |
0.2485 |
0.3185 |
0.2643 |
phaseB-5 |
0.8846 |
0.9143 |
0.8235 |
0.8689 |
0.4091 |
0.4091 |
0.4091 |
0.2434 |
0.1856 |
0.1969 |
gpt 01 mini |
0.8077 |
0.8571 |
0.7059 |
0.7815 |
0.2727 |
0.2727 |
0.2727 |
0.2504 |
0.2223 |
0.2118 |
phaseB-4 |
0.9231 |
0.9444 |
0.8750 |
0.9097 |
0.4091 |
0.4091 |
0.4091 |
0.2771 |
0.2325 |
0.2403 |
3.PhaseB_System |
0.6538 |
0.7907 |
- |
0.3953 |
0.1818 |
0.1818 |
0.1818 |
0.0132 |
0.0105 |
0.0117 |
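A note on reading the Yes/No columns above: the reported Macro F1 values match the plain average of the F1 Yes and F1 No columns. Rows where F1 No is shown as "-" match only if the missing score is counted as zero. The sketch below checks this against three rows copied from the table. The function name `macro_f1` and the zero-for-missing rule are assumptions inferred from the numbers shown, not taken from the official evaluation code.

```python
def macro_f1(f1_yes, f1_no):
    """Average the two per-class F1 scores.

    Assumption: a missing score (shown as "-" in the table, None here)
    is counted as 0.0, which is what the reported values suggest.
    """
    scores = [0.0 if s is None else s for s in (f1_yes, f1_no)]
    return sum(scores) / len(scores)


# Rows copied from the table above: (system, F1 Yes, F1 No, reported Macro F1)
rows = [
    ("UniTor_2", 0.9189, 0.8000, 0.8595),
    ("Baseline top 10", 0.9714, 0.9412, 0.9563),
    ("bioinfo-0", 0.7907, None, 0.3953),
]

for name, f1_yes, f1_no, reported in rows:
    computed = macro_f1(f1_yes, f1_no)
    # The table rounds to four decimals, so allow a small tolerance.
    agrees = abs(computed - reported) < 1e-4
    print(f"{name}: computed={computed:.5f} reported={reported:.4f} agrees={agrees}")
```

If this reading is right, a system that never predicts "no" cannot score above 0.5 on Macro F1, even when its accuracy is reasonable.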
Ideal Answers
|
Automatic scores (Rouge - R) |
Manual scores |
System |
R-2 (Rec) |
R-2 (F1) |
R-SU4 (Rec) |
R-SU4 (F1) |
Readability |
Recall |
Precision |
Repetition |
IR1 |
0.1262 |
0.1470 |
0.1244 |
0.1438 |
3.87 |
3.35 |
3.84 |
4.05 |
UniTor_0 |
0.1195 |
0.1502 |
0.1136 |
0.1429 |
4.08 |
3.47 |
3.89 |
4.18 |
UniTor_1 |
0.1185 |
0.1489 |
0.1094 |
0.1380 |
4.01 |
3.40 |
3.82 |
4.13 |
UniTor_2 |
0.1268 |
0.1564 |
0.1221 |
0.1506 |
4.19 |
3.54 |
4.02 |
4.28 |
UniTor_3 |
0.1208 |
0.1503 |
0.1161 |
0.1427 |
4.00 |
3.36 |
3.75 |
4.16 |
simple truncation |
0.0323 |
0.0233 |
0.0367 |
0.0260 |
0.84 |
0.81 |
0.75 |
0.89 |
Using KG for list q |
0.0351 |
0.0310 |
0.0383 |
0.0339 |
0.93 |
0.88 |
0.87 |
0.93 |
Baseline top 20 |
0.0235 |
0.0301 |
0.0248 |
0.0317 |
0.88 |
0.78 |
0.92 |
0.91 |
Baseline top 10 |
0.0247 |
0.0313 |
0.0251 |
0.0316 |
0.88 |
0.75 |
0.87 |
0.89 |
Using LLM alone |
0.0217 |
0.0277 |
0.0221 |
0.0288 |
0.88 |
0.73 |
0.87 |
0.89 |
Main pipeline |
0.0351 |
0.0310 |
0.0383 |
0.0339 |
0.93 |
0.88 |
0.87 |
0.93 |
IR5 |
0.1309 |
0.1479 |
0.1293 |
0.1463 |
4.11 |
3.46 |
3.94 |
4.26 |
IR3 |
0.1721 |
0.1843 |
0.1737 |
0.1856 |
4.15 |
3.72 |
4.07 |
4.19 |
IR4 |
0.1874 |
0.1960 |
0.1863 |
0.1941 |
4.11 |
3.74 |
4.13 |
4.27 |
IR2 |
0.1623 |
0.1844 |
0.1594 |
0.1802 |
4.29 |
3.91 |
4.14 |
4.32 |
AQAMS |
0.2225 |
0.1223 |
0.2473 |
0.1341 |
4.13 |
4.16 |
3.82 |
4.16 |
kmeans |
- |
- |
- |
- |
- |
- |
- |
- |
similarity measures |
0.0343 |
0.0252 |
0.0399 |
0.0288 |
0.85 |
0.78 |
0.74 |
0.89 |
deepseek32b-me |
0.1075 |
0.1406 |
0.1006 |
0.1312 |
3.88 |
3.20 |
3.71 |
4.05 |
deepseek32b-full |
0.0942 |
0.1305 |
0.0893 |
0.1242 |
3.79 |
3.13 |
3.69 |
4.04 |
bious2 |
0.1565 |
0.1700 |
0.1584 |
0.1701 |
4.06 |
3.67 |
3.92 |
4.14 |
bious3 |
0.1553 |
0.1674 |
0.1559 |
0.1660 |
4.09 |
3.76 |
3.94 |
4.18 |
bious4 |
0.1538 |
0.1664 |
0.1555 |
0.1664 |
4.12 |
3.71 |
3.95 |
4.20 |
bious5 |
0.1527 |
0.1615 |
0.1542 |
0.1605 |
4.18 |
3.75 |
3.98 |
4.21 |
dmiip2024 |
0.1433 |
0.1699 |
0.1405 |
0.1658 |
4.22 |
3.72 |
4.05 |
4.26 |
dmiip2024_1 |
0.1532 |
0.1700 |
0.1500 |
0.1656 |
4.20 |
3.86 |
4.06 |
4.24 |
dmiip2024_2 |
0.1329 |
0.1690 |
0.1293 |
0.1636 |
4.16 |
3.61 |
3.96 |
4.22 |
dmiip2024_4 |
0.1307 |
0.1673 |
0.1250 |
0.1583 |
4.25 |
3.56 |
4.07 |
4.35 |
Fleming-2 |
0.1813 |
0.0900 |
0.2062 |
0.1014 |
3.91 |
4.16 |
3.55 |
4.14 |
dmiip2024_3 |
0.1340 |
0.1735 |
0.1247 |
0.1627 |
4.16 |
3.61 |
3.99 |
4.22 |
Fleming-1 |
0.2248 |
0.1136 |
0.2462 |
0.1233 |
4.02 |
4.16 |
3.69 |
4.20 |
UR-IW-1 |
0.2005 |
0.1405 |
0.2175 |
0.1501 |
4.16 |
4.19 |
3.93 |
4.16 |
UR-IW-2 |
0.1521 |
0.1330 |
0.1675 |
0.1445 |
4.42 |
4.27 |
4.18 |
4.51 |
UR-IW-3 |
0.1859 |
0.1464 |
0.1993 |
0.1552 |
4.27 |
4.09 |
4.04 |
4.18 |
UR-IW-5 |
0.1691 |
0.1462 |
0.1776 |
0.1539 |
4.13 |
3.98 |
3.86 |
4.19 |
UR-IW-4 |
0.1227 |
0.1061 |
0.1416 |
0.1194 |
4.25 |
4.15 |
4.07 |
4.44 |
Fleming-3 |
0.1974 |
0.1136 |
0.2188 |
0.1248 |
4.22 |
4.15 |
3.75 |
4.33 |
extractive |
0.0360 |
0.0271 |
0.0414 |
0.0306 |
0.86 |
0.79 |
0.76 |
0.91 |
abstractive |
0.0331 |
0.0239 |
0.0383 |
0.0273 |
0.81 |
0.76 |
0.73 |
0.86 |
Only uses GPT-4o |
0.2110 |
0.1244 |
0.2357 |
0.1367 |
4.39 |
4.35 |
4.05 |
4.38 |
NN_Persona_1 |
0.2099 |
0.0986 |
0.2388 |
0.1104 |
4.00 |
4.25 |
3.68 |
4.06 |
NN_Persona_2 |
0.2053 |
0.1109 |
0.2319 |
0.1224 |
4.00 |
4.28 |
3.66 |
4.09 |
NN_Persona_3 |
0.2088 |
0.0966 |
0.2345 |
0.1090 |
4.09 |
4.22 |
3.72 |
4.13 |
NN_Baseline |
0.1861 |
0.1150 |
0.2116 |
0.1296 |
4.35 |
4.29 |
3.95 |
4.45 |
GPT4O |
0.0777 |
0.0900 |
0.0863 |
0.0990 |
3.69 |
3.08 |
3.47 |
3.99 |
deepseek-r1:32b |
0.0785 |
0.0908 |
0.0834 |
0.0963 |
3.85 |
3.14 |
3.60 |
4.01 |
deepseek-r1:14b |
0.1959 |
0.1444 |
0.2073 |
0.1520 |
4.28 |
3.96 |
4.05 |
4.40 |
deepseek-r1:8b |
0.1877 |
0.1384 |
0.2022 |
0.1475 |
4.24 |
3.94 |
3.95 |
4.31 |
Fleming-4 |
0.1803 |
0.1009 |
0.1994 |
0.1108 |
4.05 |
4.14 |
3.64 |
4.24 |
mistral |
0.2003 |
0.1463 |
0.2130 |
0.1518 |
4.16 |
4.15 |
3.95 |
4.27 |
llama |
0.1661 |
0.1421 |
0.1721 |
0.1450 |
4.25 |
3.95 |
3.81 |
4.29 |
sp_lasigebiotm |
0.1381 |
0.1069 |
0.1591 |
0.1212 |
4.00 |
3.60 |
3.66 |
4.13 |
lasigeBioTM |
0.1645 |
0.1058 |
0.1908 |
0.1211 |
3.85 |
3.73 |
3.59 |
4.01 |
lasigeBioTM-onto-bl |
0.1660 |
0.1132 |
0.1930 |
0.1290 |
4.13 |
3.72 |
3.60 |
4.21 |
lasigeBioTM-onto-sm |
0.1032 |
0.1124 |
0.1020 |
0.1102 |
3.64 |
3.01 |
3.52 |
3.96 |
bioinfo-0 |
0.1435 |
0.1444 |
0.1441 |
0.1435 |
4.07 |
3.69 |
3.96 |
4.25 |
bioinfo-1 |
0.1482 |
0.1675 |
0.1427 |
0.1592 |
4.08 |
3.86 |
4.00 |
4.19 |
deepseek32b-f |
0.1621 |
0.1231 |
0.1681 |
0.1242 |
4.19 |
4.18 |
3.80 |
4.32 |
bioinfo-2 |
0.1729 |
0.1300 |
0.1899 |
0.1423 |
4.11 |
4.14 |
3.93 |
4.24 |
bioinfo-3 |
0.1435 |
0.1550 |
0.1436 |
0.1528 |
4.21 |
3.88 |
4.08 |
4.32 |
bioinfo-4 |
0.1679 |
0.1194 |
0.1852 |
0.1312 |
4.24 |
4.19 |
3.95 |
4.31 |
bious1 |
0.1742 |
0.1759 |
0.1753 |
0.1739 |
4.24 |
3.94 |
3.96 |
4.32 |
Fleming-5 |
0.1803 |
0.1009 |
0.1994 |
0.1108 |
4.05 |
4.14 |
3.64 |
4.24 |
phaseB-5 |
0.1487 |
0.1268 |
0.1502 |
0.1261 |
4.19 |
4.22 |
3.78 |
4.35 |
gpt 01 mini |
0.0894 |
0.0993 |
0.0976 |
0.1078 |
3.81 |
3.19 |
3.58 |
3.99 |
phaseB-4 |
0.1727 |
0.1357 |
0.1816 |
0.1399 |
4.24 |
4.29 |
3.91 |
4.44 |
3.PhaseB_System |
0.0594 |
0.0828 |
0.0624 |
0.0855 |
- |
- |
- |
- |