BioASQ Participants Area
Task 13b: Test Results of Phase A+
The test results are presented in separate tables for each type of annotation. Each system is identified by its "System Description". The evaluation measures that are used in Task A+ are presented here.
Warning: For ideal answers, good ROUGE results do not always imply good manual scores.
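As a rough illustration of the three factoid columns reported in the tables (Strict Acc., Lenient Acc., MRR), here is a minimal sketch. It assumes the usual reading of these measures: strict accuracy counts a question only when the first returned answer is correct, lenient accuracy counts it when any returned answer is correct, and MRR averages the reciprocal rank of the first correct answer. The function name `factoid_scores` and the `ranks` values are hypothetical and are not taken from the official evaluation code or from any system listed here.

```python
def factoid_scores(ranks):
    """Compute (strict accuracy, lenient accuracy, MRR).

    ranks: one entry per question, holding the 1-based rank of the first
    correct answer in the returned list, or None if no answer was correct.
    """
    n = len(ranks)
    strict = sum(1 for r in ranks if r == 1) / n
    lenient = sum(1 for r in ranks if r is not None) / n
    mrr = sum(1.0 / r for r in ranks if r is not None) / n
    return strict, lenient, mrr


# Hypothetical toy input: four questions, one of them unanswered.
ranks = [1, 2, None, 1]
strict, lenient, mrr = factoid_scores(ranks)
print(f"strict={strict:.4f} lenient={lenient:.4f} mrr={mrr:.4f}")
```

Under these assumed definitions, strict accuracy can never exceed MRR, and MRR can never exceed lenient accuracy, which matches the ordering seen in every factoid row of the tables.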
Test batch 1
Exact Answers
| | Yes/No | | | | Factoid | | | List | | |
|---|---|---|---|---|---|---|---|---|---|---|
| System | Accuracy | F1 Yes | F1 No | Macro F1 | Strict Acc. | Lenient Acc. | MRR | Mean Prec. | Recall | F-Measure |
| UniTor_0 | 0.9412 | 0.9565 | 0.9091 | 0.9328 | 0.3077 | 0.3077 | 0.3077 | 0.2152 | 0.2031 | 0.2039 |
| UniTor_1 | 0.9412 | 0.9565 | 0.9091 | 0.9328 | 0.2692 | 0.2692 | 0.2692 | 0.2506 | 0.3323 | 0.2563 |
| Only uses GPT-4o | 0.9412 | 0.9565 | 0.9091 | 0.9328 | 0.1538 | 0.2308 | 0.1859 | 0.1596 | 0.1424 | 0.1415 |
| NN_Persona_2 | 0.8824 | 0.9231 | 0.7500 | 0.8365 | 0.2308 | 0.3077 | 0.2692 | 0.1880 | 0.2018 | 0.1821 |
| Baseline top 20 | 0.9412 | 0.9600 | 0.8889 | 0.9244 | 0.3462 | 0.3846 | 0.3654 | 0.3189 | 0.2663 | 0.2782 |
| Using LLM alone | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.1538 | 0.2308 | 0.1827 | 0.2530 | 0.2724 | 0.2496 |
| Baseline top 10 | 0.9412 | 0.9600 | 0.8889 | 0.9244 | 0.3077 | 0.3462 | 0.3269 | 0.3249 | 0.3374 | 0.3038 |
| Using KG for list q | 0.9412 | 0.9600 | 0.8889 | 0.9244 | 0.3077 | 0.4615 | 0.3782 | 0.1249 | 0.3496 | 0.1645 |
| Main pipeline | 0.9412 | 0.9600 | 0.8889 | 0.9244 | 0.3077 | 0.4615 | 0.3782 | 0.1485 | 0.2791 | 0.1698 |
| bioinfo-0 | 0.7059 | 0.8276 | - | 0.4138 | - | - | - | - | - | - |
| bioinfo-1 | 0.7059 | 0.8276 | - | 0.4138 | - | - | - | - | - | - |
| bioinfo-2 | 0.7059 | 0.8276 | - | 0.4138 | - | - | - | - | - | - |
| bioinfo-3 | 0.7059 | 0.8276 | - | 0.4138 | - | - | - | - | - | - |
| bioinfo-4 | 0.7059 | 0.8276 | - | 0.4138 | - | - | - | - | - | - |
| UniTor_2 | 0.8824 | 0.9167 | 0.8000 | 0.8583 | 0.3077 | 0.3462 | 0.3269 | 0.1884 | 0.2449 | 0.1848 |
| UniTor_3 | 0.9412 | 0.9565 | 0.9091 | 0.9328 | 0.3077 | 0.3462 | 0.3269 | 0.1851 | 0.2414 | 0.1807 |
| NN_Persona_1 | 0.8824 | 0.9231 | 0.7500 | 0.8365 | 0.2692 | 0.3462 | 0.2962 | 0.1558 | 0.1972 | 0.1491 |
| NN_Persona_3 | 0.9412 | 0.9600 | 0.8889 | 0.9244 | 0.3077 | 0.3846 | 0.3365 | 0.1622 | 0.1915 | 0.1562 |
| deepseek32b-me | 0.2941 | - | 0.4545 | 0.2273 | 0.0769 | 0.0769 | 0.0769 | 0.2360 | 0.1503 | 0.1708 |
| GPT4O | 0.9412 | 0.9565 | 0.9091 | 0.9328 | 0.1154 | 0.1154 | 0.1154 | 0.2008 | 0.1301 | 0.1484 |
| Fleming-1 | 0.9412 | 0.9565 | 0.9091 | 0.9328 | 0.3077 | 0.4615 | 0.3571 | 0.2363 | 0.1666 | 0.1799 |
| Fleming-2 | 0.9412 | 0.9565 | 0.9091 | 0.9328 | 0.3077 | 0.4615 | 0.3571 | 0.2029 | 0.2029 | 0.1760 |
| google_serach_&_LLM | 0.7059 | 0.7368 | 0.6667 | 0.7018 | 0.1154 | 0.1154 | 0.1154 | 0.1706 | 0.0838 | 0.1074 |
| DB_vector_&_LLM | 0.4706 | 0.4000 | 0.5263 | 0.4632 | 0.1154 | 0.1154 | 0.1154 | 0.1359 | 0.0727 | 0.0930 |
| IRIS_1 | 0.2941 | - | 0.4545 | 0.2273 | - | - | - | - | - | - |
| IRIS_2 | 0.2941 | - | 0.4545 | 0.2273 | - | - | - | - | - | - |
| UR-IW-1 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.2692 | 0.3462 | 0.3077 | 0.2556 | 0.3513 | 0.2635 |
| UR-IW-3 | 0.9412 | 0.9565 | 0.9091 | 0.9328 | 0.3077 | 0.3846 | 0.3462 | 0.2535 | 0.3228 | 0.2633 |
| UR-IW-5 | 0.8235 | 0.8696 | 0.7273 | 0.7984 | 0.3846 | 0.4231 | 0.4038 | 0.3151 | 0.4094 | 0.3223 |
| bious2 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.2308 | 0.2692 | 0.2500 | 0.1792 | 0.1646 | 0.1626 |
| IRIS_3 | 0.2941 | - | 0.4545 | 0.2273 | - | - | - | - | - | - |
| bious3 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.2308 | 0.2308 | 0.2308 | 0.1946 | 0.1475 | 0.1609 |
| bious4 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.1538 | 0.1923 | 0.1731 | 0.2037 | 0.1267 | 0.1509 |
| bious5 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.1154 | 0.1154 | 0.1154 | 0.2673 | 0.2161 | 0.2198 |
| IR1 | 0.8235 | 0.8571 | 0.7692 | 0.8132 | 0.3462 | 0.3462 | 0.3462 | 0.2333 | 0.0836 | 0.1119 |
| Fleming-3 | 0.9412 | 0.9565 | 0.9091 | 0.9328 | 0.3077 | 0.4615 | 0.3571 | 0.2029 | 0.2029 | 0.1760 |
| qa | 0.2941 | - | 0.4545 | 0.2273 | 0.1154 | 0.1154 | 0.1154 | 0.1783 | 0.0982 | 0.1227 |
| bious1 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.1923 | 0.1923 | 0.1923 | 0.2497 | 0.2168 | 0.2233 |
| deepseek-r1:32b | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.0769 | 0.0769 | 0.0769 | 0.2110 | 0.1369 | 0.1558 |
| config-1 | 0.8824 | 0.9231 | 0.7500 | 0.8365 | 0.3077 | 0.3077 | 0.3077 | 0.2511 | 0.1532 | 0.1838 |
| config-2 | 0.9412 | 0.9565 | 0.9091 | 0.9328 | 0.2692 | 0.3077 | 0.2885 | 0.3059 | 0.1899 | 0.2230 |
| config-3 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.3462 | 0.3846 | 0.3590 | 0.3114 | 0.2520 | 0.2698 |
| config-4 | 0.9412 | 0.9565 | 0.9091 | 0.9328 | 0.3462 | 0.3846 | 0.3654 | 0.3585 | 0.3141 | 0.3089 |
| config-5 | 0.8824 | 0.9167 | 0.8000 | 0.8583 | 0.3846 | 0.4231 | 0.4038 | 0.3717 | 0.2932 | 0.3185 |
| mistral | 0.8824 | 0.9167 | 0.8000 | 0.8583 | 0.2692 | 0.3077 | 0.2885 | 0.2555 | 0.2160 | 0.2196 |
| UR-IW-2 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.3846 | 0.4615 | 0.4167 | 0.2944 | 0.3073 | 0.2907 |
| UR-IW-4 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.2692 | 0.3462 | 0.3077 | 0.3283 | 0.3397 | 0.3004 |
| dmiip2024 | 0.9412 | 0.9600 | 0.8889 | 0.9244 | 0.3462 | 0.3462 | 0.3462 | 0.2826 | 0.2264 | 0.2379 |
| dmiip2024_1 | 0.8235 | 0.8800 | 0.6667 | 0.7733 | 0.3462 | 0.3462 | 0.3462 | 0.2986 | 0.2220 | 0.2423 |
| deepseek-r1:14b | 0.9412 | 0.9565 | 0.9091 | 0.9328 | 0.1154 | 0.1154 | 0.1154 | 0.1913 | 0.1205 | 0.1383 |
| dmiip2024_2 | 0.9412 | 0.9600 | 0.8889 | 0.9244 | 0.3846 | 0.4231 | 0.4038 | 0.2760 | 0.2231 | 0.2324 |
| deepseek-r1:8b | 0.9412 | 0.9565 | 0.9091 | 0.9328 | 0.0769 | 0.0769 | 0.0769 | 0.2747 | 0.1846 | 0.2086 |
| gpt 01 mini | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.0769 | 0.0769 | 0.0769 | 0.2110 | 0.1369 | 0.1558 |
| dmiip2024_3 | 0.9412 | 0.9565 | 0.9091 | 0.9328 | 0.4231 | 0.5000 | 0.4551 | 0.2288 | 0.3018 | 0.2379 |
| deepseek32b-full | 0.2941 | - | 0.4545 | 0.2273 | 0.0769 | 0.0769 | 0.0769 | 0.2234 | 0.1512 | 0.1680 |
| lasigeBioTM | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.1154 | 0.1154 | 0.1154 | 0.1783 | 0.1792 | 0.1549 |
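The Macro F1 values in the table above appear consistent with the unweighted mean of F1 Yes and F1 No, with a missing class score ("-") counted as zero. This is an observation drawn from the reported rows, not a statement of the official scoring code. A minimal sketch that checks it against three rows copied from the table (the function name `macro_f1` is hypothetical):

```python
def macro_f1(f1_yes, f1_no):
    """Unweighted mean of the two class F1 scores.

    A score of None stands for a "-" cell and is counted as 0.0,
    which is an assumption inferred from the reported rows.
    """
    scores = [0.0 if s is None else s for s in (f1_yes, f1_no)]
    return sum(scores) / len(scores)


# (F1 Yes, F1 No, reported Macro F1) copied from the batch 1 table.
rows = {
    "UniTor_0": (0.9565, 0.9091, 0.9328),
    "bioinfo-0": (0.8276, None, 0.4138),
    "deepseek32b-me": (None, 0.4545, 0.2273),
}

for name, (f1_yes, f1_no, reported) in rows.items():
    computed = macro_f1(f1_yes, f1_no)
    # Table values are rounded to four decimals, so allow a small tolerance.
    matches = abs(computed - reported) < 1e-3
    print(f"{name}: computed={computed:.4f} reported={reported:.4f} match={matches}")
```

This also explains why a system that predicts only one class can have a moderate accuracy but a Macro F1 close to half of its single-class F1.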
Ideal Answers
| | Automatic scores (Rouge - R) | | | | Manual scores | | | |
|---|---|---|---|---|---|---|---|---|
| System | R-2 (Rec) | R-2 (F1) | R-SU4 (Rec) | R-SU4 (F1) | Readability | Recall | Precision | Repetition |
| UniTor_0 | 0.1325 | 0.1666 | 0.1278 | 0.1598 | 4.06 | 3.69 | 4.00 | 4.24 |
| UniTor_1 | 0.1327 | 0.1694 | 0.1270 | 0.1610 | 4.00 | 3.62 | 3.88 | 4.12 |
| Only uses GPT-4o | 0.2104 | 0.1989 | 0.2151 | 0.1997 | 4.52 | 3.84 | 4.32 | 4.49 |
| NN_Persona_2 | 0.2419 | 0.1643 | 0.2531 | 0.1665 | 4.13 | 4.38 | 3.96 | 4.19 |
| Baseline top 20 | 0.0279 | 0.0355 | 0.0272 | 0.0349 | 0.87 | 0.78 | 0.86 | 0.88 |
| Using LLM alone | 0.0178 | 0.0242 | 0.0194 | 0.0261 | 0.91 | 0.80 | 0.88 | 0.92 |
| Baseline top 10 | 0.0262 | 0.0328 | 0.0253 | 0.0324 | 0.85 | 0.81 | 0.81 | 0.87 |
| Using KG for list q | 0.0343 | 0.0320 | 0.0361 | 0.0337 | 0.93 | 0.91 | 0.89 | 0.94 |
| Main pipeline | 0.0343 | 0.0320 | 0.0361 | 0.0337 | 0.93 | 0.91 | 0.89 | 0.94 |
| bioinfo-0 | 0.1434 | 0.1667 | 0.1379 | 0.1587 | 4.18 | 3.95 | 4.12 | 4.28 |
| bioinfo-1 | 0.2540 | 0.1727 | 0.2588 | 0.1714 | 4.24 | 4.26 | 4.02 | 4.36 |
| bioinfo-2 | 0.2270 | 0.1803 | 0.2353 | 0.1831 | 4.36 | 4.39 | 4.16 | 4.51 |
| bioinfo-3 | 0.2395 | 0.1818 | 0.2454 | 0.1842 | 4.38 | 4.40 | 4.05 | 4.44 |
| bioinfo-4 | 0.2183 | 0.1638 | 0.2278 | 0.1702 | 4.36 | 4.46 | 4.06 | 4.42 |
| UniTor_2 | 0.1329 | 0.1590 | 0.1313 | 0.1535 | 4.00 | 3.62 | 3.92 | 4.20 |
| UniTor_3 | 0.1319 | 0.1620 | 0.1289 | 0.1552 | 4.15 | 3.73 | 4.02 | 4.27 |
| NN_Persona_1 | 0.2576 | 0.1807 | 0.2670 | 0.1828 | 4.25 | 4.45 | 4.08 | 4.27 |
| NN_Persona_3 | 0.2405 | 0.1701 | 0.2492 | 0.1722 | 4.28 | 4.39 | 4.01 | 4.32 |
| deepseek32b-me | 0.1820 | 0.1576 | 0.1850 | 0.1580 | 4.06 | 3.99 | 4.02 | 4.32 |
| GPT4O | 0.2319 | 0.1582 | 0.2413 | 0.1636 | 4.53 | 3.86 | 4.15 | 4.52 |
| Fleming-1 | 0.2025 | 0.1576 | 0.2103 | 0.1609 | 4.21 | 3.89 | 3.89 | 4.38 |
| Fleming-2 | 0.2025 | 0.1576 | 0.2103 | 0.1609 | 4.21 | 3.89 | 3.89 | 4.38 |
| google_serach_&_LLM | 0.1553 | 0.0988 | 0.1724 | 0.1092 | 3.65 | 3.06 | 3.31 | 3.76 |
| DB_vector_&_LLM | 0.1351 | 0.0813 | 0.1556 | 0.0939 | 3.74 | 2.87 | 3.13 | 3.86 |
| IRIS_1 | 0.1466 | 0.1614 | 0.1479 | 0.1617 | 4.14 | 3.59 | 3.98 | 4.29 |
| IRIS_2 | 0.1476 | 0.1520 | 0.1474 | 0.1521 | 3.81 | 3.55 | 3.82 | 4.12 |
| UR-IW-1 | 0.2444 | 0.1830 | 0.2541 | 0.1877 | 4.34 | 4.15 | 4.01 | 4.34 |
| UR-IW-3 | 0.2308 | 0.1848 | 0.2380 | 0.1876 | 4.24 | 4.05 | 4.00 | 4.29 |
| UR-IW-5 | 0.2140 | 0.1830 | 0.2252 | 0.1897 | 4.29 | 4.11 | 3.89 | 4.39 |
| bious2 | 0.1531 | 0.1713 | 0.1529 | 0.1699 | 4.20 | 3.54 | 3.81 | 4.34 |
| IRIS_3 | 0.1428 | 0.1440 | 0.1455 | 0.1461 | 3.72 | 3.48 | 3.81 | 4.20 |
| bious3 | 0.1607 | 0.1812 | 0.1598 | 0.1794 | 4.09 | 3.72 | 3.87 | 4.18 |
| bious4 | 0.1543 | 0.1729 | 0.1526 | 0.1684 | 4.21 | 3.58 | 3.87 | 4.25 |
| bious5 | 0.1574 | 0.1783 | 0.1547 | 0.1746 | 4.27 | 3.65 | 3.82 | 4.29 |
| IR1 | 0.1359 | 0.1666 | 0.1304 | 0.1612 | 3.95 | 3.36 | 3.85 | 4.20 |
| Fleming-3 | 0.2345 | 0.1164 | 0.2535 | 0.1252 | 4.11 | 3.91 | 3.79 | 4.25 |
| qa | 0.1524 | 0.1510 | 0.1555 | 0.1525 | 3.94 | 3.76 | 3.95 | 4.31 |
| bious1 | 0.1777 | 0.1894 | 0.1735 | 0.1832 | 4.22 | 3.81 | 4.09 | 4.35 |
| deepseek-r1:32b | 0.2066 | 0.1586 | 0.2197 | 0.1670 | 4.47 | 3.87 | 4.13 | 4.49 |
| config-1 | 0.2287 | 0.1927 | 0.2330 | 0.1951 | 4.48 | 4.26 | 4.19 | 4.55 |
| config-2 | 0.2255 | 0.1873 | 0.2323 | 0.1909 | 4.29 | 4.09 | 4.09 | 4.41 |
| config-3 | 0.2284 | 0.1921 | 0.2350 | 0.1964 | 4.31 | 4.06 | 4.05 | 4.36 |
| config-4 | 0.2524 | 0.2067 | 0.2579 | 0.2091 | 4.41 | 4.21 | 4.18 | 4.40 |
| config-5 | 0.2418 | 0.2014 | 0.2475 | 0.2038 | 4.33 | 4.18 | 4.16 | 4.44 |
| mistral | 0.1793 | 0.1833 | 0.1809 | 0.1824 | 4.31 | 3.87 | 3.96 | 4.39 |
| UR-IW-2 | 0.1843 | 0.1574 | 0.1998 | 0.1672 | 4.39 | 4.13 | 4.13 | 4.45 |
| UR-IW-4 | 0.1937 | 0.1496 | 0.2080 | 0.1583 | 4.49 | 4.15 | 4.20 | 4.48 |
| dmiip2024 | 0.2312 | 0.2062 | 0.2344 | 0.2072 | 4.33 | 4.19 | 4.16 | 4.38 |
| dmiip2024_1 | 0.2311 | 0.2171 | 0.2318 | 0.2164 | 4.32 | 4.13 | 4.14 | 4.28 |
| deepseek-r1:14b | 0.2075 | 0.1646 | 0.2185 | 0.1715 | 4.46 | 3.80 | 4.06 | 4.53 |
| dmiip2024_2 | 0.1424 | 0.1883 | 0.1332 | 0.1774 | 4.29 | 4.00 | 4.20 | 4.41 |
| deepseek-r1:8b | 0.2262 | 0.1526 | 0.2396 | 0.1598 | 4.53 | 3.92 | 4.16 | 4.54 |
| gpt 01 mini | 0.2066 | 0.1586 | 0.2197 | 0.1670 | 4.47 | 3.87 | 4.13 | 4.49 |
| dmiip2024_3 | 0.1575 | 0.2016 | 0.1490 | 0.1920 | 4.04 | 3.71 | 4.04 | 4.18 |
| deepseek32b-full | 0.1736 | 0.1593 | 0.1763 | 0.1583 | 3.93 | 4.00 | 4.00 | 4.22 |
| lasigeBioTM | 0.1447 | 0.1455 | 0.1448 | 0.1450 | - | - | - | - |
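The warning at the top of the page, that good ROUGE results do not always imply good manual scores, can be seen on a few rows of the table above. The sketch below ranks four systems by R-2 (F1) and by manual Readability. The rows are a hand-picked subset chosen only to illustrate the point, so the result says nothing about the overall correlation across all systems.

```python
# (system, R-2 (F1), manual Readability) copied from the batch 1 Ideal Answers table.
rows = [
    ("Only uses GPT-4o", 0.1989, 4.52),
    ("GPT4O", 0.1582, 4.53),
    ("config-4", 0.2067, 4.41),
    ("dmiip2024_1", 0.2171, 4.32),
]

# Rank the same systems twice: once by the automatic score, once by the manual one.
by_rouge = [name for name, _, _ in sorted(rows, key=lambda r: r[1], reverse=True)]
by_readability = [name for name, _, _ in sorted(rows, key=lambda r: r[2], reverse=True)]

print("By R-2 (F1):    ", by_rouge)
print("By Readability: ", by_readability)
```

On these four rows the two orderings are exactly reversed: the system with the highest R-2 (F1) has the lowest Readability of the subset.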
Test batch 2
Exact Answers
| | Yes/No | | | | Factoid | | | List | | |
|---|---|---|---|---|---|---|---|---|---|---|
| System | Accuracy | F1 Yes | F1 No | Macro F1 | Strict Acc. | Lenient Acc. | MRR | Mean Prec. | Recall | F-Measure |
| Only uses GPT-4o | 0.6471 | 0.6250 | 0.6667 | 0.6458 | 0.2963 | 0.2963 | 0.2963 | 0.2326 | 0.1944 | 0.2026 |
| NN_Persona_1 | 0.8824 | 0.9000 | 0.8571 | 0.8786 | 0.2222 | 0.2963 | 0.2593 | 0.1805 | 0.3159 | 0.2098 |
| NN_Persona_2 | 0.8235 | 0.8421 | 0.8000 | 0.8211 | 0.3333 | 0.4444 | 0.3796 | 0.2644 | 0.3590 | 0.2842 |
| NN_Persona_3 | 0.7647 | 0.8000 | 0.7143 | 0.7571 | 0.2963 | 0.4074 | 0.3519 | 0.2252 | 0.3095 | 0.2477 |
| Fleming-1 | 0.8824 | 0.9000 | 0.8571 | 0.8786 | 0.3333 | 0.4815 | 0.4012 | 0.2105 | 0.2757 | 0.2110 |
| UniTor_0 | 0.8824 | 0.9000 | 0.8571 | 0.8786 | 0.4815 | 0.4815 | 0.4815 | 0.3282 | 0.4626 | 0.3504 |
| UniTor_1 | 0.8824 | 0.9000 | 0.8571 | 0.8786 | 0.4815 | 0.4815 | 0.4815 | 0.3282 | 0.4626 | 0.3504 |
| UniTor_2 | 0.7647 | 0.8000 | 0.7143 | 0.7571 | 0.4074 | 0.4074 | 0.4074 | 0.3105 | 0.4043 | 0.3062 |
| UniTor_3 | 0.7647 | 0.8000 | 0.7143 | 0.7571 | 0.4074 | 0.4074 | 0.4074 | 0.3105 | 0.4043 | 0.3062 |
| Baseline top 20 | 0.8824 | 0.9091 | 0.8333 | 0.8712 | 0.4815 | 0.5556 | 0.5185 | 0.4171 | 0.3688 | 0.3647 |
| Main pipeline | 0.8824 | 0.9091 | 0.8333 | 0.8712 | 0.3704 | 0.4444 | 0.4012 | 0.2299 | 0.5347 | 0.2829 |
| Using LLM alone | 0.5882 | 0.5882 | 0.5882 | 0.5882 | 0.3333 | 0.3333 | 0.3333 | 0.2627 | 0.3236 | 0.2828 |
| Baseline top 10 | 0.8824 | 0.9091 | 0.8333 | 0.8712 | 0.4074 | 0.5185 | 0.4506 | 0.4065 | 0.3911 | 0.3758 |
| Using KG for list q | 0.8824 | 0.9091 | 0.8333 | 0.8712 | 0.3704 | 0.4444 | 0.4012 | 0.2341 | 0.5151 | 0.2796 |
| UR-IW-1 | 0.8824 | 0.9091 | 0.8333 | 0.8712 | 0.4074 | 0.5185 | 0.4506 | 0.2571 | 0.3234 | 0.2671 |
| UR-IW-2 | 0.8235 | 0.8571 | 0.7692 | 0.8132 | 0.4815 | 0.5185 | 0.5000 | 0.3930 | 0.4097 | 0.3830 |
| UR-IW-3 | 0.9412 | 0.9474 | 0.9333 | 0.9404 | 0.4074 | 0.5556 | 0.4599 | 0.2031 | 0.2961 | 0.2283 |
| UR-IW-4 | 0.6471 | 0.6667 | 0.6250 | 0.6458 | 0.5185 | 0.5556 | 0.5370 | 0.3442 | 0.3496 | 0.3283 |
| NN_Baseline | 0.7059 | 0.7368 | 0.6667 | 0.7018 | 0.3333 | 0.3704 | 0.3519 | 0.2710 | 0.3136 | 0.2724 |
| bioinfo-1 | 0.5882 | 0.7407 | - | 0.3704 | - | - | - | - | - | - |
| bioinfo-2 | 0.5882 | 0.7407 | - | 0.3704 | - | - | - | - | - | - |
| bioinfo-3 | 0.5882 | 0.7407 | - | 0.3704 | - | - | - | - | - | - |
| bioinfo-4 | 0.5882 | 0.7407 | - | 0.3704 | - | - | - | - | - | - |
| bious3 | 0.8235 | 0.8571 | 0.7692 | 0.8132 | 0.3704 | 0.4444 | 0.4074 | 0.1900 | 0.1550 | 0.1658 |
| Fleming-2 | 0.8824 | 0.9091 | 0.8333 | 0.8712 | 0.3333 | 0.4815 | 0.4012 | 0.2105 | 0.2757 | 0.2110 |
| UR-IW-5 | 0.9412 | 0.9524 | 0.9231 | 0.9377 | 0.3333 | 0.3704 | 0.3457 | 0.2139 | 0.3298 | 0.2406 |
| lasigeBioTM | 0.4118 | 0.4444 | 0.3750 | 0.4097 | 0.2593 | 0.2593 | 0.2593 | 0.4211 | 0.0865 | 0.1408 |
| dmiip2024 | 0.8235 | 0.8421 | 0.8000 | 0.8211 | 0.5185 | 0.5926 | 0.5556 | 0.2952 | 0.3708 | 0.3037 |
| dmiip2024_2 | 0.8824 | 0.9000 | 0.8571 | 0.8786 | 0.4815 | 0.4815 | 0.4815 | 0.2316 | 0.3712 | 0.2686 |
| dmiip2024_3 | 0.8235 | 0.8571 | 0.7692 | 0.8132 | 0.4074 | 0.4444 | 0.4259 | 0.4233 | 0.3458 | 0.3698 |
| dmiip2024_4 | 0.7647 | 0.8182 | 0.6667 | 0.7424 | 0.6296 | 0.6296 | 0.6296 | 0.3147 | 0.3807 | 0.3282 |
| dmiip2024_1 | 0.8235 | 0.8421 | 0.8000 | 0.8211 | 0.4815 | 0.4815 | 0.4815 | 0.3077 | 0.3554 | 0.3015 |
| IR2 | 0.6471 | 0.6667 | 0.6250 | 0.6458 | 0.4815 | 0.4815 | 0.4815 | 0.2165 | 0.1892 | 0.1972 |
| deepseek32b-me | 0.7647 | 0.8333 | 0.6000 | 0.7167 | 0.3704 | 0.3704 | 0.3704 | 0.2445 | 0.3056 | 0.2437 |
| mistral | 0.8235 | 0.8571 | 0.7692 | 0.8132 | 0.4444 | 0.5185 | 0.4722 | 0.2355 | 0.3071 | 0.2366 |
| gpt 01 mini | 0.8824 | 0.9000 | 0.8571 | 0.8786 | 0.0741 | 0.1111 | 0.0926 | 0.2453 | 0.2599 | 0.2280 |
| bious4 | 0.8235 | 0.8696 | 0.7273 | 0.7984 | 0.4815 | 0.5185 | 0.5000 | 0.2532 | 0.2270 | 0.2341 |
| phaseB-5 | 0.8824 | 0.9000 | 0.8571 | 0.8786 | 0.3704 | 0.3704 | 0.3704 | 0.2526 | 0.2662 | 0.2443 |
| phaseB-4 | 0.8824 | 0.9000 | 0.8571 | 0.8786 | 0.3704 | 0.3704 | 0.3704 | 0.2526 | 0.2662 | 0.2443 |
| deepseek32b-f | 0.8235 | 0.8571 | 0.7692 | 0.8132 | 0.3333 | 0.3333 | 0.3333 | 0.2563 | 0.2749 | 0.2483 |
| bious2 | 0.7647 | 0.7778 | 0.7500 | 0.7639 | 0.2963 | 0.3333 | 0.3148 | 0.2387 | 0.2358 | 0.2268 |
| bious1 | 0.8824 | 0.9000 | 0.8571 | 0.8786 | 0.4074 | 0.4074 | 0.4074 | 0.2144 | 0.1862 | 0.1976 |
| bious5 | 0.8235 | 0.8571 | 0.7692 | 0.8132 | 0.3333 | 0.3704 | 0.3519 | 0.1978 | 0.2485 | 0.2023 |
| deepseek-r1:14b | 0.8824 | 0.9091 | 0.8333 | 0.8712 | 0.1481 | 0.1481 | 0.1481 | 0.1754 | 0.1737 | 0.1700 |
| deepseek-r1:8b | 0.8235 | 0.8571 | 0.7692 | 0.8132 | 0.1111 | 0.1111 | 0.1111 | 0.3489 | 0.3167 | 0.3167 |
| deepseek-r1:32b | 0.8235 | 0.8421 | 0.8000 | 0.8211 | 0.1111 | 0.1111 | 0.1111 | 0.3726 | 0.3495 | 0.3400 |
| bioinfo-0 | 0.5882 | 0.7407 | - | 0.3704 | - | - | - | - | - | - |
| GPT4O | 0.8235 | 0.8571 | 0.7692 | 0.8132 | 0.1111 | 0.1111 | 0.1111 | 0.3489 | 0.3167 | 0.3167 |
| deepseek32b-full | 0.8235 | 0.8696 | 0.7273 | 0.7984 | 0.3704 | 0.3704 | 0.3704 | 0.3382 | 0.3113 | 0.3067 |
Ideal Answers
| | Automatic scores (Rouge - R) | | | | Manual scores | | | |
|---|---|---|---|---|---|---|---|---|
| System | R-2 (Rec) | R-2 (F1) | R-SU4 (Rec) | R-SU4 (F1) | Readability | Recall | Precision | Repetition |
| Only uses GPT-4o | 0.2075 | 0.1990 | 0.2090 | 0.1963 | 4.56 | 3.60 | 4.25 | 4.66 |
| NN_Persona_1 | 0.2614 | 0.1502 | 0.2735 | 0.1520 | 4.25 | 4.16 | 3.93 | 4.44 |
| NN_Persona_2 | 0.2281 | 0.1508 | 0.2357 | 0.1524 | 4.19 | 4.21 | 4.05 | 4.29 |
| NN_Persona_3 | 0.2308 | 0.1505 | 0.2407 | 0.1502 | 4.13 | 4.19 | 3.94 | 4.32 |
| Fleming-1 | 0.2875 | 0.1248 | 0.3023 | 0.1300 | 4.00 | 3.92 | 3.81 | 4.29 |
| UniTor_0 | 0.1890 | 0.2020 | 0.1780 | 0.1901 | 4.32 | 3.58 | 4.05 | 4.54 |
| UniTor_1 | 0.1898 | 0.2038 | 0.1782 | 0.1912 | 4.32 | 3.60 | 4.05 | 4.54 |
| UniTor_2 | 0.1949 | 0.2124 | 0.1840 | 0.2003 | 4.42 | 3.72 | 4.16 | 4.61 |
| UniTor_3 | 0.1929 | 0.2106 | 0.1823 | 0.1988 | 4.42 | 3.73 | 4.15 | 4.61 |
| Baseline top 20 | 0.0368 | 0.0400 | 0.0367 | 0.0405 | 1.15 | 1.08 | 1.14 | 1.20 |
| Main pipeline | 0.0560 | 0.0469 | 0.0561 | 0.0468 | 1.20 | 1.21 | 1.13 | 1.20 |
| Using LLM alone | 0.0359 | 0.0372 | 0.0354 | 0.0376 | 1.24 | 0.98 | 1.08 | 1.25 |
| Baseline top 10 | 0.0351 | 0.0374 | 0.0355 | 0.0382 | 1.19 | 1.09 | 1.19 | 1.20 |
| Using KG for list q | 0.0560 | 0.0469 | 0.0561 | 0.0468 | 1.20 | 1.21 | 1.13 | 1.20 |
| UR-IW-1 | 0.2354 | 0.1680 | 0.2468 | 0.1721 | 4.38 | 3.95 | 4.00 | 4.46 |
| UR-IW-2 | 0.1931 | 0.1675 | 0.1989 | 0.1674 | 4.49 | 4.01 | 4.24 | 4.60 |
| UR-IW-3 | 0.2341 | 0.1882 | 0.2435 | 0.1910 | 4.38 | 4.11 | 4.11 | 4.56 |
| UR-IW-4 | 0.1927 | 0.1603 | 0.2042 | 0.1629 | 4.49 | 4.00 | 4.26 | 4.69 |
| NN_Baseline | 0.2205 | 0.1330 | 0.2375 | 0.1396 | 4.39 | 4.31 | 4.21 | 4.59 |
| bioinfo-1 | 0.1965 | 0.1646 | 0.1953 | 0.1626 | 4.34 | 4.14 | 4.31 | 4.53 |
| bioinfo-2 | 0.2328 | 0.1725 | 0.2357 | 0.1721 | 4.49 | 4.48 | 4.41 | 4.72 |
| bioinfo-3 | 0.2167 | 0.1569 | 0.2216 | 0.1587 | 4.42 | 4.26 | 4.29 | 4.56 |
| bioinfo-4 | 0.2277 | 0.1553 | 0.2288 | 0.1573 | 4.36 | 4.29 | 4.24 | 4.53 |
| bious3 | 0.1881 | 0.1965 | 0.1877 | 0.1937 | 4.49 | 3.55 | 4.04 | 4.60 |
| Fleming-2 | 0.2477 | 0.1563 | 0.2626 | 0.1613 | 4.31 | 4.00 | 4.02 | 4.52 |
| UR-IW-5 | 0.1983 | 0.1660 | 0.2089 | 0.1698 | 4.32 | 3.69 | 3.98 | 4.39 |
| lasigeBioTM | 0.1364 | 0.1374 | 0.1371 | 0.1370 | 3.40 | 3.42 | 3.80 | 4.24 |
| dmiip2024 | 0.1926 | 0.2261 | 0.1868 | 0.2186 | 4.44 | 3.94 | 4.36 | 4.56 |
| dmiip2024_2 | 0.1711 | 0.2074 | 0.1656 | 0.2003 | 4.29 | 3.76 | 4.20 | 4.55 |
| dmiip2024_3 | 0.1616 | 0.1953 | 0.1513 | 0.1831 | 4.59 | 3.88 | 4.46 | 4.65 |
| dmiip2024_4 | 0.1870 | 0.2233 | 0.1765 | 0.2109 | 4.45 | 3.96 | 4.34 | 4.59 |
| dmiip2024_1 | 0.1960 | 0.2309 | 0.1915 | 0.2234 | 4.45 | 3.88 | 4.28 | 4.55 |
| IR2 | 0.2211 | 0.2400 | 0.2132 | 0.2306 | 4.58 | 3.99 | 4.56 | 4.74 |
| deepseek32b-me | 0.1915 | 0.1456 | 0.1964 | 0.1445 | 4.09 | 4.07 | 4.05 | 4.61 |
| mistral | 0.2139 | 0.1679 | 0.2158 | 0.1677 | 4.33 | 3.94 | 3.91 | 4.49 |
| gpt 01 mini | 0.1249 | 0.1235 | 0.1313 | 0.1295 | 3.85 | 2.93 | 3.61 | 4.36 |
| bious4 | 0.2098 | 0.2139 | 0.2052 | 0.2068 | 4.54 | 4.05 | 4.32 | 4.64 |
| phaseB-5 | 0.2056 | 0.1565 | 0.2126 | 0.1583 | 4.01 | 4.13 | 4.18 | 4.58 |
| phaseB-4 | 0.2056 | 0.1565 | 0.2126 | 0.1583 | 4.01 | 4.13 | 4.18 | 4.58 |
| deepseek32b-f | 0.2072 | 0.1504 | 0.2098 | 0.1491 | 3.94 | 4.24 | 4.00 | 4.60 |
| bious2 | 0.1931 | 0.1972 | 0.1898 | 0.1919 | 4.44 | 3.81 | 4.21 | 4.60 |
| bious1 | 0.2105 | 0.2121 | 0.2057 | 0.2051 | 4.45 | 4.05 | 4.32 | 4.64 |
| bious5 | 0.2028 | 0.2107 | 0.1979 | 0.2040 | 4.49 | 3.85 | 4.21 | 4.55 |
| deepseek-r1:14b | 0.1847 | 0.1243 | 0.1886 | 0.1277 | 4.18 | 3.60 | 3.76 | 4.49 |
| deepseek-r1:8b | 0.2075 | 0.1418 | 0.2176 | 0.1461 | 4.41 | 3.68 | 4.14 | 4.56 |
| deepseek-r1:32b | 0.2277 | 0.1548 | 0.2353 | 0.1579 | 4.46 | 3.72 | 4.14 | 4.62 |
| bioinfo-0 | 0.2308 | 0.2272 | 0.2247 | 0.2162 | 4.46 | 4.20 | 4.39 | 4.56 |
| GPT4O | 0.2075 | 0.1418 | 0.2176 | 0.1461 | 4.41 | 3.68 | 4.14 | 4.56 |
| deepseek32b-full | 0.1804 | 0.1488 | 0.1771 | 0.1461 | 4.01 | 3.84 | 4.00 | 4.41 |
Test batch 3
Exact Answers
| | Yes/No | | | | Factoid | | | List | | |
|---|---|---|---|---|---|---|---|---|---|---|
| System | Accuracy | F1 Yes | F1 No | Macro F1 | Strict Acc. | Lenient Acc. | MRR | Mean Prec. | Recall | F-Measure |
| Only uses GPT-4o | 0.8182 | 0.8667 | 0.7143 | 0.7905 | 0.3500 | 0.4000 | 0.3750 | 0.2644 | 0.2377 | 0.2346 |
| IR2 | 0.9545 | 0.9677 | 0.9231 | 0.9454 | 0.2000 | 0.2000 | 0.2000 | 0.3280 | 0.3152 | 0.3209 |
| IR3 | 0.8182 | 0.8824 | 0.6000 | 0.7412 | 0.3500 | 0.4500 | 0.4000 | 0.4556 | 0.4401 | 0.4265 |
| IR4 | 0.7273 | 0.8235 | 0.4000 | 0.6118 | 0.3000 | 0.4500 | 0.3667 | 0.3754 | 0.3722 | 0.3645 |
| NN_Persona_3 | 0.7727 | 0.8485 | 0.5455 | 0.6970 | 0.3500 | 0.4000 | 0.3750 | 0.2825 | 0.3983 | 0.3107 |
| NN_Baseline | 0.7273 | 0.7692 | 0.6667 | 0.7179 | 0.1500 | 0.2500 | 0.2000 | 0.4149 | 0.4220 | 0.4110 |
| NN_Persona_1 | 0.8636 | 0.9032 | 0.7692 | 0.8362 | 0.2000 | 0.3000 | 0.2417 | 0.2532 | 0.3212 | 0.2744 |
| NN_Persona_2 | 0.9545 | 0.9655 | 0.9333 | 0.9494 | 0.4000 | 0.4500 | 0.4250 | 0.2773 | 0.3974 | 0.3108 |
| IR1 | 0.6818 | 0.7586 | 0.5333 | 0.6460 | 0.2000 | 0.2000 | 0.2000 | 0.3629 | 0.3136 | 0.3321 |
| UR-IW-2 | 0.8182 | 0.8750 | 0.6667 | 0.7708 | 0.2500 | 0.3500 | 0.3000 | 0.3902 | 0.3877 | 0.3685 |
| UR-IW-4 | 0.8636 | 0.9032 | 0.7692 | 0.8362 | 0.2500 | 0.3000 | 0.2750 | 0.2423 | 0.2712 | 0.2419 |
| lasigeBioTM | 0.4091 | 0.3158 | 0.4800 | 0.3979 | 0.0500 | 0.0500 | 0.0500 | 0.0455 | 0.0076 | 0.0130 |
| lasigeBioTM-onto-sm | 0.5909 | 0.6400 | 0.5263 | 0.5832 | - | - | - | 0.0455 | 0.0227 | 0.0303 |
| Fleming-1 | 0.7273 | 0.8000 | 0.5714 | 0.6857 | 0.2500 | 0.4000 | 0.3125 | 0.3293 | 0.4159 | 0.3592 |
| Baseline top 10 | 0.8182 | 0.8750 | 0.6667 | 0.7708 | 0.2000 | 0.3000 | 0.2500 | 0.4421 | 0.4219 | 0.4238 |
| Using LLM alone | 0.7727 | 0.8387 | 0.6154 | 0.7270 | 0.3000 | 0.4000 | 0.3500 | 0.2159 | 0.2739 | 0.2344 |
| Using KG for list q | 0.8182 | 0.8750 | 0.6667 | 0.7708 | 0.3000 | 0.4500 | 0.3667 | 0.3279 | 0.4877 | 0.3642 |
| Main pipeline | 0.8182 | 0.8750 | 0.6667 | 0.7708 | 0.3000 | 0.4500 | 0.3667 | 0.3086 | 0.4506 | 0.3195 |
| Baseline top 20 | 0.8636 | 0.9032 | 0.7692 | 0.8362 | 0.2500 | 0.3000 | 0.2750 | 0.4381 | 0.4377 | 0.4259 |
| UniTor_0 | 0.8636 | 0.9032 | 0.7692 | 0.8362 | 0.3000 | 0.3500 | 0.3250 | 0.2702 | 0.2904 | 0.2735 |
| UniTor_1 | 0.8636 | 0.9032 | 0.7692 | 0.8362 | 0.2000 | 0.2000 | 0.2000 | 0.2550 | 0.2768 | 0.2626 |
| UniTor_2 | 0.9091 | 0.9333 | 0.8571 | 0.8952 | 0.3000 | 0.3500 | 0.3167 | 0.2008 | 0.2273 | 0.2025 |
| UniTor_3 | 0.9091 | 0.9333 | 0.8571 | 0.8952 | 0.2000 | 0.2500 | 0.2167 | 0.1950 | 0.2514 | 0.2119 |
| IR5 | 0.7273 | 0.7692 | 0.6667 | 0.7179 | 0.1500 | 0.1500 | 0.1500 | 0.3221 | 0.2576 | 0.2722 |
| bious1 | 0.9091 | 0.9333 | 0.8571 | 0.8952 | 0.2000 | 0.2500 | 0.2250 | 0.3766 | 0.3735 | 0.3719 |
| bious2 | 0.8182 | 0.8571 | 0.7500 | 0.8036 | 0.2000 | 0.2000 | 0.2000 | 0.4206 | 0.3943 | 0.4032 |
| bious3 | 0.9091 | 0.9333 | 0.8571 | 0.8952 | 0.1500 | 0.1500 | 0.1500 | 0.3490 | 0.3343 | 0.3358 |
| bious4 | 0.8182 | 0.8571 | 0.7500 | 0.8036 | 0.1500 | 0.2000 | 0.1750 | 0.3940 | 0.3646 | 0.3710 |
| bious5 | 0.9091 | 0.9333 | 0.8571 | 0.8952 | 0.2000 | 0.2500 | 0.2250 | 0.3553 | 0.3282 | 0.3356 |
| UR-IW-1 | 0.8636 | 0.9091 | 0.7273 | 0.8182 | 0.2500 | 0.4500 | 0.3375 | 0.3495 | 0.4037 | 0.3587 |
| UR-IW-3 | 0.6818 | 0.7586 | 0.5333 | 0.6460 | 0.2500 | 0.5500 | 0.3600 | 0.3400 | 0.3630 | 0.3337 |
| UR-IW-5 | 0.9091 | 0.9375 | 0.8333 | 0.8854 | 0.1000 | 0.3500 | 0.2250 | 0.3855 | 0.4082 | 0.3776 |
| dmiip2024 | 0.8182 | 0.8750 | 0.6667 | 0.7708 | 0.3500 | 0.3500 | 0.3500 | 0.4520 | 0.4337 | 0.4323 |
| dmiip2024_1 | 0.8182 | 0.8667 | 0.7143 | 0.7905 | 0.2500 | 0.2500 | 0.2500 | 0.4680 | 0.4337 | 0.4366 |
| dmiip2024_2 | 0.8636 | 0.9032 | 0.7692 | 0.8362 | 0.1500 | 0.2000 | 0.1750 | 0.3174 | 0.4413 | 0.3482 |
| dmiip2024_3 | 0.8182 | 0.8667 | 0.7143 | 0.7905 | 0.2000 | 0.4000 | 0.2917 | 0.5068 | 0.4246 | 0.4546 |
| dmiip2024_4 | 0.8182 | 0.8824 | 0.6000 | 0.7412 | 0.2000 | 0.2500 | 0.2250 | 0.4501 | 0.4204 | 0.4238 |
| lasigeBioTM-onto-bl | 0.5909 | 0.6400 | 0.5263 | 0.5832 | 0.1000 | 0.1000 | 0.1000 | 0.0909 | 0.0318 | 0.0455 |
| IRIS_1 | 0.3182 | - | 0.4828 | 0.2414 | - | - | - | - | - | - |
| IRIS_2 | 0.3182 | - | 0.4828 | 0.2414 | - | - | - | - | - | - |
| IRIS_3 | 0.3182 | - | 0.4828 | 0.2414 | - | - | - | - | - | - |
| extractive | 0.7273 | 0.7857 | 0.6250 | 0.7054 | 0.0500 | 0.2500 | 0.1417 | 0.2878 | 0.3136 | 0.2931 |
| mistral | 0.8636 | 0.9032 | 0.7692 | 0.8362 | 0.1000 | 0.3500 | 0.2167 | 0.3527 | 0.3995 | 0.3436 |
| llama | 0.8636 | 0.9032 | 0.7692 | 0.8362 | 0.2500 | 0.4500 | 0.3500 | 0.3317 | 0.3633 | 0.3393 |
| abstractive | 0.8636 | 0.8966 | 0.8000 | 0.8483 | 0.2500 | 0.3500 | 0.3000 | 0.0364 | 0.0606 | 0.0450 |
| bioinfo-0 | 0.6818 | 0.8108 | - | 0.4054 | - | - | - | - | - | - |
| bioinfo-1 | 0.6818 | 0.8108 | - | 0.4054 | - | - | - | - | - | - |
| bioinfo-2 | 0.6818 | 0.8108 | - | 0.4054 | - | - | - | - | - | - |
| bioinfo-3 | 0.6818 | 0.8108 | - | 0.4054 | - | - | - | - | - | - |
| bioinfo-4 | 0.6818 | 0.8108 | - | 0.4054 | - | - | - | - | - | - |
| dense | 0.8636 | 0.9032 | 0.7692 | 0.8362 | 0.2000 | 0.3000 | 0.2500 | 0.3602 | 0.4054 | 0.3641 |
| sp_lasigebiotm | 0.3182 | - | 0.4828 | 0.2414 | - | - | - | 0.0455 | 0.0455 | 0.0455 |
| GPT4O | 0.7727 | 0.8485 | 0.5455 | 0.6970 | 0.1500 | 0.1500 | 0.1500 | 0.1569 | 0.1712 | 0.1623 |
| Fleming-2 | 0.8636 | 0.9032 | 0.7692 | 0.8362 | 0.3000 | 0.4500 | 0.3625 | 0.3293 | 0.4159 | 0.3592 |
| deepseek32b-me | 0.7273 | 0.7857 | 0.6250 | 0.7054 | 0.3000 | 0.3000 | 0.3000 | 0.0876 | 0.1167 | 0.0971 |
| deepseek-r1:14b | 0.8182 | 0.8667 | 0.7143 | 0.7905 | - | - | - | 0.1167 | 0.1545 | 0.1303 |
| deepseek32b-full | 0.7273 | 0.7857 | 0.6250 | 0.7054 | 0.3000 | 0.3000 | 0.3000 | 0.1239 | 0.1394 | 0.1296 |
| AQAMS | 0.8182 | 0.8667 | 0.7143 | 0.7905 | 0.1500 | 0.2000 | 0.1750 | 0.3666 | 0.3513 | 0.3526 |
Ideal Answers
| | Automatic scores (Rouge - R) | | | | Manual scores | | | |
|---|---|---|---|---|---|---|---|---|
| System | R-2 (Rec) | R-2 (F1) | R-SU4 (Rec) | R-SU4 (F1) | Readability | Recall | Precision | Repetition |
| Only uses GPT-4o | 0.1878 | 0.1790 | 0.1974 | 0.1795 | 4.54 | 3.65 | 4.08 | 4.59 |
| IR2 | 0.2197 | 0.2243 | 0.2168 | 0.2194 | 4.40 | 4.02 | 4.32 | 4.56 |
| IR3 | 0.2364 | 0.2380 | 0.2362 | 0.2342 | 4.42 | 3.99 | 4.35 | 4.49 |
| IR4 | 0.2353 | 0.2384 | 0.2339 | 0.2335 | 4.44 | 3.94 | 4.41 | 4.48 |
| NN_Persona_3 | 0.2090 | 0.1165 | 0.2328 | 0.1231 | 4.15 | 4.18 | 3.78 | 4.31 |
| NN_Baseline | 0.1846 | 0.1154 | 0.2087 | 0.1261 | 4.29 | 4.29 | 4.04 | 4.49 |
| NN_Persona_1 | 0.1954 | 0.1285 | 0.2134 | 0.1316 | 4.13 | 4.21 | 3.85 | 4.28 |
| NN_Persona_2 | 0.2027 | 0.1330 | 0.2191 | 0.1366 | 4.19 | 4.22 | 3.87 | 4.27 |
| IR1 | 0.1495 | 0.1708 | 0.1500 | 0.1679 | 4.13 | 3.58 | 4.16 | 4.39 |
| UR-IW-2 | 0.1335 | 0.1166 | 0.1546 | 0.1299 | 4.41 | 4.02 | 4.32 | 4.59 |
| UR-IW-4 | 0.1128 | 0.1006 | 0.1310 | 0.1140 | 4.52 | 3.89 | 4.25 | 4.61 |
| lasigeBioTM | - | - | - | - | - | - | - | - |
| lasigeBioTM-onto-sm | - | - | - | - | - | - | - | - |
| Fleming-1 | 0.2479 | 0.1105 | 0.2703 | 0.1200 | 3.99 | 3.80 | 3.69 | 4.04 |
| Baseline top 10 | 0.0391 | 0.0460 | 0.0397 | 0.0470 | 1.09 | 1.05 | 1.12 | 1.12 |
| Using LLM alone | 0.0298 | 0.0353 | 0.0306 | 0.0358 | 1.14 | 0.95 | 1.02 | 1.16 |
| Using KG for list q | 0.0504 | 0.0411 | 0.0545 | 0.0438 | 1.13 | 1.13 | 1.06 | 1.13 |
| Main pipeline | 0.0504 | 0.0411 | 0.0545 | 0.0438 | 1.13 | 1.13 | 1.06 | 1.13 |
| Baseline top 20 | 0.0417 | 0.0494 | 0.0445 | 0.0517 | 1.13 | 1.04 | 1.14 | 1.14 |
| UniTor_0 | 0.1472 | 0.1873 | 0.1422 | 0.1805 | 4.38 | 3.45 | 3.95 | 4.41 |
| UniTor_1 | 0.1519 | 0.1875 | 0.1489 | 0.1831 | 4.29 | 3.51 | 3.86 | 4.42 |
| UniTor_2 | 0.1549 | 0.1955 | 0.1493 | 0.1886 | 4.39 | 3.64 | 4.12 | 4.46 |
| UniTor_3 | 0.1517 | 0.1877 | 0.1469 | 0.1800 | 4.28 | 3.52 | 4.00 | 4.42 |
| IR5 | 0.1473 | 0.1681 | 0.1528 | 0.1670 | 4.08 | 3.47 | 4.04 | 4.33 |
| bious1 | 0.2114 | 0.1931 | 0.2116 | 0.1894 | 4.28 | 3.98 | 4.15 | 4.46 |
| bious2 | 0.1841 | 0.1850 | 0.1900 | 0.1861 | 4.40 | 3.89 | 4.12 | 4.51 |
| bious3 | 0.1867 | 0.1825 | 0.1947 | 0.1846 | 4.40 | 3.80 | 4.12 | 4.46 |
| bious4 | 0.1755 | 0.1782 | 0.1814 | 0.1788 | 4.36 | 3.69 | 3.99 | 4.47 |
| bious5 | 0.1852 | 0.1781 | 0.1915 | 0.1788 | 4.36 | 3.87 | 4.14 | 4.48 |
| UR-IW-1 | 0.2143 | 0.1433 | 0.2352 | 0.1497 | 4.27 | 4.01 | 3.95 | 4.36 |
| UR-IW-3 | 0.2285 | 0.1576 | 0.2397 | 0.1605 | 4.32 | 4.04 | 3.93 | 4.41 |
| UR-IW-5 | 0.2087 | 0.1871 | 0.2110 | 0.1850 | 4.49 | 4.06 | 4.11 | 4.51 |
| dmiip2024 | 0.1935 | 0.2230 | 0.1890 | 0.2157 | 4.39 | 3.95 | 4.32 | 4.53 |
| dmiip2024_1 | 0.2000 | 0.2245 | 0.1972 | 0.2186 | 4.35 | 3.99 | 4.25 | 4.49 |
| dmiip2024_2 | 0.1909 | 0.2181 | 0.1903 | 0.2160 | 4.25 | 3.84 | 4.24 | 4.41 |
| dmiip2024_3 | 0.1619 | 0.1986 | 0.1574 | 0.1911 | 4.41 | 3.89 | 4.40 | 4.51 |
| dmiip2024_4 | 0.1863 | 0.2158 | 0.1795 | 0.2069 | 4.33 | 3.87 | 4.25 | 4.51 |
| lasigeBioTM-onto-bl | - | - | - | - | - | - | - | - |
| IRIS_1 | 0.1514 | 0.1646 | 0.1572 | 0.1652 | 4.29 | 3.40 | 4.01 | 4.49 |
| IRIS_2 | 0.1541 | 0.1565 | 0.1558 | 0.1538 | 4.08 | 3.65 | 3.96 | 4.40 |
| IRIS_3 | 0.1633 | 0.1499 | 0.1676 | 0.1515 | 4.02 | 3.42 | 3.79 | 4.34 |
| extractive | 0.0456 | 0.0306 | 0.0504 | 0.0330 | 1.01 | 1.00 | 0.91 | 1.11 |
| mistral | 0.1927 | 0.1548 | 0.1989 | 0.1545 | 4.26 | 3.93 | 3.99 | 4.38 |
| llama | 0.1849 | 0.1545 | 0.1927 | 0.1533 | 4.26 | 3.96 | 3.98 | 4.33 |
| abstractive | 0.0488 | 0.0233 | 0.0550 | 0.0262 | 1.02 | 1.09 | 0.94 | 1.08 |
| bioinfo-0 | 0.1841 | 0.1735 | 0.1825 | 0.1697 | 4.38 | 4.12 | 4.35 | 4.55 |
| bioinfo-1 | 0.2225 | 0.1798 | 0.2226 | 0.1756 | 4.15 | 4.09 | 4.13 | 4.40 |
| bioinfo-2 | 0.1958 | 0.1463 | 0.2098 | 0.1529 | 4.32 | 4.29 | 4.28 | 4.47 |
| bioinfo-3 | 0.1943 | 0.1483 | 0.2064 | 0.1531 | 4.34 | 4.26 | 4.11 | 4.36 |
| bioinfo-4 | 0.2027 | 0.1429 | 0.2175 | 0.1465 | 4.22 | 4.34 | 4.04 | 4.42 |
| dense | 0.1861 | 0.1483 | 0.1982 | 0.1513 | 4.26 | 3.98 | 4.02 | 4.36 |
| sp_lasigebiotm | 0.0852 | 0.0878 | 0.0861 | 0.0871 | 3.11 | 2.04 | 2.45 | 3.76 |
| GPT4O | 0.0892 | 0.0962 | 0.0977 | 0.1043 | 3.78 | 2.91 | 3.44 | 4.18 |
| Fleming-2 | 0.2230 | 0.1349 | 0.2478 | 0.1424 | 4.18 | 4.06 | 3.98 | 4.38 |
| deepseek32b-me | 0.1408 | 0.1300 | 0.1531 | 0.1370 | 4.24 | 3.40 | 3.86 | 4.46 |
| deepseek-r1:14b | 0.0845 | 0.0940 | 0.0962 | 0.1034 | 3.85 | 2.84 | 3.65 | 4.34 |
| deepseek32b-full | 0.1502 | 0.1397 | 0.1635 | 0.1497 | 4.18 | 3.47 | 3.81 | 4.48 |
| AQAMS | 0.2275 | 0.1069 | 0.2514 | 0.1164 | 4.13 | 3.28 | 3.61 | 4.34 |
Test batch 4
Exact Answers
| | Yes/No | | | | Factoid | | | List | | |
|---|---|---|---|---|---|---|---|---|---|---|
| System | Accuracy | F1 Yes | F1 No | Macro F1 | Strict Acc. | Lenient Acc. | MRR | Mean Prec. | Recall | F-Measure |
| IR1 | 0.8462 | 0.8824 | 0.7778 | 0.8301 | 0.4545 | 0.4545 | 0.4545 | 0.3386 | 0.2235 | 0.2572 |
| UniTor_0 | 0.8462 | 0.8824 | 0.7778 | 0.8301 | 0.4545 | 0.4545 | 0.4545 | 0.2691 | 0.2563 | 0.2503 |
| UniTor_1 | 0.8462 | 0.8824 | 0.7778 | 0.8301 | 0.4545 | 0.4545 | 0.4545 | 0.2539 | 0.2751 | 0.2483 |
| UniTor_2 | 0.8846 | 0.9189 | 0.8000 | 0.8595 | 0.4091 | 0.4091 | 0.4091 | 0.2710 | 0.2845 | 0.2465 |
| UniTor_3 | 0.8846 | 0.9189 | 0.8000 | 0.8595 | 0.4091 | 0.4091 | 0.4091 | 0.2746 | 0.2735 | 0.2591 |
| simple truncation | 0.8077 | 0.8649 | 0.6667 | 0.7658 | 0.2727 | 0.4545 | 0.3371 | 0.1775 | 0.2496 | 0.1985 |
| Using KG for list q | 0.9615 | 0.9714 | 0.9412 | 0.9563 | 0.4091 | 0.5000 | 0.4356 | 0.2215 | 0.3765 | 0.2533 |
| Baseline top 20 | 0.9615 | 0.9714 | 0.9412 | 0.9563 | 0.4091 | 0.4545 | 0.4318 | 0.4157 | 0.3339 | 0.3578 |
| Baseline top 10 | 0.9615 | 0.9714 | 0.9412 | 0.9563 | 0.4545 | 0.5455 | 0.5000 | 0.3721 | 0.3248 | 0.3307 |
| Using LLM alone | 0.8846 | 0.9091 | 0.8421 | 0.8756 | 0.4091 | 0.4091 | 0.4091 | 0.2785 | 0.3186 | 0.2837 |
| Main pipeline | 0.9231 | 0.9444 | 0.8750 | 0.9097 | 0.4091 | 0.5000 | 0.4356 | 0.2362 | 0.3901 | 0.2703 |
| IR5 | 0.6923 | 0.7500 | 0.6000 | 0.6750 | 0.5000 | 0.5000 | 0.5000 | 0.2522 | 0.2257 | 0.2313 |
| IR3 | 0.9615 | 0.9714 | 0.9412 | 0.9563 | 0.4091 | 0.4545 | 0.4318 | 0.4132 | 0.2353 | 0.2693 |
| IR4 | 0.9231 | 0.9412 | 0.8889 | 0.9150 | 0.3636 | 0.3636 | 0.3636 | 0.3242 | 0.2051 | 0.2329 |
| IR2 | 0.8846 | 0.9091 | 0.8421 | 0.8756 | 0.5000 | 0.5000 | 0.5000 | 0.3772 | 0.2254 | 0.2685 |
| AQAMS | 0.8462 | 0.8889 | 0.7500 | 0.8194 | 0.4091 | 0.4091 | 0.4091 | 0.3531 | 0.2891 | 0.3033 |
| kmeans | 0.6538 | 0.6897 | 0.6087 | 0.6492 | 0.0909 | 0.0909 | 0.0909 | 0.2383 | 0.2386 | 0.2097 |
| similarity measures | 0.8846 | 0.9143 | 0.8235 | 0.8689 | 0.4091 | 0.5455 | 0.4773 | 0.2253 | 0.3185 | 0.2202 |
| deepseek32b-me | 0.8077 | 0.8649 | 0.6667 | 0.7658 | 0.4545 | 0.4545 | 0.4545 | 0.3520 | 0.2364 | 0.2675 |
| deepseek32b-full | 0.7692 | 0.8333 | 0.6250 | 0.7292 | 0.4091 | 0.4091 | 0.4091 | 0.3222 | 0.2369 | 0.2529 |
| bious2 | 0.7692 | 0.8125 | 0.7000 | 0.7563 | 0.2727 | 0.3636 | 0.3182 | 0.2454 | 0.1819 | 0.1966 |
| bious3 | 0.8462 | 0.8750 | 0.8000 | 0.8375 | 0.4091 | 0.4545 | 0.4318 | 0.2999 | 0.2081 | 0.2283 |
| bious4 | 0.8077 | 0.8485 | 0.7368 | 0.7927 | 0.3636 | 0.4091 | 0.3864 | 0.2904 | 0.2350 | 0.2496 |
| bious5 | 0.8462 | 0.8750 | 0.8000 | 0.8375 | 0.3636 | 0.3636 | 0.3636 | 0.2624 | 0.1881 | 0.2027 |
| dmiip2024 | 0.8846 | 0.9189 | 0.8000 | 0.8595 | 0.5000 | 0.5455 | 0.5227 | 0.2799 | 0.2798 | 0.2695 |
| dmiip2024_1 | 0.9231 | 0.9412 | 0.8889 | 0.9150 | 0.4545 | 0.4545 | 0.4545 | 0.3102 | 0.2758 | 0.2848 |
| dmiip2024_2 | 0.8462 | 0.8889 | 0.7500 | 0.8194 | 0.2727 | 0.2727 | 0.2727 | 0.2669 | 0.3014 | 0.2707 |
| dmiip2024_4 | 0.8077 | 0.8718 | 0.6154 | 0.7436 | 0.4545 | 0.5000 | 0.4773 | 0.3289 | 0.2361 | 0.2647 |
| Fleming-2 | 0.8846 | 0.9143 | 0.8235 | 0.8689 | 0.2727 | 0.4545 | 0.3273 | 0.1469 | 0.3075 | 0.1858 |
| dmiip2024_3 | 0.8846 | 0.9189 | 0.8000 | 0.8595 | 0.5000 | 0.5909 | 0.5303 | 0.4053 | 0.2513 | 0.2911 |
| Fleming-1 | 0.8846 | 0.9143 | 0.8235 | 0.8689 | 0.2727 | 0.4091 | 0.3144 | 0.2471 | 0.3185 | 0.2632 |
| UR-IW-1 | 0.8462 | 0.8889 | 0.7500 | 0.8194 | 0.5000 | 0.5455 | 0.5152 | 0.2307 | 0.3274 | 0.2379 |
| UR-IW-2 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.4545 | 0.4545 | 0.4545 | 0.2378 | 0.3622 | 0.2644 |
| UR-IW-3 | 0.8462 | 0.8824 | 0.7778 | 0.8301 | 0.4545 | 0.4545 | 0.4545 | 0.3150 | 0.3261 | 0.3050 |
| UR-IW-5 | 0.8462 | 0.8947 | 0.7143 | 0.8045 | 0.5455 | 0.5909 | 0.5606 | 0.2984 | 0.3626 | 0.3057 |
| UR-IW-4 | 0.8846 | 0.9032 | 0.8571 | 0.8802 | 0.4091 | 0.4545 | 0.4242 | 0.2952 | 0.3048 | 0.2824 |
| Fleming-3 | 0.8462 | 0.8889 | 0.7500 | 0.8194 | 0.2727 | 0.4545 | 0.3273 | 0.1469 | 0.3075 | 0.1858 |
| extractive | 0.9231 | 0.9412 | 0.8889 | 0.9150 | 0.4545 | 0.5455 | 0.5000 | 0.2150 | 0.3385 | 0.2128 |
| abstractive | 0.9231 | 0.9412 | 0.8889 | 0.9150 | 0.4545 | 0.5455 | 0.5000 | 0.1848 | 0.3152 | 0.2235 |
| Only uses GPT-4o | 0.9231 | 0.9375 | 0.9000 | 0.9188 | 0.4545 | 0.4545 | 0.4545 | 0.2377 | 0.2098 | 0.2132 |
| NN_Persona_1 | 0.8462 | 0.8667 | 0.8182 | 0.8424 | 0.3636 | 0.4545 | 0.4091 | 0.2385 | 0.2875 | 0.2476 |
| NN_Persona_2 | 0.8077 | 0.8387 | 0.7619 | 0.8003 | 0.3636 | 0.4091 | 0.3864 | 0.1585 | 0.1871 | 0.1571 |
| NN_Persona_3 | 0.8077 | 0.8485 | 0.7368 | 0.7927 | 0.4545 | 0.5455 | 0.4750 | 0.1972 | 0.2568 | 0.2113 |
| NN_Baseline | 0.9231 | 0.9375 | 0.9000 | 0.9188 | 0.4091 | 0.4091 | 0.4091 | 0.2945 | 0.2412 | 0.2459 |
| GPT4O | 0.8077 | 0.8571 | 0.7059 | 0.7815 | 0.2727 | 0.3636 | 0.3182 | 0.2567 | 0.2443 | 0.2256 |
| deepseek-r1:32b | 0.8462 | 0.8889 | 0.7500 | 0.8194 | 0.2727 | 0.3636 | 0.3182 | 0.3132 | 0.2360 | 0.2324 |
| deepseek-r1:14b | 0.9231 | 0.9444 | 0.8750 | 0.9097 | 0.3636 | 0.3636 | 0.3636 | 0.2946 | 0.2166 | 0.2357 |
| deepseek-r1:8b | 0.9615 | 0.9714 | 0.9412 | 0.9563 | 0.3636 | 0.3636 | 0.3636 | 0.2996 | 0.2175 | 0.2410 |
| Fleming-4 | 0.8846 | 0.9143 | 0.8235 | 0.8689 | 0.2727 | 0.4091 | 0.3121 | 0.2485 | 0.3185 | 0.2643 |
| mistral | 0.8462 | 0.8889 | 0.7500 | 0.8194 | 0.4545 | 0.4545 | 0.4545 | 0.2508 | 0.3153 | 0.2629 |
| llama | 0.8846 | 0.9189 | 0.8000 | 0.8595 | 0.4091 | 0.4091 | 0.4091 | 0.2854 | 0.3087 | 0.2735 |
| sp_lasigebiotm | 0.8077 | 0.8387 | 0.7619 | 0.8003 | 0.3182 | 0.3182 | 0.3182 | 0.2189 | 0.0995 | 0.1309 |
| lasigeBioTM | 0.8462 | 0.8750 | 0.8000 | 0.8375 | 0.3636 | 0.3636 | 0.3636 | 0.1751 | 0.1373 | 0.1511 |
| lasigeBioTM-onto-bl | 0.8077 | 0.8485 | 0.7368 | 0.7927 | 0.3636 | 0.3636 | 0.3636 | 0.1998 | 0.1387 | 0.1553 |
| lasigeBioTM-onto-sm | 0.7692 | 0.8000 | 0.7273 | 0.7636 | 0.2273 | 0.2273 | 0.2273 | 0.1228 | 0.0697 | 0.0794 |
| bioinfo-0 | 0.6538 | 0.7907 | - | 0.3953 | - | - | - | - | - | - |
| bioinfo-1 | 0.6538 | 0.7907 | - | 0.3953 | - | - | - | - | - | - |
| deepseek32b-f | 0.9231 | 0.9412 | 0.8889 | 0.9150 | 0.3636 | 0.3636 | 0.3636 | 0.2519 | 0.2170 | 0.2020 |
| bioinfo-2 | 0.6538 | 0.7907 | - | 0.3953 | - | - | - | - | - | - |
| bioinfo-3 | 0.6538 | 0.7907 | - | 0.3953 | - | - | - | - | - | - |
| bioinfo-4 | 0.6538 | 0.7907 | - | 0.3953 | - | - | - | - | - | - |
| bious1 | 0.8846 | 0.9032 | 0.8571 | 0.8802 | 0.4545 | 0.4545 | 0.4545 | 0.2340 | 0.2290 | 0.2107 |
| Fleming-5 | 0.9231 | 0.9444 | 0.8750 | 0.9097 | 0.2727 | 0.4091 | 0.3121 | 0.2485 | 0.3185 | 0.2643 |
| phaseB-5 | 0.8846 | 0.9143 | 0.8235 | 0.8689 | 0.4091 | 0.4091 | 0.4091 | 0.2434 | 0.1856 | 0.1969 |
| gpt 01 mini | 0.8077 | 0.8571 | 0.7059 | 0.7815 | 0.2727 | 0.2727 | 0.2727 | 0.2504 | 0.2223 | 0.2118 |
| phaseB-4 | 0.9231 | 0.9444 | 0.8750 | 0.9097 | 0.4091 | 0.4091 | 0.4091 | 0.2771 | 0.2325 | 0.2403 |
| 3.PhaseB_System | 0.6538 | 0.7907 | - | 0.3953 | 0.1818 | 0.1818 | 0.1818 | 0.0132 | 0.0105 | 0.0117 |
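
In the table above, the Macro F1 column appears to be the unweighted mean of F1 Yes and F1 No, with a missing F1 No ("-") counted as 0. That reading is inferred from the reported numbers, not stated on this page. A minimal sketch under that assumption follows; the helper name `macro_f1` is hypothetical, and small differences are expected because the table values are rounded to four decimals.

```python
def macro_f1(f1_yes, f1_no):
    """Unweighted mean of the two per-class F1 scores.

    Assumption: a score shown as "-" in the table is passed as None
    and counted as 0, which matches the bioinfo-* rows.
    """
    scores = [0.0 if s is None else s for s in (f1_yes, f1_no)]
    return sum(scores) / len(scores)


# (F1 Yes, F1 No, reported Macro F1), copied from the batch 4 table
rows = {
    "IR1": (0.8824, 0.7778, 0.8301),
    "Using KG for list q": (0.9714, 0.9412, 0.9563),
    "bioinfo-0": (0.7907, None, 0.3953),
}

for name, (f1_yes, f1_no, reported) in rows.items():
    computed = macro_f1(f1_yes, f1_no)
    # Inputs are already rounded, so compare with a small tolerance
    print(f"{name}: computed={computed:.4f} reported={reported:.4f}")
```

Under this reading, a system that never predicts "no" correctly can reach at most 0.5 Macro F1, which would explain why the bioinfo-* rows score far below their accuracy.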
Ideal Answers
| | Automatic scores (Rouge - R) | | | | Manual scores | | | |
|---|---|---|---|---|---|---|---|---|
| System | R-2 (Rec) | R-2 (F1) | R-SU4 (Rec) | R-SU4 (F1) | Readability | Recall | Precision | Repetition |
| IR1 | 0.1262 | 0.1470 | 0.1244 | 0.1438 | 3.87 | 3.35 | 3.84 | 4.05 |
| UniTor_0 | 0.1195 | 0.1502 | 0.1136 | 0.1429 | 4.08 | 3.47 | 3.89 | 4.18 |
| UniTor_1 | 0.1185 | 0.1489 | 0.1094 | 0.1380 | 4.01 | 3.40 | 3.82 | 4.13 |
| UniTor_2 | 0.1268 | 0.1564 | 0.1221 | 0.1506 | 4.19 | 3.54 | 4.02 | 4.28 |
| UniTor_3 | 0.1208 | 0.1503 | 0.1161 | 0.1427 | 4.00 | 3.36 | 3.75 | 4.16 |
| simple truncation | 0.0323 | 0.0233 | 0.0367 | 0.0260 | 0.84 | 0.81 | 0.75 | 0.89 |
| Using KG for list q | 0.0351 | 0.0310 | 0.0383 | 0.0339 | 0.93 | 0.88 | 0.87 | 0.93 |
| Baseline top 20 | 0.0235 | 0.0301 | 0.0248 | 0.0317 | 0.88 | 0.78 | 0.92 | 0.91 |
| Baseline top 10 | 0.0247 | 0.0313 | 0.0251 | 0.0316 | 0.88 | 0.75 | 0.87 | 0.89 |
| Using LLM alone | 0.0217 | 0.0277 | 0.0221 | 0.0288 | 0.88 | 0.73 | 0.87 | 0.89 |
| Main pipeline | 0.0351 | 0.0310 | 0.0383 | 0.0339 | 0.93 | 0.88 | 0.87 | 0.93 |
| IR5 | 0.1309 | 0.1479 | 0.1293 | 0.1463 | 4.11 | 3.46 | 3.94 | 4.26 |
| IR3 | 0.1721 | 0.1843 | 0.1737 | 0.1856 | 4.15 | 3.72 | 4.07 | 4.19 |
| IR4 | 0.1874 | 0.1960 | 0.1863 | 0.1941 | 4.11 | 3.74 | 4.13 | 4.27 |
| IR2 | 0.1623 | 0.1844 | 0.1594 | 0.1802 | 4.29 | 3.91 | 4.14 | 4.32 |
| AQAMS | 0.2225 | 0.1223 | 0.2473 | 0.1341 | 4.13 | 4.16 | 3.82 | 4.16 |
| kmeans | - | - | - | - | - | - | - | - |
| similarity measures | 0.0343 | 0.0252 | 0.0399 | 0.0288 | 0.85 | 0.78 | 0.74 | 0.89 |
| deepseek32b-me | 0.1075 | 0.1406 | 0.1006 | 0.1312 | 3.88 | 3.20 | 3.71 | 4.05 |
| deepseek32b-full | 0.0942 | 0.1305 | 0.0893 | 0.1242 | 3.79 | 3.13 | 3.69 | 4.04 |
| bious2 | 0.1565 | 0.1700 | 0.1584 | 0.1701 | 4.06 | 3.67 | 3.92 | 4.14 |
| bious3 | 0.1553 | 0.1674 | 0.1559 | 0.1660 | 4.09 | 3.76 | 3.94 | 4.18 |
| bious4 | 0.1538 | 0.1664 | 0.1555 | 0.1664 | 4.12 | 3.71 | 3.95 | 4.20 |
| bious5 | 0.1527 | 0.1615 | 0.1542 | 0.1605 | 4.18 | 3.75 | 3.98 | 4.21 |
| dmiip2024 | 0.1433 | 0.1699 | 0.1405 | 0.1658 | 4.22 | 3.72 | 4.05 | 4.26 |
| dmiip2024_1 | 0.1532 | 0.1700 | 0.1500 | 0.1656 | 4.20 | 3.86 | 4.06 | 4.24 |
| dmiip2024_2 | 0.1329 | 0.1690 | 0.1293 | 0.1636 | 4.16 | 3.61 | 3.96 | 4.22 |
| dmiip2024_4 | 0.1307 | 0.1673 | 0.1250 | 0.1583 | 4.25 | 3.56 | 4.07 | 4.35 |
| Fleming-2 | 0.1813 | 0.0900 | 0.2062 | 0.1014 | 3.91 | 4.16 | 3.55 | 4.14 |
| dmiip2024_3 | 0.1340 | 0.1735 | 0.1247 | 0.1627 | 4.16 | 3.61 | 3.99 | 4.22 |
| Fleming-1 | 0.2248 | 0.1136 | 0.2462 | 0.1233 | 4.02 | 4.16 | 3.69 | 4.20 |
| UR-IW-1 | 0.2005 | 0.1405 | 0.2175 | 0.1501 | 4.16 | 4.19 | 3.93 | 4.16 |
| UR-IW-2 | 0.1521 | 0.1330 | 0.1675 | 0.1445 | 4.42 | 4.27 | 4.18 | 4.51 |
| UR-IW-3 | 0.1859 | 0.1464 | 0.1993 | 0.1552 | 4.27 | 4.09 | 4.04 | 4.18 |
| UR-IW-5 | 0.1691 | 0.1462 | 0.1776 | 0.1539 | 4.13 | 3.98 | 3.86 | 4.19 |
| UR-IW-4 | 0.1227 | 0.1061 | 0.1416 | 0.1194 | 4.25 | 4.15 | 4.07 | 4.44 |
| Fleming-3 | 0.1974 | 0.1136 | 0.2188 | 0.1248 | 4.22 | 4.15 | 3.75 | 4.33 |
| extractive | 0.0360 | 0.0271 | 0.0414 | 0.0306 | 0.86 | 0.79 | 0.76 | 0.91 |
| abstractive | 0.0331 | 0.0239 | 0.0383 | 0.0273 | 0.81 | 0.76 | 0.73 | 0.86 |
| Only uses GPT-4o | 0.2110 | 0.1244 | 0.2357 | 0.1367 | 4.39 | 4.35 | 4.05 | 4.38 |
| NN_Persona_1 | 0.2099 | 0.0986 | 0.2388 | 0.1104 | 4.00 | 4.25 | 3.68 | 4.06 |
| NN_Persona_2 | 0.2053 | 0.1109 | 0.2319 | 0.1224 | 4.00 | 4.28 | 3.66 | 4.09 |
| NN_Persona_3 | 0.2088 | 0.0966 | 0.2345 | 0.1090 | 4.09 | 4.22 | 3.72 | 4.13 |
| NN_Baseline | 0.1861 | 0.1150 | 0.2116 | 0.1296 | 4.35 | 4.29 | 3.95 | 4.45 |
| GPT4O | 0.0777 | 0.0900 | 0.0863 | 0.0990 | 3.69 | 3.08 | 3.47 | 3.99 |
| deepseek-r1:32b | 0.0785 | 0.0908 | 0.0834 | 0.0963 | 3.85 | 3.14 | 3.60 | 4.01 |
| deepseek-r1:14b | 0.1959 | 0.1444 | 0.2073 | 0.1520 | 4.28 | 3.96 | 4.05 | 4.40 |
| deepseek-r1:8b | 0.1877 | 0.1384 | 0.2022 | 0.1475 | 4.24 | 3.94 | 3.95 | 4.31 |
| Fleming-4 | 0.1803 | 0.1009 | 0.1994 | 0.1108 | 4.05 | 4.14 | 3.64 | 4.24 |
| mistral | 0.2003 | 0.1463 | 0.2130 | 0.1518 | 4.16 | 4.15 | 3.95 | 4.27 |
| llama | 0.1661 | 0.1421 | 0.1721 | 0.1450 | 4.25 | 3.95 | 3.81 | 4.29 |
| sp_lasigebiotm | 0.1381 | 0.1069 | 0.1591 | 0.1212 | 4.00 | 3.60 | 3.66 | 4.13 |
| lasigeBioTM | 0.1645 | 0.1058 | 0.1908 | 0.1211 | 3.85 | 3.73 | 3.59 | 4.01 |
| lasigeBioTM-onto-bl | 0.1660 | 0.1132 | 0.1930 | 0.1290 | 4.13 | 3.72 | 3.60 | 4.21 |
| lasigeBioTM-onto-sm | 0.1032 | 0.1124 | 0.1020 | 0.1102 | 3.64 | 3.01 | 3.52 | 3.96 |
| bioinfo-0 | 0.1435 | 0.1444 | 0.1441 | 0.1435 | 4.07 | 3.69 | 3.96 | 4.25 |
| bioinfo-1 | 0.1482 | 0.1675 | 0.1427 | 0.1592 | 4.08 | 3.86 | 4.00 | 4.19 |
| deepseek32b-f | 0.1621 | 0.1231 | 0.1681 | 0.1242 | 4.19 | 4.18 | 3.80 | 4.32 |
| bioinfo-2 | 0.1729 | 0.1300 | 0.1899 | 0.1423 | 4.11 | 4.14 | 3.93 | 4.24 |
| bioinfo-3 | 0.1435 | 0.1550 | 0.1436 | 0.1528 | 4.21 | 3.88 | 4.08 | 4.32 |
| bioinfo-4 | 0.1679 | 0.1194 | 0.1852 | 0.1312 | 4.24 | 4.19 | 3.95 | 4.31 |
| bious1 | 0.1742 | 0.1759 | 0.1753 | 0.1739 | 4.24 | 3.94 | 3.96 | 4.32 |
| Fleming-5 | 0.1803 | 0.1009 | 0.1994 | 0.1108 | 4.05 | 4.14 | 3.64 | 4.24 |
| phaseB-5 | 0.1487 | 0.1268 | 0.1502 | 0.1261 | 4.19 | 4.22 | 3.78 | 4.35 |
| gpt 01 mini | 0.0894 | 0.0993 | 0.0976 | 0.1078 | 3.81 | 3.19 | 3.58 | 3.99 |
| phaseB-4 | 0.1727 | 0.1357 | 0.1816 | 0.1399 | 4.24 | 4.29 | 3.91 | 4.44 |
| 3.PhaseB_System | 0.0594 | 0.0828 | 0.0624 | 0.0855 | - | - | - | - |