BioASQ Participants Area
Task 14b: Test Results of Phase A+
The test results are presented in separate tables for each type of annotation. Each system is listed under its "System Description". The evaluation measures used in Phase A+ are presented here.
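As a rough illustration of the yes/no columns, here is a minimal Python sketch. It assumes the usual definitions: accuracy over all yes/no questions, F1 computed separately with "yes" and with "no" as the positive class, and Macro F1 as the unweighted mean of the two. The function names and the example data are hypothetical; this is not the official BioASQ evaluation code.

```python
from typing import Sequence


def f1_for_class(gold: Sequence[str], pred: Sequence[str], label: str) -> float:
    """F1 with `label` as the positive class; 0.0 when it has no true positives."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    denom = 2 * tp + fp + fn  # 2TP / (2TP + FP + FN) equals 2PR / (P + R)
    return 2 * tp / denom if denom else 0.0


def yes_no_scores(gold: Sequence[str], pred: Sequence[str]) -> dict:
    """Accuracy, per-class F1 and their unweighted (macro) average."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    f1_yes = f1_for_class(gold, pred, "yes")
    f1_no = f1_for_class(gold, pred, "no")
    return {
        "accuracy": accuracy,
        "f1_yes": f1_yes,
        "f1_no": f1_no,
        "macro_f1": (f1_yes + f1_no) / 2,
    }


# Hypothetical data: 17 questions, 9 gold "yes", a system that always answers "yes".
gold = ["yes"] * 9 + ["no"] * 8
pred = ["yes"] * 17
print({name: round(value, 4) for name, value in yes_no_scores(gold, pred).items()})
```

Every batch 1 yes/no accuracy is a multiple of 1/17, and with 9 of the 17 hypothetical questions gold "yes", the always-"yes" system scores 0.5294 / 0.6923 / 0 / 0.3462. These figures match the DSGT row of test batch 1, where F1 No is shown as "-" and Macro F1 is exactly half of F1 Yes, which suggests that an undefined F1 No is counted as 0 in the macro average.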
Warning: For ideal answers, good ROUGE results do not always imply good manual scores.
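One reason for this warning is visible in the tables themselves: several runs have a ROUGE-2 recall roughly three times their ROUGE-2 F1 (for example, Fleming-1 in test batch 1: 0.2641 versus 0.0903), which is what happens when answers are much longer than the reference answers. The simplified Python sketch below shows the effect. It assumes lowercase whitespace tokenization with no stemming or stopword handling, whereas the official scores come from a ROUGE implementation with its own preprocessing; the function names and sentences are hypothetical.

```python
from collections import Counter


def bigrams(tokens):
    """Contiguous word pairs, counted with multiplicity."""
    return Counter(zip(tokens, tokens[1:]))


def rouge_2(candidate: str, reference: str) -> dict:
    """Simplified ROUGE-2 recall, precision and F1 on whitespace tokens."""
    cand = bigrams(candidate.lower().split())
    ref = bigrams(reference.lower().split())
    overlap = sum((cand & ref).values())  # matches clipped to the smaller count
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"recall": recall, "precision": precision, "f1": f1}


reference = "the cat sat on the mat"
short = "the cat sat on a mat"
verbose = short + " and then it slept for a very long time near the door"
print(rouge_2(short, reference))    # recall 0.6, F1 0.6
print(rouge_2(verbose, reference))  # recall still 0.6, F1 drops to about 0.27
```

Appending text can never lower recall, because the matched reference bigrams remain matched, but every extra bigram lowers precision and therefore F1. Neither number says anything about readability or repetition, which is what the manual scores capture.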
Test batch 1
Exact Answers
| System | Yes/No: Accuracy | Yes/No: F1 Yes | Yes/No: F1 No | Yes/No: Macro F1 | Factoid: Strict Acc. | Factoid: Lenient Acc. | Factoid: MRR | List: Mean Prec. | List: Recall | List: F-Measure |
|---|---|---|---|---|---|---|---|---|---|---|
| DSGT | 0.5294 | 0.6923 | - | 0.3462 | - | - | - | 0.0048 | 0.0095 | 0.0063 |
| dictycite-max | 0.9412 | 0.9524 | 0.9231 | 0.9377 | 0.4348 | 0.4348 | 0.4348 | 0.2103 | 0.2066 | 0.2018 |
| dictycite-baseline | 0.8824 | 0.9000 | 0.8571 | 0.8786 | 0.3043 | 0.4348 | 0.3696 | 0.2284 | 0.1859 | 0.1900 |
| dictycite-snippet | 0.9412 | 0.9524 | 0.9231 | 0.9377 | 0.3043 | 0.3913 | 0.3478 | 0.2274 | 0.2053 | 0.1923 |
| pancras_naive | 0.8824 | 0.8889 | 0.8750 | 0.8819 | 0.2174 | 0.2609 | 0.2391 | 0.1164 | 0.1452 | 0.1239 |
| dictycite-max-rew-si | 0.9412 | 0.9524 | 0.9231 | 0.9377 | 0.3043 | 0.3043 | 0.3043 | 0.2133 | 0.2066 | 0.2024 |
| dictycite-max-rew-sl | 0.9412 | 0.9524 | 0.9231 | 0.9377 | 0.3043 | 0.3043 | 0.3043 | 0.1813 | 0.1887 | 0.1805 |
| pancras_crag | 0.9412 | 0.9474 | 0.9333 | 0.9404 | 0.2609 | 0.3913 | 0.3188 | 0.1229 | 0.1752 | 0.1375 |
| "RMC_1" | 0.7647 | 0.7778 | 0.7500 | 0.7639 | - | - | - | 0.0714 | 0.0532 | 0.0540 |
| RMC_4 | 0.6471 | 0.6667 | 0.6250 | 0.6458 | 0.0870 | 0.0870 | 0.0870 | 0.1333 | 0.0802 | 0.0972 |
| RMC_5 | 0.7647 | 0.7778 | 0.7500 | 0.7639 | 0.2174 | 0.2174 | 0.2174 | 0.1222 | 0.1009 | 0.1021 |
| RMC_2 | 0.5882 | 0.4615 | 0.6667 | 0.5641 | 0.1739 | 0.1739 | 0.1739 | 0.1365 | 0.0831 | 0.1001 |
| Organization name | 0.7647 | 0.7778 | 0.7500 | 0.7639 | 0.2174 | 0.2174 | 0.2174 | 0.1349 | 0.0739 | 0.0821 |
| DS@GT-BioASQ | 0.6471 | 0.6667 | 0.6250 | 0.6458 | 0.0435 | 0.0435 | 0.0435 | 0.2024 | 0.1148 | 0.1402 |
| Biomedical QA system | 0.8824 | 0.9000 | 0.8571 | 0.8786 | 0.1304 | 0.1739 | 0.1449 | 0.1300 | 0.1648 | 0.1328 |
| Biomedical QA s. v2 | 0.8824 | 0.8889 | 0.8750 | 0.8819 | 0.0870 | 0.1739 | 0.1232 | 0.0927 | 0.0967 | 0.0883 |
| dmiip2024 | 0.8235 | 0.8421 | 0.8000 | 0.8211 | 0.1739 | 0.2174 | 0.1957 | 0.2135 | 0.1391 | 0.1544 |
| dmiip2024_2 | 0.8824 | 0.9000 | 0.8571 | 0.8786 | 0.3043 | 0.4348 | 0.3696 | 0.2762 | 0.2187 | 0.2257 |
| dmiip2024_3 | 0.9412 | 0.9524 | 0.9231 | 0.9377 | 0.2174 | 0.3478 | 0.2826 | 0.2373 | 0.1981 | 0.1971 |
| dmiip2024_4 | 0.9412 | 0.9474 | 0.9333 | 0.9404 | 0.1739 | 0.2174 | 0.1957 | 0.2238 | 0.1295 | 0.1558 |
| dmiip2024_1 | 0.8824 | 0.8889 | 0.8750 | 0.8819 | 0.4348 | 0.4348 | 0.4348 | 0.2627 | 0.1796 | 0.1966 |
| Fleming-1 | 0.8824 | 0.9091 | 0.8333 | 0.8712 | 0.2609 | 0.4348 | 0.3275 | 0.1729 | 0.2512 | 0.1930 |
| ASK-Skill2 | 0.8824 | 0.9000 | 0.8571 | 0.8786 | 0.1739 | 0.1739 | 0.1739 | 0.1145 | 0.2503 | 0.1486 |
| Fleming-2 | 0.8235 | 0.8421 | 0.8000 | 0.8211 | 0.3043 | 0.3478 | 0.3130 | 0.1808 | 0.2666 | 0.1944 |
| ASK-Baseline | 0.8824 | 0.8889 | 0.8750 | 0.8819 | 0.1739 | 0.1739 | 0.1739 | 0.1124 | 0.1514 | 0.1244 |
| ASK-Tool1 | 0.8824 | 0.8889 | 0.8750 | 0.8819 | 0.1739 | 0.1739 | 0.1739 | 0.0699 | 0.1413 | 0.0825 |
| ASK-Tool2 | 0.8824 | 0.9000 | 0.8571 | 0.8786 | 0.1739 | 0.1739 | 0.1739 | 0.1397 | 0.1851 | 0.1423 |
| Biomedical QA s3 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.0870 | 0.1304 | 0.0978 | 0.1512 | 0.1592 | 0.1373 |
| IR_J-1 | 0.7647 | 0.8333 | 0.6000 | 0.7167 | 0.2609 | 0.2609 | 0.2609 | 0.1734 | 0.1344 | 0.1386 |
| IR_J-2 | 0.8824 | 0.9000 | 0.8571 | 0.8786 | 0.2609 | 0.2609 | 0.2609 | 0.1283 | 0.1571 | 0.1203 |
| IR_J-3 | 0.8824 | 0.9000 | 0.8571 | 0.8786 | 0.3478 | 0.3913 | 0.3623 | 0.2275 | 0.2002 | 0.1837 |
| IR_J-4 | 0.8824 | 0.8889 | 0.8750 | 0.8819 | 0.3043 | 0.3913 | 0.3406 | 0.1453 | 0.2011 | 0.1499 |
| IR_J-5 | 0.8824 | 0.9000 | 0.8571 | 0.8786 | 0.3043 | 0.3043 | 0.3043 | 0.1318 | 0.1915 | 0.1355 |
| IR_Y-3 | 0.8235 | 0.8421 | 0.8000 | 0.8211 | 0.0870 | 0.0870 | 0.0870 | 0.0317 | 0.0385 | 0.0329 |
| IR_Y-4 | 0.7059 | 0.7368 | 0.6667 | 0.7018 | 0.0000 | 0.0870 | 0.0435 | 0.0365 | 0.0425 | 0.0372 |
| IR_Y-5 | 0.8824 | 0.9000 | 0.8571 | 0.8786 | 0.0870 | 0.1304 | 0.1087 | 0.0351 | 0.0465 | 0.0388 |
| WHU-ECHO Lab-1 | 0.9412 | 0.9524 | 0.9231 | 0.9377 | 0.2174 | 0.2609 | 0.2319 | 0.1495 | 0.1717 | 0.1550 |
| UR-IW-1 | 0.8824 | 0.9000 | 0.8571 | 0.8786 | 0.2174 | 0.3913 | 0.2935 | 0.1241 | 0.2840 | 0.1628 |
| UR-IW-2 | 0.8235 | 0.8571 | 0.7692 | 0.8132 | 0.2609 | 0.3043 | 0.2826 | 0.1968 | 0.3006 | 0.2226 |
| UR-IW-3 | 0.8824 | 0.9000 | 0.8571 | 0.8786 | 0.2609 | 0.3478 | 0.2971 | 0.1370 | 0.1907 | 0.1500 |
| UR-IW-4 | 0.8235 | 0.8571 | 0.7692 | 0.8132 | 0.1739 | 0.2609 | 0.2101 | 0.2385 | 0.2621 | 0.2345 |
| UR-IW-5 | 0.8235 | 0.8421 | 0.8000 | 0.8211 | 0.1304 | 0.2609 | 0.1812 | 0.1679 | 0.2323 | 0.1802 |
| IR_Y-1 | 0.5882 | 0.7407 | - | 0.3704 | 0.0000 | 0.0435 | 0.0109 | 0.0429 | 0.0576 | 0.0478 |
| IR_Y-2 | 0.5882 | 0.7407 | - | 0.3704 | 0.0000 | 0.0435 | 0.0217 | 0.0571 | 0.0806 | 0.0648 |
| MedQA-1 | 0.8824 | 0.9000 | 0.8571 | 0.8786 | 0.4348 | 0.4348 | 0.4348 | 0.2267 | 0.2001 | 0.2010 |
| MedQA-2 | 0.8824 | 0.9000 | 0.8571 | 0.8786 | 0.4348 | 0.4348 | 0.4348 | 0.2214 | 0.1814 | 0.1941 |
| bioinfo-0 | 0.8235 | 0.8421 | 0.8000 | 0.8211 | 0.2609 | 0.3043 | 0.2696 | 0.1275 | 0.2219 | 0.1509 |
| bioinfo-1 | 0.8824 | 0.9000 | 0.8571 | 0.8786 | 0.3043 | 0.3478 | 0.3261 | 0.1461 | 0.2485 | 0.1735 |
| MedQA-3 | 0.8824 | 0.9000 | 0.8571 | 0.8786 | 0.4348 | 0.4348 | 0.4348 | 0.2214 | 0.1814 | 0.1941 |
| bioinfo-2 | 0.8824 | 0.9000 | 0.8571 | 0.8786 | 0.3043 | 0.3478 | 0.3261 | 0.1534 | 0.2334 | 0.1731 |
| MedQA-4 | 0.8824 | 0.9000 | 0.8571 | 0.8786 | 0.4783 | 0.4783 | 0.4783 | 0.2232 | 0.2209 | 0.2140 |
| bioinfo-3 | 0.8824 | 0.9000 | 0.8571 | 0.8786 | 0.2609 | 0.2609 | 0.2609 | 0.1461 | 0.2485 | 0.1735 |
| bioinfo-4 | 0.8824 | 0.9000 | 0.8571 | 0.8786 | 0.2609 | 0.2609 | 0.2609 | 0.1534 | 0.2334 | 0.1731 |
| MedQA-5 | 0.8824 | 0.9000 | 0.8571 | 0.8786 | 0.4348 | 0.4348 | 0.4348 | 0.2209 | 0.2158 | 0.2127 |
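The factoid columns can be read with a sketch like the following. It assumes the standard BioASQ convention that a system returns a ranked list of up to five candidate answers per factoid question. Strict accuracy counts questions whose first candidate is correct, lenient accuracy counts questions with a correct candidate anywhere in the list, and MRR averages 1/rank of the first correct candidate (0 when there is none). Case-insensitive exact matching against a set of gold synonyms is a simplifying assumption, and the function name and example answers are hypothetical.

```python
from typing import Sequence


def factoid_scores(gold: Sequence[set], ranked: Sequence[Sequence[str]], k: int = 5) -> dict:
    """Strict accuracy, lenient accuracy and MRR over the top-k candidates."""
    if len(gold) != len(ranked) or not gold:
        raise ValueError("gold and ranked must be non-empty and of equal length")
    strict = lenient = rr_sum = 0.0
    for synonyms, candidates in zip(gold, ranked):
        golds = {s.lower() for s in synonyms}
        top = [c.lower() for c in candidates[:k]]
        # 1-based rank of the first correct candidate, or None if none matches.
        rank = next((i + 1 for i, c in enumerate(top) if c in golds), None)
        if rank == 1:
            strict += 1
        if rank is not None:
            lenient += 1
            rr_sum += 1 / rank
    n = len(gold)
    return {"strict": strict / n, "lenient": lenient / n, "mrr": rr_sum / n}


gold = [{"BRCA1"}, {"insulin"}, {"TP53", "p53"}, {"dystrophin"}]
ranked = [
    ["BRCA1", "BRCA2"],                    # correct at rank 1
    ["glucagon", "Insulin"],               # correct at rank 2
    ["MDM2", "ATM", "CHEK2", "p53", "RB1"],  # correct at rank 4
    ["utrophin"],                          # no correct candidate
]
print(factoid_scores(gold, ranked))  # strict 0.25, lenient 0.75, MRR 0.4375
```

A correct first candidate contributes 1 to all three measures, while a later one contributes only 1/rank to MRR, so MRR always lies between strict and lenient accuracy, as in the rows of the table above (for example, dmiip2024_2: 0.3043 ≤ 0.3696 ≤ 0.4348).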
Ideal Answers
| System | ROUGE-2 Recall | ROUGE-2 F1 | ROUGE-SU4 Recall | ROUGE-SU4 F1 | Manual: Readability | Manual: Recall | Manual: Precision | Manual: Repetition |
|---|---|---|---|---|---|---|---|---|
| DSGT | 0.0652 | 0.0435 | 0.1040 | 0.0658 | - | - | - | - |
| dictycite-max | 0.2340 | 0.1322 | 0.2553 | 0.1425 | - | - | - | - |
| dictycite-baseline | 0.2159 | 0.1206 | 0.2263 | 0.1273 | - | - | - | - |
| dictycite-snippet | 0.2162 | 0.1235 | 0.2263 | 0.1306 | - | - | - | - |
| pancras_naive | 0.1243 | 0.1061 | 0.1343 | 0.1081 | - | - | - | - |
| dictycite-max-rew-si | 0.2383 | 0.1351 | 0.2557 | 0.1419 | - | - | - | - |
| dictycite-max-rew-sl | 0.2425 | 0.1354 | 0.2590 | 0.1420 | - | - | - | - |
| pancras_crag | 0.1236 | 0.1025 | 0.1327 | 0.1056 | - | - | - | - |
| "RMC_1" | 0.1327 | 0.1235 | 0.1494 | 0.1331 | - | - | - | - |
| RMC_4 | 0.1538 | 0.1294 | 0.1623 | 0.1331 | - | - | - | - |
| RMC_5 | 0.1396 | 0.1259 | 0.1562 | 0.1365 | - | - | - | - |
| RMC_2 | 0.1384 | 0.1429 | 0.1541 | 0.1497 | - | - | - | - |
| Organization name | 0.1785 | 0.0943 | 0.2015 | 0.1028 | - | - | - | - |
| DS@GT-BioASQ | 0.1718 | 0.1183 | 0.1863 | 0.1284 | - | - | - | - |
| Biomedical QA system | 0.2386 | 0.1005 | 0.2659 | 0.1107 | - | - | - | - |
| Biomedical QA s. v2 | 0.2182 | 0.1097 | 0.2311 | 0.1132 | - | - | - | - |
| dmiip2024 | 0.1969 | 0.1595 | 0.2009 | 0.1576 | - | - | - | - |
| dmiip2024_2 | 0.1937 | 0.1674 | 0.2103 | 0.1746 | - | - | - | - |
| dmiip2024_3 | 0.1816 | 0.1767 | 0.1955 | 0.1821 | - | - | - | - |
| dmiip2024_4 | 0.1731 | 0.1409 | 0.1799 | 0.1414 | - | - | - | - |
| dmiip2024_1 | 0.1505 | 0.1477 | 0.1645 | 0.1551 | - | - | - | - |
| Fleming-1 | 0.2641 | 0.0903 | 0.2783 | 0.0956 | - | - | - | - |
| ASK-Skill2 | 0.2465 | 0.0797 | 0.2780 | 0.0901 | - | - | - | - |
| Fleming-2 | 0.2725 | 0.0909 | 0.2949 | 0.0994 | - | - | - | - |
| ASK-Baseline | 0.0831 | 0.1218 | 0.0664 | 0.1014 | - | - | - | - |
| ASK-Tool1 | 0.2497 | 0.0889 | 0.2764 | 0.0986 | - | - | - | - |
| ASK-Tool2 | 0.2590 | 0.0924 | 0.2863 | 0.1019 | - | - | - | - |
| Biomedical QA s3 | 0.2345 | 0.1001 | 0.2680 | 0.1096 | - | - | - | - |
| IR_J-1 | - | - | - | - | - | - | - | - |
| IR_J-2 | 0.1705 | 0.1041 | 0.1878 | 0.1088 | - | - | - | - |
| IR_J-3 | 0.1625 | 0.0978 | 0.1852 | 0.1017 | - | - | - | - |
| IR_J-4 | 0.1506 | 0.0965 | 0.1687 | 0.1004 | - | - | - | - |
| IR_J-5 | 0.1758 | 0.1261 | 0.1919 | 0.1302 | - | - | - | - |
| IR_Y-3 | 0.0040 | 0.0047 | 0.0038 | 0.0050 | - | - | - | - |
| IR_Y-4 | 0.0103 | 0.0137 | 0.0097 | 0.0132 | - | - | - | - |
| IR_Y-5 | 0.1045 | 0.1337 | 0.0989 | 0.1294 | - | - | - | - |
| WHU-ECHO Lab-1 | 0.1811 | 0.0734 | 0.1981 | 0.0776 | - | - | - | - |
| UR-IW-1 | 0.2344 | 0.1060 | 0.2529 | 0.1110 | - | - | - | - |
| UR-IW-2 | 0.2107 | 0.1347 | 0.2333 | 0.1422 | - | - | - | - |
| UR-IW-3 | 0.1692 | 0.1057 | 0.1884 | 0.1140 | - | - | - | - |
| UR-IW-4 | 0.1607 | 0.1416 | 0.1741 | 0.1494 | - | - | - | - |
| UR-IW-5 | 0.1791 | 0.1416 | 0.2023 | 0.1512 | - | - | - | - |
| IR_Y-1 | 0.1543 | 0.1237 | 0.1716 | 0.1340 | - | - | - | - |
| IR_Y-2 | 0.1692 | 0.1592 | 0.1833 | 0.1671 | - | - | - | - |
| MedQA-1 | 0.2070 | 0.1519 | 0.2292 | 0.1590 | - | - | - | - |
| MedQA-2 | 0.2089 | 0.1516 | 0.2299 | 0.1572 | - | - | - | - |
| bioinfo-0 | 0.2150 | 0.1021 | 0.2340 | 0.1094 | - | - | - | - |
| bioinfo-1 | 0.2200 | 0.1009 | 0.2393 | 0.1074 | - | - | - | - |
| MedQA-3 | 0.2070 | 0.1488 | 0.2295 | 0.1552 | - | - | - | - |
| bioinfo-2 | 0.2408 | 0.0825 | 0.2706 | 0.0932 | - | - | - | - |
| MedQA-4 | 0.2370 | 0.1513 | 0.2529 | 0.1532 | - | - | - | - |
| bioinfo-3 | 0.2245 | 0.0955 | 0.2489 | 0.1021 | - | - | - | - |
| bioinfo-4 | 0.2348 | 0.0916 | 0.2664 | 0.1023 | - | - | - | - |
| MedQA-5 | 0.2279 | 0.1525 | 0.2435 | 0.1533 | - | - | - | - |
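The ROUGE-SU4 columns count skip-bigrams (ordered word pairs that need not be adjacent) together with unigrams. The Python sketch below assumes that "4" means at most four intervening words between the two words of a pair, and uses lowercase whitespace tokens with no stemming; both are assumptions, and the official ROUGE implementation may differ in preprocessing details. Function names and sentences are hypothetical.

```python
from collections import Counter


def skip_bigrams(tokens, max_gap=4):
    """Ordered word pairs with at most `max_gap` words between them (assumed SU4 convention)."""
    pairs = Counter()
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + max_gap + 2, len(tokens))):
            pairs[(tokens[i], tokens[j])] += 1
    return pairs


def rouge_su4(candidate: str, reference: str) -> dict:
    """Simplified ROUGE-SU4: skip-bigrams plus unigrams, compared as multisets."""
    def units(text):
        toks = text.lower().split()
        # Unigrams are stored as 1-tuples so they never collide with word pairs.
        return skip_bigrams(toks) + Counter((t,) for t in toks)

    cand, ref = units(candidate), units(reference)
    overlap = sum((cand & ref).values())
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"recall": recall, "precision": precision, "f1": f1}


print(rouge_su4("aspirin strongly inhibits cox enzymes", "aspirin inhibits cox enzymes"))
```

Here every reference unit is matched (recall 1.0, precision 10/15). Plain contiguous bigram matching on the same pair gives a recall of only 2/3, because the inserted word breaks the "aspirin inhibits" bigram; skip-bigrams make SU4 more tolerant of such insertions.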
Test batch 2
Exact Answers
| System | Yes/No: Accuracy | Yes/No: F1 Yes | Yes/No: F1 No | Yes/No: Macro F1 | Factoid: Strict Acc. | Factoid: Lenient Acc. | Factoid: MRR | List: Mean Prec. | List: Recall | List: F-Measure |
|---|---|---|---|---|---|---|---|---|---|---|
| ASK-Baseline | 0.8095 | 0.8571 | 0.7143 | 0.7857 | 0.3000 | 0.3500 | 0.3250 | 0.1654 | 0.4030 | 0.2026 |
| ASK-Tool1 | 0.8095 | 0.8462 | 0.7500 | 0.7981 | 0.3000 | 0.3000 | 0.3000 | 0.1577 | 0.3955 | 0.2061 |
| ASK-Tool2 | 0.9048 | 0.9333 | 0.8333 | 0.8833 | 0.3000 | 0.4000 | 0.3417 | 0.1119 | 0.2890 | 0.1461 |
| IR1 | 0.8095 | 0.8462 | 0.7500 | 0.7981 | 0.3000 | 0.3000 | 0.3000 | 0.4177 | 0.3120 | 0.3280 |
| Biomedical QA system | 0.9048 | 0.9286 | 0.8571 | 0.8929 | 0.1000 | 0.2500 | 0.1625 | 0.2531 | 0.2725 | 0.2596 |
| dictycite-baseline | 0.9048 | 0.9286 | 0.8571 | 0.8929 | 0.3000 | 0.3000 | 0.3000 | 0.3821 | 0.3905 | 0.3762 |
| dictycite-snippet | 0.8571 | 0.8966 | 0.7692 | 0.8329 | 0.2500 | 0.3000 | 0.2750 | 0.3833 | 0.3799 | 0.3584 |
| dictycite-max | 0.9524 | 0.9655 | 0.9231 | 0.9443 | 0.2500 | 0.2500 | 0.2500 | 0.3324 | 0.3905 | 0.3504 |
| dictycite-max-rew-si | 0.9524 | 0.9655 | 0.9231 | 0.9443 | 0.2500 | 0.3000 | 0.2750 | 0.3111 | 0.3777 | 0.3251 |
| dictycite-max-rew-sl | 0.9524 | 0.9655 | 0.9231 | 0.9443 | 0.2500 | 0.2500 | 0.2500 | 0.3671 | 0.3777 | 0.3629 |
| lean_rag | 0.8571 | 0.8966 | 0.7692 | 0.8329 | 0.2500 | 0.3000 | 0.2667 | 0.3269 | 0.2581 | 0.2721 |
| ASK-Skill2 | 0.9048 | 0.9333 | 0.8333 | 0.8833 | 0.3000 | 0.3000 | 0.3000 | 0.1743 | 0.2987 | 0.1976 |
| Biomedical QA s. v2 | 0.9048 | 0.9286 | 0.8571 | 0.8929 | 0.0500 | 0.2500 | 0.1350 | 0.2568 | 0.2918 | 0.2693 |
| asmalltrialsystem | 0.7619 | 0.7826 | 0.7368 | 0.7597 | 0.1500 | 0.1500 | 0.1500 | 0.1077 | 0.0897 | 0.0950 |
| lean_rag_ft | 0.8571 | 0.8966 | 0.7692 | 0.8329 | 0.3000 | 0.3000 | 0.3000 | 0.3661 | 0.2639 | 0.2772 |
| lean_rag_ft_sparse | 0.8571 | 0.8966 | 0.7692 | 0.8329 | 0.2500 | 0.2500 | 0.2500 | 0.3635 | 0.2639 | 0.2766 |
| lean_rag_dprf | 0.8095 | 0.8667 | 0.6667 | 0.7667 | 0.3000 | 0.3000 | 0.3000 | 0.2755 | 0.2511 | 0.2544 |
| ASK-Skill1 | 0.9524 | 0.9655 | 0.9231 | 0.9443 | 0.3000 | 0.3000 | 0.3000 | 0.1692 | 0.2527 | 0.1875 |
| "RMC_1" | 0.7619 | 0.8148 | 0.6667 | 0.7407 | 0.1000 | 0.1000 | 0.1000 | 0.1385 | 0.1454 | 0.1305 |
| RMC_2 | 0.8095 | 0.8462 | 0.7500 | 0.7981 | 0.1000 | 0.1000 | 0.1000 | 0.1436 | 0.1064 | 0.1170 |
| RMC_3 | 0.7619 | 0.8148 | 0.6667 | 0.7407 | 0.1000 | 0.1000 | 0.1000 | 0.1385 | 0.1454 | 0.1305 |
| RMC_4 | 0.7619 | 0.8000 | 0.7059 | 0.7529 | - | - | - | 0.1538 | 0.1699 | 0.1486 |
| RMC_5 | 0.8571 | 0.8889 | 0.8000 | 0.8444 | 0.1000 | 0.1000 | 0.1000 | 0.3051 | 0.2490 | 0.2610 |
| Organization name | 0.7619 | 0.7826 | 0.7368 | 0.7597 | 0.2000 | 0.2000 | 0.2000 | 0.1649 | 0.1234 | 0.1275 |
| DS@GT-BioASQ | 0.6667 | 0.8000 | - | 0.4000 | 0.1500 | 0.1500 | 0.1500 | 0.2674 | 0.1442 | 0.1707 |
| mckpt2 | 0.9524 | 0.9630 | 0.9333 | 0.9481 | 0.2000 | 0.2000 | 0.2000 | 0.1605 | 0.1496 | 0.1537 |
| UR-IW-1 | 0.9048 | 0.9333 | 0.8333 | 0.8833 | 0.3500 | 0.3500 | 0.3500 | 0.2322 | 0.4558 | 0.2883 |
| UR-IW-2 | 0.9048 | 0.9333 | 0.8333 | 0.8833 | 0.3500 | 0.3500 | 0.3500 | 0.2608 | 0.3425 | 0.2722 |
| UR-IW-3 | 0.9524 | 0.9655 | 0.9231 | 0.9443 | 0.3500 | 0.3500 | 0.3500 | 0.2862 | 0.3361 | 0.2939 |
| UR-IW-4 | 0.9048 | 0.9286 | 0.8571 | 0.8929 | 0.3000 | 0.3000 | 0.3000 | 0.3406 | 0.2912 | 0.3017 |
| UR-IW-5 | 0.9524 | 0.9630 | 0.9333 | 0.9481 | 0.1500 | 0.1500 | 0.1500 | 0.1833 | 0.2687 | 0.2028 |
| IR_Y-1 | 0.8095 | 0.8667 | 0.6667 | 0.7667 | 0.0500 | 0.2000 | 0.0975 | 0.0462 | 0.0620 | 0.0507 |
| IR_Y-3 | 0.8571 | 0.8966 | 0.7692 | 0.8329 | 0.2500 | 0.3500 | 0.3000 | 0.0635 | 0.0662 | 0.0543 |
| IR_Y-4 | 0.8571 | 0.8966 | 0.7692 | 0.8329 | 0.3000 | 0.3500 | 0.3250 | 0.0641 | 0.0662 | 0.0543 |
| IR_Y-5 | 0.8095 | 0.8571 | 0.7143 | 0.7857 | 0.1000 | 0.2500 | 0.1750 | 0.0605 | 0.0748 | 0.0627 |
| IR_Y-2 | 0.8571 | 0.8966 | 0.7692 | 0.8329 | 0.0000 | 0.3000 | 0.0775 | 0.0205 | 0.0449 | 0.0272 |
| MedQA-1 | 0.9524 | 0.9655 | 0.9231 | 0.9443 | 0.3500 | 0.4500 | 0.4000 | 0.3533 | 0.3777 | 0.3551 |
| MedQA-2 | 0.9524 | 0.9655 | 0.9231 | 0.9443 | 0.3000 | 0.5000 | 0.3850 | 0.3629 | 0.3777 | 0.3617 |
| mckpt1 | 0.8571 | 0.9032 | 0.7273 | 0.8152 | 0.2500 | 0.3000 | 0.2667 | 0.3049 | 0.2993 | 0.2939 |
| Biomedical QA s3 | 0.9048 | 0.9286 | 0.8571 | 0.8929 | 0.1500 | 0.1500 | 0.1500 | 0.2299 | 0.2490 | 0.2366 |
| MedQA-3 | 0.8095 | 0.8571 | 0.7143 | 0.7857 | 0.2000 | 0.3000 | 0.2375 | 0.3002 | 0.3425 | 0.3082 |
| ubuntu | 0.9048 | 0.9333 | 0.8333 | 0.8833 | 0.1000 | 0.1500 | 0.1250 | 0.2429 | 0.2132 | 0.2168 |
| dmiip2024 | 0.9048 | 0.9286 | 0.8571 | 0.8929 | 0.3500 | 0.4000 | 0.3750 | 0.3231 | 0.2207 | 0.2371 |
| dmiip2024_1 | 0.8095 | 0.8571 | 0.7143 | 0.7857 | 0.3000 | 0.3000 | 0.3000 | 0.4141 | 0.3072 | 0.3408 |
| dmiip2024_2 | 0.8571 | 0.8889 | 0.8000 | 0.8444 | 0.3000 | 0.3500 | 0.3167 | 0.3538 | 0.3521 | 0.3276 |
| dmiip2024_3 | 0.8571 | 0.8889 | 0.8000 | 0.8444 | 0.3000 | 0.3000 | 0.3000 | 0.2974 | 0.1994 | 0.2294 |
| dmiip2024_4 | 0.9048 | 0.9333 | 0.8333 | 0.8833 | 0.3500 | 0.4000 | 0.3750 | 0.4359 | 0.3686 | 0.3909 |
| MedQA-4 | 0.8571 | 0.8966 | 0.7692 | 0.8329 | 0.3500 | 0.4500 | 0.4000 | 0.3160 | 0.3521 | 0.3219 |
| bioinfo-0 | 0.8571 | 0.8889 | 0.8000 | 0.8444 | 0.2000 | 0.2000 | 0.2000 | 0.2401 | 0.1608 | 0.1778 |
| bioinfo-3 | 0.9048 | 0.9286 | 0.8571 | 0.8929 | 0.2000 | 0.3500 | 0.2450 | 0.2926 | 0.3841 | 0.3145 |
| bioinfo-4 | 0.8571 | 0.8889 | 0.8000 | 0.8444 | 0.2500 | 0.3000 | 0.2750 | 0.2948 | 0.4034 | 0.3227 |
| bioinfo-1 | 0.9048 | 0.9286 | 0.8571 | 0.8929 | 0.2000 | 0.2500 | 0.2250 | 0.2391 | 0.2784 | 0.2479 |
| Fleming-1 | 0.9048 | 0.9333 | 0.8333 | 0.8833 | 0.1500 | 0.3000 | 0.1933 | 0.3367 | 0.4141 | 0.3544 |
| IR_J-1 | 0.9048 | 0.9333 | 0.8333 | 0.8833 | 0.2500 | 0.3000 | 0.2750 | 0.2314 | 0.1839 | 0.1822 |
| IR_J-2 | 0.8095 | 0.8462 | 0.7500 | 0.7981 | 0.2500 | 0.2500 | 0.2500 | 0.2277 | 0.1709 | 0.1776 |
| IR_J-3 | 0.8571 | 0.8800 | 0.8235 | 0.8518 | 0.1500 | 0.1500 | 0.1500 | 0.1722 | 0.1346 | 0.1233 |
| IR_J-4 | 0.8571 | 0.8800 | 0.8235 | 0.8518 | 0.1500 | 0.2500 | 0.1917 | 0.2379 | 0.2447 | 0.2294 |
| IR_J-5 | 0.7619 | 0.8148 | 0.6667 | 0.7407 | 0.2500 | 0.2500 | 0.2500 | 0.2224 | 0.1982 | 0.1782 |
| bioinfo-2 | 0.8571 | 0.8889 | 0.8000 | 0.8444 | 0.2000 | 0.2000 | 0.2000 | 0.2406 | 0.2249 | 0.2306 |
| qwen | 0.7143 | 0.8000 | 0.5000 | 0.6500 | 0.1500 | 0.2000 | 0.1625 | 0.1875 | 0.0754 | 0.1048 |
| config-2 | 0.8571 | 0.8889 | 0.8000 | 0.8444 | 0.2500 | 0.3000 | 0.2667 | 0.2350 | 0.3885 | 0.2785 |
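For list questions, a plausible reading of the three columns is that precision, recall and F1 are computed per question and each is then averaged over the questions. The Python sketch below follows that assumption, with case-insensitive exact matching of answer strings; the function name and gene lists are hypothetical, and the official implementation may differ in details such as synonym handling.

```python
from typing import Sequence


def list_scores(gold: Sequence[set], pred: Sequence[set]) -> dict:
    """Per-question precision, recall and F1, each averaged over questions."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    precisions, recalls, f1s = [], [], []
    for gold_items, pred_items in zip(gold, pred):
        gold_set = {x.lower() for x in gold_items}
        pred_set = {x.lower() for x in pred_items}
        hits = len(gold_set & pred_set)
        prec = hits / len(pred_set) if pred_set else 0.0
        rec = hits / len(gold_set) if gold_set else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(2 * prec * rec / (prec + rec) if hits else 0.0)
    n = len(gold)
    return {
        "mean_precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f_measure": sum(f1s) / n,
    }


gold = [{"IL6", "TNF", "IL1B", "CXCL8"}, {"BRAF"}]
pred = [{"IL6"}, {"BRAF", "KRAS", "NRAS", "EGFR"}]
print(list_scores(gold, pred))  # precision 0.625, recall 0.625, F-measure 0.4
```

Averaging per-question F1, rather than taking the harmonic mean of the averaged precision and recall, is what allows an F-Measure below both other list columns (0.4 versus 0.625 and 0.625 here). The table above shows the same pattern, for example dictycite-snippet: 0.3833, 0.3799, 0.3584, which is consistent with this kind of per-question averaging.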
Ideal Answers
| System | ROUGE-2 Recall | ROUGE-2 F1 | ROUGE-SU4 Recall | ROUGE-SU4 F1 | Manual: Readability | Manual: Recall | Manual: Precision | Manual: Repetition |
|---|---|---|---|---|---|---|---|---|
| ASK-Baseline | 0.2874 | 0.1026 | 0.3030 | 0.1129 | - | - | - | - |
| ASK-Tool1 | 0.2485 | 0.0972 | 0.2824 | 0.1095 | - | - | - | - |
| ASK-Tool2 | 0.2650 | 0.0888 | 0.2897 | 0.1030 | - | - | - | - |
| IR1 | 0.1852 | 0.1096 | 0.2011 | 0.1205 | - | - | - | - |
| Biomedical QA system | 0.2399 | 0.1059 | 0.2624 | 0.1161 | - | - | - | - |
| dictycite-baseline | 0.2106 | 0.1287 | 0.2278 | 0.1367 | - | - | - | - |
| dictycite-snippet | 0.2280 | 0.1332 | 0.2447 | 0.1415 | - | - | - | - |
| dictycite-max | 0.2249 | 0.1326 | 0.2375 | 0.1395 | - | - | - | - |
| dictycite-max-rew-si | 0.2437 | 0.1351 | 0.2539 | 0.1398 | - | - | - | - |
| dictycite-max-rew-sl | 0.2416 | 0.1337 | 0.2503 | 0.1401 | - | - | - | - |
| lean_rag | 0.2057 | 0.1537 | 0.2187 | 0.1577 | - | - | - | - |
| ASK-Skill2 | 0.2644 | 0.0968 | 0.2848 | 0.1084 | - | - | - | - |
| Biomedical QA s. v2 | 0.2351 | 0.1051 | 0.2591 | 0.1152 | - | - | - | - |
| asmalltrialsystem | 0.1617 | 0.0713 | 0.1752 | 0.0832 | - | - | - | - |
| lean_rag_ft | 0.2122 | 0.1564 | 0.2210 | 0.1565 | - | - | - | - |
| lean_rag_ft_sparse | 0.2147 | 0.1560 | 0.2253 | 0.1582 | - | - | - | - |
| lean_rag_dprf | 0.2136 | 0.1534 | 0.2235 | 0.1565 | - | - | - | - |
| ASK-Skill1 | 0.2617 | 0.0991 | 0.2856 | 0.1096 | - | - | - | - |
| "RMC_1" | 0.1807 | 0.1704 | 0.1852 | 0.1735 | - | - | - | - |
| RMC_2 | 0.1730 | 0.1822 | 0.1768 | 0.1851 | - | - | - | - |
| RMC_3 | 0.1814 | 0.1708 | 0.1860 | 0.1741 | - | - | - | - |
| RMC_4 | 0.2020 | 0.1681 | 0.2017 | 0.1658 | - | - | - | - |
| RMC_5 | 0.2001 | 0.1619 | 0.2112 | 0.1628 | - | - | - | - |
| Organization name | 0.1878 | 0.1091 | 0.1991 | 0.1160 | - | - | - | - |
| DS@GT-BioASQ | 0.1511 | 0.1583 | 0.1473 | 0.1540 | - | - | - | - |
| mckpt2 | 0.2101 | 0.1442 | 0.2220 | 0.1495 | - | - | - | - |
| UR-IW-1 | 0.2716 | 0.1311 | 0.2813 | 0.1370 | - | - | - | - |
| UR-IW-2 | 0.2292 | 0.1411 | 0.2433 | 0.1480 | - | - | - | - |
| UR-IW-3 | 0.1937 | 0.1265 | 0.2129 | 0.1319 | - | - | - | - |
| UR-IW-4 | 0.1947 | 0.1732 | 0.1985 | 0.1731 | - | - | - | - |
| UR-IW-5 | 0.1977 | 0.1534 | 0.2066 | 0.1583 | - | - | - | - |
| IR_Y-1 | 0.0895 | 0.0780 | 0.0904 | 0.0739 | - | - | - | - |
| IR_Y-3 | 0.1951 | 0.1216 | 0.2043 | 0.1272 | - | - | - | - |
| IR_Y-4 | 0.1944 | 0.1205 | 0.1993 | 0.1249 | - | - | - | - |
| IR_Y-5 | 0.1092 | 0.1352 | 0.1100 | 0.1349 | - | - | - | - |
| IR_Y-2 | 0.1117 | 0.0787 | 0.1158 | 0.0780 | - | - | - | - |
| MedQA-1 | 0.2275 | 0.1464 | 0.2408 | 0.1528 | - | - | - | - |
| MedQA-2 | 0.2387 | 0.1522 | 0.2422 | 0.1522 | - | - | - | - |
| mckpt1 | 0.1683 | 0.1372 | 0.1678 | 0.1328 | - | - | - | - |
| Biomedical QA s3 | 0.2407 | 0.1089 | 0.2523 | 0.1147 | - | - | - | - |
| MedQA-3 | 0.2115 | 0.1528 | 0.2112 | 0.1509 | - | - | - | - |
| ubuntu | 0.1208 | 0.0847 | 0.1212 | 0.0827 | - | - | - | - |
| dmiip2024 | 0.2114 | 0.1748 | 0.2118 | 0.1742 | - | - | - | - |
| dmiip2024_1 | 0.1973 | 0.1839 | 0.1978 | 0.1833 | - | - | - | - |
| dmiip2024_2 | 0.2176 | 0.1766 | 0.2191 | 0.1745 | - | - | - | - |
| dmiip2024_3 | 0.1814 | 0.1813 | 0.1790 | 0.1761 | - | - | - | - |
| dmiip2024_4 | 0.1746 | 0.1460 | 0.1757 | 0.1462 | - | - | - | - |
| MedQA-4 | 0.1887 | 0.1329 | 0.1929 | 0.1326 | - | - | - | - |
| bioinfo-0 | 0.1892 | 0.1322 | 0.1966 | 0.1334 | - | - | - | - |
| bioinfo-3 | 0.2843 | 0.0983 | 0.2984 | 0.1082 | - | - | - | - |
| bioinfo-4 | 0.2833 | 0.0963 | 0.2983 | 0.1061 | - | - | - | - |
| bioinfo-1 | 0.2280 | 0.1236 | 0.2406 | 0.1291 | - | - | - | - |
| Fleming-1 | 0.2526 | 0.0979 | 0.2796 | 0.1091 | - | - | - | - |
| IR_J-1 | - | - | - | - | - | - | - | - |
| IR_J-2 | 0.2095 | 0.1099 | 0.2152 | 0.1107 | - | - | - | - |
| IR_J-3 | 0.2072 | 0.1203 | 0.2120 | 0.1184 | - | - | - | - |
| IR_J-4 | - | - | - | - | - | - | - | - |
| IR_J-5 | 0.1997 | 0.1521 | 0.1985 | 0.1491 | - | - | - | - |
| bioinfo-2 | 0.1755 | 0.1286 | 0.1918 | 0.1353 | - | - | - | - |
| qwen | 0.1646 | 0.1178 | 0.1702 | 0.1170 | - | - | - | - |
| config-2 | 0.2989 | 0.1100 | 0.3119 | 0.1177 | - | - | - | - |
Test batch 3
Exact Answers
| System | Yes/No: Accuracy | Yes/No: F1 Yes | Yes/No: F1 No | Yes/No: Macro F1 | Factoid: Strict Acc. | Factoid: Lenient Acc. | Factoid: MRR | List: Mean Prec. | List: Recall | List: F-Measure |
|---|---|---|---|---|---|---|---|---|---|---|
| IR1 | 0.7273 | 0.7692 | 0.6667 | 0.7179 | 0.2353 | 0.2353 | 0.2353 | 0.3529 | 0.1963 | 0.2290 |
| IR5 | 0.9091 | 0.9333 | 0.8571 | 0.8952 | 0.3529 | 0.3529 | 0.3529 | 0.3922 | 0.2094 | 0.2486 |
| asmalltrialsystem | 0.3636 | 0.2222 | 0.4615 | 0.3419 | - | - | - | 0.1166 | 0.3850 | 0.1659 |
| "RMC_1" | 0.8182 | 0.8571 | 0.7500 | 0.8036 | 0.0588 | 0.0588 | 0.0588 | 0.2353 | 0.2239 | 0.2179 |
| RMC_2 | 0.6364 | 0.6667 | 0.6000 | 0.6333 | 0.1765 | 0.1765 | 0.1765 | 0.1471 | 0.1176 | 0.1210 |
| RMC_3 | 0.8182 | 0.8571 | 0.7500 | 0.8036 | 0.0588 | 0.0588 | 0.0588 | 0.2353 | 0.2239 | 0.2179 |
| RMC_4 | 0.9091 | 0.9333 | 0.8571 | 0.8952 | 0.2353 | 0.2353 | 0.2353 | 0.1690 | 0.1701 | 0.1630 |
| RMC_5 | 0.9091 | 0.9333 | 0.8571 | 0.8952 | 0.3529 | 0.3529 | 0.3529 | 0.1500 | 0.1520 | 0.1422 |
| pancras_crag | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.3529 | 0.3529 | 0.3529 | 0.1495 | 0.2382 | 0.1716 |
| pancras_naive | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.2353 | 0.2941 | 0.2647 | 0.2787 | 0.3609 | 0.2981 |
| lean_rag | 0.6364 | 0.7500 | 0.3333 | 0.5417 | 0.2941 | 0.2941 | 0.2941 | 0.2725 | 0.2407 | 0.2358 |
| lean_rag_ft | 0.7273 | 0.8000 | 0.5714 | 0.6857 | 0.2353 | 0.2353 | 0.2353 | 0.2725 | 0.2407 | 0.2358 |
| lean_rag_dprf | 0.8182 | 0.8750 | 0.6667 | 0.7708 | 0.1765 | 0.1765 | 0.1765 | 0.2304 | 0.1819 | 0.1822 |
| lean_rag_ft_sparse | 0.7273 | 0.8000 | 0.5714 | 0.6857 | 0.2353 | 0.2353 | 0.2353 | 0.2725 | 0.2407 | 0.2358 |
| lean_rag_ensemble | 0.8182 | 0.8750 | 0.6667 | 0.7708 | 0.2353 | 0.2353 | 0.2353 | 0.2660 | 0.2480 | 0.2338 |
| Organization name | 0.1818 | 0.3077 | - | 0.1538 | 0.4118 | 0.4118 | 0.4118 | 0.0784 | 0.0490 | 0.0588 |
| multi-stage rank&llm | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.3529 | 0.4706 | 0.4118 | 0.2914 | 0.3504 | 0.3000 |
| DS@GT-BioASQ | 0.7273 | 0.8421 | - | 0.4211 | 0.3529 | 0.3529 | 0.3529 | 0.2923 | 0.1787 | 0.2021 |
| dmiip2024 | 0.9091 | 0.9412 | 0.8000 | 0.8706 | 0.2941 | 0.4118 | 0.3431 | 0.2980 | 0.2255 | 0.2457 |
| dmiip2024_1 | 0.9091 | 0.9333 | 0.8571 | 0.8952 | 0.4706 | 0.5294 | 0.5000 | 0.3191 | 0.2958 | 0.2915 |
| dmiip2024_2 | 0.9091 | 0.9333 | 0.8571 | 0.8952 | 0.4706 | 0.5882 | 0.5294 | 0.2824 | 0.2482 | 0.2449 |
| dmiip2024_3 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.4118 | 0.4706 | 0.4412 | 0.3255 | 0.2474 | 0.2684 |
| dmiip2024_4 | 0.9091 | 0.9412 | 0.8000 | 0.8706 | 0.4118 | 0.4706 | 0.4412 | 0.2784 | 0.2078 | 0.2268 |
| dictycite-baseline | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.2353 | 0.3529 | 0.2941 | 0.3039 | 0.3225 | 0.2930 |
| dictycite-snippet | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.2353 | 0.2941 | 0.2647 | 0.2829 | 0.2461 | 0.2570 |
| dictycite-max | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.3529 | 0.4118 | 0.3824 | 0.2373 | 0.2011 | 0.2033 |
| dictycite-max-rew-si | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.2353 | 0.3529 | 0.2941 | 0.2412 | 0.2188 | 0.2088 |
| dictycite-max-rew-sl | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.1765 | 0.2353 | 0.2059 | 0.3051 | 0.3120 | 0.3022 |
| IR_Y-1 | 0.6364 | 0.7143 | 0.5000 | 0.6071 | 0.0000 | 0.2941 | 0.1294 | 0.0987 | 0.1463 | 0.1037 |
| UR-IW-4 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.2353 | 0.2941 | 0.2647 | 0.3103 | 0.3833 | 0.3258 |
| UR-IW-5 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.2941 | 0.4118 | 0.3431 | 0.2177 | 0.4233 | 0.2717 |
| UR-IW-2 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.3529 | 0.5294 | 0.4216 | 0.1801 | 0.3238 | 0.2128 |
| UR-IW-1 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.3529 | 0.4118 | 0.3824 | 0.2980 | 0.3679 | 0.3079 |
| IR_Y-2 | 0.6364 | 0.7143 | 0.5000 | 0.6071 | 0.2353 | 0.3529 | 0.2941 | 0.1131 | 0.1396 | 0.1128 |
| UR-IW-3 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.2353 | 0.2941 | 0.2647 | 0.3050 | 0.3833 | 0.3217 |
| IR_Y-3 | 0.6364 | 0.7143 | 0.5000 | 0.6071 | 0.2941 | 0.3529 | 0.3235 | 0.1041 | 0.1412 | 0.1093 |
| IR_Y-4 | 0.6364 | 0.7143 | 0.5000 | 0.6071 | 0.2941 | 0.3529 | 0.3235 | 0.1098 | 0.1375 | 0.1122 |
| IR_Y-5 | 0.6364 | 0.7143 | 0.5000 | 0.6071 | 0.2941 | 0.3529 | 0.3235 | 0.1039 | 0.1412 | 0.1102 |
| Fleming-1 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.2941 | 0.3529 | 0.3137 | 0.3393 | 0.3256 | 0.3173 |
| ASK-Skill1 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.2941 | 0.4118 | 0.3529 | 0.1809 | 0.3757 | 0.2254 |
| ASK-Skill2 | 0.8182 | 0.8571 | 0.7500 | 0.8036 | 0.3529 | 0.4118 | 0.3676 | 0.1721 | 0.3603 | 0.2110 |
| ASK-Baseline | 0.9091 | 0.9333 | 0.8571 | 0.8952 | 0.2941 | 0.4706 | 0.3725 | 0.1173 | 0.3695 | 0.1544 |
| ASK-Tool1 | 0.9091 | 0.9333 | 0.8571 | 0.8952 | 0.2941 | 0.3529 | 0.3235 | 0.1786 | 0.4056 | 0.2303 |
| IR_J-1 | 0.8182 | 0.8571 | 0.7500 | 0.8036 | 0.2353 | 0.3529 | 0.2941 | 0.2181 | 0.2544 | 0.2184 |
| IR_J-2 | 0.9091 | 0.9333 | 0.8571 | 0.8952 | 0.2941 | 0.3529 | 0.3235 | 0.2129 | 0.2630 | 0.2146 |
| IR_J-3 | 0.8182 | 0.8571 | 0.7500 | 0.8036 | 0.2353 | 0.2941 | 0.2647 | 0.2383 | 0.1975 | 0.1951 |
| IR_J-4 | 0.8182 | 0.8571 | 0.7500 | 0.8036 | 0.1765 | 0.2941 | 0.2353 | 0.2303 | 0.2368 | 0.2030 |
| IR_J-5 | 0.8182 | 0.8750 | 0.6667 | 0.7708 | 0.2941 | 0.3529 | 0.3235 | 0.2624 | 0.2299 | 0.2307 |
| Gen-Doc | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.3529 | 0.4118 | 0.3824 | 0.2838 | 0.2813 | 0.2785 |
| bioinfo-0 | 0.9091 | 0.9333 | 0.8571 | 0.8952 | 0.4118 | 0.4118 | 0.4118 | 0.2835 | 0.2659 | 0.2671 |
| bioinfo-1 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.3529 | 0.4706 | 0.4118 | 0.2787 | 0.3355 | 0.2891 |
| bioinfo-2 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.2941 | 0.4706 | 0.3824 | 0.2459 | 0.3221 | 0.2649 |
| bioinfo-3 | 0.9091 | 0.9333 | 0.8571 | 0.8952 | 0.4118 | 0.5294 | 0.4706 | 0.2863 | 0.3779 | 0.3138 |
| bioinfo-4 | 0.9091 | 0.9333 | 0.8571 | 0.8952 | 0.3529 | 0.5294 | 0.4412 | 0.2602 | 0.3140 | 0.2790 |
| MedQA-1 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.4118 | 0.5294 | 0.4608 | 0.2198 | 0.2794 | 0.2399 |
| IR2 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.1765 | 0.3529 | 0.2647 | 0.2730 | 0.2935 | 0.2604 |
| IR3 | 0.9091 | 0.9333 | 0.8571 | 0.8952 | 0.1765 | 0.3529 | 0.2647 | 0.2743 | 0.3111 | 0.2724 |
| MedQA-3 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.4118 | 0.5294 | 0.4706 | 0.2479 | 0.2932 | 0.2564 |
| MedQA-4 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.4118 | 0.4706 | 0.4412 | 0.2256 | 0.3007 | 0.2476 |
| MedQA-2 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.4118 | 0.5294 | 0.4706 | 0.2275 | 0.2794 | 0.2390 |
| MedQA-5 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.4118 | 0.5294 | 0.4608 | 0.2416 | 0.3069 | 0.2628 |
| BioXR-GTE | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.2941 | 0.3529 | 0.3235 | 0.2630 | 0.3267 | 0.2773 |
| BioXR-BGE | 0.9091 | 0.9333 | 0.8571 | 0.8952 | 0.2941 | 0.4118 | 0.3529 | 0.2647 | 0.3074 | 0.2722 |
Ideal Answers
| System | ROUGE-2 Recall | ROUGE-2 F1 | ROUGE-SU4 Recall | ROUGE-SU4 F1 | Manual: Readability | Manual: Recall | Manual: Precision | Manual: Repetition |
|---|---|---|---|---|---|---|---|---|
| IR1 | 0.1275 | 0.0762 | 0.1541 | 0.0910 | - | - | - | - |
| IR5 | 0.1711 | 0.1237 | 0.1745 | 0.1277 | - | - | - | - |
| asmalltrialsystem | 0.1074 | 0.0643 | 0.1259 | 0.0771 | - | - | - | - |
| "RMC_1" | 0.1307 | 0.1317 | 0.1270 | 0.1288 | - | - | - | - |
| RMC_2 | 0.1293 | 0.1434 | 0.1217 | 0.1336 | - | - | - | - |
| RMC_3 | 0.1307 | 0.1310 | 0.1270 | 0.1281 | - | - | - | - |
| RMC_4 | 0.1349 | 0.1247 | 0.1359 | 0.1236 | - | - | - | - |
| RMC_5 | 0.1429 | 0.1338 | 0.1459 | 0.1310 | - | - | - | - |
| pancras_crag | 0.1619 | 0.0946 | 0.1843 | 0.1078 | - | - | - | - |
| pancras_naive | 0.1686 | 0.0978 | 0.1862 | 0.1104 | - | - | - | - |
| lean_rag | 0.1219 | 0.1035 | 0.1293 | 0.1088 | - | - | - | - |
| lean_rag_ft | 0.1374 | 0.1148 | 0.1468 | 0.1213 | - | - | - | - |
| lean_rag_dprf | 0.1327 | 0.1071 | 0.1452 | 0.1168 | - | - | - | - |
| lean_rag_ft_sparse | 0.1361 | 0.1141 | 0.1483 | 0.1229 | - | - | - | - |
| lean_rag_ensemble | 0.1373 | 0.1100 | 0.1463 | 0.1149 | - | - | - | - |
| Organization name | 0.1168 | 0.0569 | 0.1444 | 0.0775 | - | - | - | - |
| multi-stage rank&llm | 0.2372 | 0.1191 | 0.2445 | 0.1267 | - | - | - | - |
| DS@GT-BioASQ | 0.1006 | 0.1151 | 0.1051 | 0.1174 | - | - | - | - |
| dmiip2024 | 0.1712 | 0.1514 | 0.1650 | 0.1446 | - | - | - | - |
| dmiip2024_1 | 0.1374 | 0.1457 | 0.1370 | 0.1420 | - | - | - | - |
| dmiip2024_2 | 0.1499 | 0.1402 | 0.1502 | 0.1386 | - | - | - | - |
| dmiip2024_3 | 0.1423 | 0.1577 | 0.1404 | 0.1521 | - | - | - | - |
| dmiip2024_4 | 0.1490 | 0.1302 | 0.1463 | 0.1285 | - | - | - | - |
| dictycite-baseline | 0.1988 | 0.1227 | 0.2044 | 0.1269 | - | - | - | - |
| dictycite-snippet | 0.1782 | 0.1140 | 0.1926 | 0.1240 | - | - | - | - |
| dictycite-max | 0.1822 | 0.1141 | 0.1948 | 0.1236 | - | - | - | - |
| dictycite-max-rew-si | 0.1910 | 0.1191 | 0.2013 | 0.1273 | - | - | - | - |
| dictycite-max-rew-sl | 0.1947 | 0.1250 | 0.2019 | 0.1309 | - | - | - | - |
| IR_Y-1 | 0.0597 | 0.0462 | 0.0630 | 0.0480 | - | - | - | - |
| UR-IW-4 | 0.1363 | 0.1347 | 0.1403 | 0.1357 | - | - | - | - |
| UR-IW-5 | 0.2184 | 0.1100 | 0.2402 | 0.1253 | - | - | - | - |
| UR-IW-2 | 0.2255 | 0.1235 | 0.2430 | 0.1381 | - | - | - | - |
| UR-IW-1 | 0.1355 | 0.1118 | 0.1496 | 0.1220 | - | - | - | - |
| IR_Y-2 | 0.0861 | 0.0657 | 0.0894 | 0.0671 | - | - | - | - |
| UR-IW-3 | 0.1486 | 0.1517 | 0.1488 | 0.1496 | - | - | - | - |
| IR_Y-3 | 0.1352 | 0.1013 | 0.1375 | 0.1022 | - | - | - | - |
| IR_Y-4 | 0.1348 | 0.1035 | 0.1409 | 0.1064 | - | - | - | - |
| IR_Y-5 | 0.1416 | 0.1021 | 0.1453 | 0.1043 | - | - | - | - |
| Fleming-1 | 0.2221 | 0.0944 | 0.2419 | 0.1065 | - | - | - | - |
| ASK-Skill1 | 0.2024 | 0.0853 | 0.2286 | 0.0982 | - | - | - | - |
| ASK-Skill2 | 0.1986 | 0.0829 | 0.2231 | 0.0949 | - | - | - | - |
| ASK-Baseline | 0.2373 | 0.0995 | 0.2661 | 0.1150 | - | - | - | - |
| ASK-Tool1 | 0.2373 | 0.1022 | 0.2643 | 0.1153 | - | - | - | - |
| IR_J-1 | 0.1588 | 0.1314 | 0.1552 | 0.1289 | - | - | - | - |
| IR_J-2 | 0.1426 | 0.0981 | 0.1468 | 0.0985 | - | - | - | - |
| IR_J-3 | 0.1609 | 0.1038 | 0.1677 | 0.1048 | - | - | - | - |
| IR_J-4 | - | - | - | - | - | - | - | - |
| IR_J-5 | - | - | - | - | - | - | - | - |
| Gen-Doc | 0.0317 | 0.0209 | 0.0393 | 0.0254 | - | - | - | - |
| bioinfo-0 | 0.1710 | 0.1201 | 0.1815 | 0.1266 | - | - | - | - |
| bioinfo-1 | 0.1984 | 0.1166 | 0.2135 | 0.1243 | - | - | - | - |
| bioinfo-2 | 0.1933 | 0.1203 | 0.2005 | 0.1228 | - | - | - | - |
| bioinfo-3 | 0.2052 | 0.1210 | 0.2192 | 0.1286 | - | - | - | - |
| bioinfo-4 | 0.1942 | 0.1223 | 0.2059 | 0.1288 | - | - | - | - |
| MedQA-1 | 0.1809 | 0.1302 | 0.1930 | 0.1392 | - | - | - | - |
| IR2 | 0.1383 | 0.1265 | 0.1514 | 0.1373 | - | - | - | - |
| IR3 | 0.1645 | 0.1483 | 0.1704 | 0.1537 | - | - | - | - |
| MedQA-3 | 0.1763 | 0.1409 | 0.1823 | 0.1451 | - | - | - | - |
| MedQA-4 | 0.1755 | 0.1278 | 0.1851 | 0.1358 | - | - | - | - |
| MedQA-2 | 0.1847 | 0.1377 | 0.1892 | 0.1418 | - | - | - | - |
| MedQA-5 | 0.1937 | 0.1355 | 0.2047 | 0.1440 | - | - | - | - |
| BioXR-GTE | 0.0384 | 0.0259 | 0.0477 | 0.0318 | - | - | - | - |
| BioXR-BGE | 0.0376 | 0.0188 | 0.0455 | 0.0243 | - | - | - | - |
Test batch 4
Exact Answers
| System | Yes/No: Accuracy | Yes/No: F1 Yes | Yes/No: F1 No | Yes/No: Macro F1 | Factoid: Strict Acc. | Factoid: Lenient Acc. | Factoid: MRR | List: Mean Prec. | List: Recall | List: F-Measure |
|---|---|---|---|---|---|---|---|---|---|---|
| UR-IW-1 | 0.8750 | 0.9000 | 0.8333 | 0.8667 | 0.1818 | 0.3636 | 0.2424 | 0.3862 | 0.6191 | 0.4463 |
| UR-IW-2 | 0.8750 | 0.9000 | 0.8333 | 0.8667 | 0.3636 | 0.3636 | 0.3636 | 0.2887 | 0.5948 | 0.3552 |
| UR-IW-3 | 0.8750 | 0.9000 | 0.8333 | 0.8667 | 0.1818 | 0.1818 | 0.1818 | 0.4198 | 0.5055 | 0.4236 |
| UR-IW-4 | 0.8750 | 0.9000 | 0.8333 | 0.8667 | 0.1818 | 0.1818 | 0.1818 | 0.4300 | 0.4910 | 0.4201 |
| UR-IW-5 | 0.8750 | 0.9000 | 0.8333 | 0.8667 | 0.3636 | 0.3636 | 0.3636 | 0.2533 | 0.6891 | 0.3513 |
| ASK-Baseline | 0.9375 | 0.9474 | 0.9231 | 0.9352 | 0.2727 | 0.3636 | 0.3030 | 0.1378 | 0.5528 | 0.2040 |
| ASK-Tool2 | 0.9375 | 0.9474 | 0.9231 | 0.9352 | 0.1818 | 0.1818 | 0.1818 | 0.2426 | 0.5322 | 0.3138 |
| SATO | 0.8125 | 0.8421 | 0.7692 | 0.8057 | 0.1818 | 0.2727 | 0.2273 | 0.3108 | 0.2282 | 0.2475 |
| ASK-Tool1 | 0.9375 | 0.9474 | 0.9231 | 0.9352 | 0.0909 | 0.0909 | 0.0909 | 0.1750 | 0.4650 | 0.2397 |
| IR1 | 0.8750 | 0.8889 | 0.8571 | 0.8730 | 0.0909 | 0.0909 | 0.0909 | 0.3517 | 0.2489 | 0.2667 |
| IR5 | 0.8750 | 0.9000 | 0.8333 | 0.8667 | 0.2727 | 0.2727 | 0.2727 | 0.3265 | 0.2156 | 0.2374 |
| IR4 | 0.8750 | 0.9000 | 0.8333 | 0.8667 | 0.2727 | 0.2727 | 0.2727 | 0.3446 | 0.2544 | 0.2677 |
| multi-stage rank&llm | 0.8750 | 0.9000 | 0.8333 | 0.8667 | 0.0909 | 0.2727 | 0.1439 | 0.2768 | 0.5427 | 0.3454 |
| FinalQwen | 0.7500 | 0.8182 | 0.6000 | 0.7091 | 0.1818 | 0.1818 | 0.1818 | 0.0500 | 0.0292 | 0.0367 |
| "RMC_1" | 0.6875 | 0.7619 | 0.5455 | 0.6537 | 0.2727 | 0.2727 | 0.2727 | 0.2875 | 0.2493 | 0.2620 |
| RMC_2 | 0.7500 | 0.8000 | 0.6667 | 0.7333 | 0.1818 | 0.1818 | 0.1818 | 0.1443 | 0.1339 | 0.1337 |
| RMC_3 | 0.7500 | 0.8182 | 0.6000 | 0.7091 | 0.2727 | 0.2727 | 0.2727 | 0.2375 | 0.2027 | 0.2138 |
| RMC_4 | 0.6875 | 0.7826 | 0.4444 | 0.6135 | 0.2727 | 0.2727 | 0.2727 | 0.2700 | 0.2610 | 0.2506 |
| RMC_5 | 0.7500 | 0.8182 | 0.6000 | 0.7091 | 0.0909 | 0.0909 | 0.0909 | 0.1967 | 0.1780 | 0.1723 |
| dictycite-baseline | 0.8750 | 0.9000 | 0.8333 | 0.8667 | 0.1818 | 0.2727 | 0.2273 | 0.3760 | 0.3811 | 0.3608 |
| dictycite-snippet | 0.8750 | 0.9000 | 0.8333 | 0.8667 | 0.2727 | 0.2727 | 0.2727 | 0.4100 | 0.3765 | 0.3715 |
| dictycite-max-rew-si | 0.8750 | 0.9000 | 0.8333 | 0.8667 | 0.1818 | 0.2727 | 0.2273 | 0.4630 | 0.4428 | 0.4308 |
| asmalltrialsystem | 0.7500 | 0.7778 | 0.7143 | 0.7460 | 0.2727 | 0.2727 | 0.2727 | 0.2175 | 0.3624 | 0.2549 |
| Finalcorrected | 0.7500 | 0.7778 | 0.7143 | 0.7460 | 0.2727 | 0.2727 | 0.2727 | 0.2175 | 0.3624 | 0.2549 |
| dictycite-max-rew-sl | 0.8750 | 0.9000 | 0.8333 | 0.8667 | 0.3636 | 0.4545 | 0.4091 | 0.3241 | 0.4804 | 0.3625 |
| lean_rag | 0.7500 | 0.8182 | 0.6000 | 0.7091 | 0.0909 | 0.0909 | 0.0909 | 0.2952 | 0.2881 | 0.2703 |
| lean_rag_ft | 0.7500 | 0.8182 | 0.6000 | 0.7091 | 0.0909 | 0.0909 | 0.0909 | 0.3352 | 0.3314 | 0.3059 |
| lean_rag_dprf | 0.7500 | 0.8182 | 0.6000 | 0.7091 | 0.0909 | 0.1818 | 0.1364 | 0.3349 | 0.3214 | 0.3068 |
| lean_rag_ft_sparse | 0.7500 | 0.8182 | 0.6000 | 0.7091 | 0.0909 | 0.0909 | 0.0909 | 0.3518 | 0.3481 | 0.3226 |
| lean_rag_ensemble | 0.8750 | 0.9000 | 0.8333 | 0.8667 | 0.0909 | 0.0909 | 0.0909 | 0.3073 | 0.3363 | 0.3065 |
| Organization name | 0.7500 | 0.7778 | 0.7143 | 0.7460 | - | - | - | 0.2288 | 0.1831 | 0.1750 |
| bioinfo-0 | 0.8750 | 0.9000 | 0.8333 | 0.8667 | 0.2727 | 0.2727 | 0.2727 | 0.2317 | 0.2812 | 0.2316 |
| IR2 | 0.8750 | 0.9000 | 0.8333 | 0.8667 | 0.3636 | 0.3636 | 0.3636 | 0.3877 | 0.3752 | 0.3480 |
| IR3 | 0.8750 | 0.9000 | 0.8333 | 0.8667 | 0.3636 | 0.3636 | 0.3636 | 0.3622 | 0.3544 | 0.3347 |
| DS@GT-BioASQ | 0.8750 | 0.9000 | 0.8333 | 0.8667 | 0.1818 | 0.2727 | 0.2273 | 0.3592 | 0.2249 | 0.2588 |
| DSGTBioasq | 0.8750 | 0.9000 | 0.8333 | 0.8667 | 0.1818 | 0.2727 | 0.2273 | 0.3592 | 0.2249 | 0.2588 |
| llama for 14b b | 0.8125 | 0.8421 | 0.7692 | 0.8057 | - | - | - | 0.0100 | 0.0083 | 0.0091 |
| IR_J-1 | 0.8125 | 0.8571 | 0.7273 | 0.7922 | 0.1818 | 0.1818 | 0.1818 | 0.3552 | 0.3047 | 0.2950 |
| IR_Y-1 | 0.9375 | 0.9474 | 0.9231 | 0.9352 | 0.0909 | 0.0909 | 0.0909 | 0.0367 | 0.0592 | 0.0448 |
| IR_J-2 | 0.8750 | 0.9000 | 0.8333 | 0.8667 | 0.2727 | 0.2727 | 0.2727 | 0.2683 | 0.2784 | 0.2547 |
| IR_J-3 | 0.8125 | 0.8421 | 0.7692 | 0.8057 | 0.1818 | 0.1818 | 0.1818 | 0.4192 | 0.3368 | 0.3544 |
| IR_Y-2 | 0.9375 | 0.9474 | 0.9231 | 0.9352 | 0.0909 | 0.0909 | 0.0909 | 0.0200 | 0.0384 | 0.0254 |
| IR_Y-3 | 0.9375 | 0.9474 | 0.9231 | 0.9352 | 0.0909 | 0.1818 | 0.1364 | 0.0200 | 0.0488 | 0.0275 |
| IR_Y-5 | 0.8125 | 0.8421 | 0.7692 | 0.8057 | 0.0909 | 0.0909 | 0.0909 | 0.0283 | 0.0571 | 0.0363 |
| IR_Y-4 | 0.9375 | 0.9474 | 0.9231 | 0.9352 | 0.2727 | 0.2727 | 0.2727 | 0.0150 | 0.0274 | 0.0172 |
| IR_J-4 | 0.8125 | 0.8571 | 0.7273 | 0.7922 | 0.1818 | 0.1818 | 0.1818 | 0.4108 | 0.3034 | 0.3247 |
| IR_J-5 | 0.8125 | 0.8421 | 0.7692 | 0.8057 | 0.0909 | 0.2727 | 0.1818 | 0.3615 | 0.3457 | 0.3322 |
| pancras_naive | 0.8750 | 0.9000 | 0.8333 | 0.8667 | 0.1818 | 0.1818 | 0.1818 | 0.3240 | 0.4886 | 0.3633 |
| dmiip2024 | 0.9375 | 0.9474 | 0.9231 | 0.9352 | 0.0909 | 0.0909 | 0.0909 | 0.3295 | 0.4433 | 0.3484 |
| dmiip2024_1 | 0.8125 | 0.8421 | 0.7692 | 0.8057 | 0.0909 | 0.0909 | 0.0909 | 0.4775 | 0.4657 | 0.4346 |
| dmiip2024_2 | 0.8750 | 0.9000 | 0.8333 | 0.8667 | 0.0909 | 0.1818 | 0.1091 | 0.3000 | 0.4623 | 0.3424 |
| dmiip2024_3 | 0.8125 | 0.8571 | 0.7273 | 0.7922 | 0.0909 | 0.1818 | 0.1364 | 0.5520 | 0.4826 | 0.4840 |
| dmiip2024_4 | 0.8750 | 0.9000 | 0.8333 | 0.8667 | 0.2727 | 0.3636 | 0.3182 | 0.3706 | 0.3368 | 0.3386 |
| ASK-Skill1 | 0.8750 | 0.8889 | 0.8571 | 0.8730 | 0.2727 | 0.2727 | 0.2727 | 0.1028 | 0.2204 | 0.1266 |
| ASK-Skill2 | 0.9375 | 0.9474 | 0.9231 | 0.9352 | 0.2727 | 0.2727 | 0.2727 | 0.1351 | 0.3376 | 0.1747 |
| Fleming-1 | 0.8125 | 0.8571 | 0.7273 | 0.7922 | 0.1818 | 0.1818 | 0.1818 | 0.3609 | 0.4229 | 0.3734 |
| pancras_crag | 0.9375 | 0.9474 | 0.9231 | 0.9352 | 0.2727 | 0.4545 | 0.3636 | 0.3689 | 0.4351 | 0.3704 |
| bioinfo-4 | 0.8750 | 0.9000 | 0.8333 | 0.8667 | 0.3636 | 0.3636 | 0.3636 | 0.2211 | 0.2124 | 0.2047 |
| bioinfo-1 | 0.8750 | 0.9000 | 0.8333 | 0.8667 | 0.2727 | 0.2727 | 0.2727 | 0.3059 | 0.3630 | 0.3133 |
| bioinfo-3 | 0.8750 | 0.9000 | 0.8333 | 0.8667 | 0.2727 | 0.2727 | 0.2727 | 0.3196 | 0.4053 | 0.3388 |
| MedQA-1 | 0.8750 | 0.9000 | 0.8333 | 0.8667 | 0.2727 | 0.3636 | 0.3182 | 0.3762 | 0.4941 | 0.4034 |
| MedQA-2 | 0.8750 | 0.9000 | 0.8333 | 0.8667 | 0.2727 | 0.3636 | 0.3182 | 0.3784 | 0.5138 | 0.4149 |
| Gen-Doc | 0.8125 | 0.8571 | 0.7273 | 0.7922 | - | - | - | 0.3336 | 0.3380 | 0.3123 |
Ideal Answers
| | Automatic scores (ROUGE) | | | | Manual scores | | | |
|---|---|---|---|---|---|---|---|---|
| System | R-2 (Rec) | R-2 (F1) | R-SU4 (Rec) | R-SU4 (F1) | Readability | Recall | Precision | Repetition |
| UR-IW-1 | 0.1736 | 0.1283 | 0.1852 | 0.1339 | - | - | - | - |
| UR-IW-2 | 0.2188 | 0.1100 | 0.2435 | 0.1242 | - | - | - | - |
| UR-IW-3 | 0.1527 | 0.1451 | 0.1623 | 0.1516 | - | - | - | - |
| UR-IW-4 | 0.1535 | 0.1423 | 0.1632 | 0.1479 | - | - | - | - |
| UR-IW-5 | 0.2377 | 0.1144 | 0.2614 | 0.1285 | - | - | - | - |
| ASK-Baseline | 0.2352 | 0.0910 | 0.2682 | 0.1048 | - | - | - | - |
| ASK-Tool2 | 0.2127 | 0.1018 | 0.2385 | 0.1137 | - | - | - | - |
| SATO | 0.1930 | 0.1249 | 0.2023 | 0.1308 | - | - | - | - |
| ASK-Tool1 | 0.2276 | 0.1028 | 0.2499 | 0.1151 | - | - | - | - |
| IR1 | 0.1746 | 0.1078 | 0.1879 | 0.1169 | - | - | - | - |
| IR5 | 0.2236 | 0.1515 | 0.2314 | 0.1566 | - | - | - | - |
| IR4 | 0.2106 | 0.1378 | 0.2169 | 0.1422 | - | - | - | - |
| multi-stage rank&llm | 0.2439 | 0.1227 | 0.2562 | 0.1319 | - | - | - | - |
| FinalQwen | 0.0880 | 0.0567 | 0.0932 | 0.0639 | - | - | - | - |
| "RMC_1" | 0.1606 | 0.1574 | 0.1536 | 0.1507 | - | - | - | - |
| RMC_2 | 0.1522 | 0.1643 | 0.1484 | 0.1616 | - | - | - | - |
| RMC_3 | 0.1595 | 0.1577 | 0.1562 | 0.1536 | - | - | - | - |
| RMC_4 | 0.1651 | 0.1384 | 0.1668 | 0.1386 | - | - | - | - |
| RMC_5 | 0.1760 | 0.1527 | 0.1790 | 0.1555 | - | - | - | - |
| dictycite-baseline | 0.2164 | 0.1381 | 0.2230 | 0.1418 | - | - | - | - |
| dictycite-snippet | 0.2143 | 0.1308 | 0.2222 | 0.1367 | - | - | - | - |
| dictycite-max-rew-si | 0.2175 | 0.1314 | 0.2275 | 0.1372 | - | - | - | - |
| asmalltrialsystem | 0.1252 | 0.0872 | 0.1329 | 0.0934 | - | - | - | - |
| Finalcorrected | 0.1252 | 0.0872 | 0.1329 | 0.0934 | - | - | - | - |
| dictycite-max-rew-sl | 0.2255 | 0.1562 | 0.2313 | 0.1573 | - | - | - | - |
| lean_rag | 0.2030 | 0.1561 | 0.2144 | 0.1628 | - | - | - | - |
| lean_rag_ft | 0.1891 | 0.1489 | 0.2014 | 0.1554 | - | - | - | - |
| lean_rag_dprf | 0.1865 | 0.1452 | 0.1966 | 0.1514 | - | - | - | - |
| lean_rag_ft_sparse | 0.1836 | 0.1425 | 0.1983 | 0.1486 | - | - | - | - |
| lean_rag_ensemble | 0.1880 | 0.1641 | 0.1959 | 0.1671 | - | - | - | - |
| Organization name | 0.1420 | 0.0860 | 0.1544 | 0.0953 | - | - | - | - |
| bioinfo-0 | 0.1992 | 0.1213 | 0.2119 | 0.1274 | - | - | - | - |
| IR2 | 0.2002 | 0.1714 | 0.1954 | 0.1655 | - | - | - | - |
| IR3 | 0.1939 | 0.1713 | 0.1942 | 0.1699 | - | - | - | - |
| DS@GT-BioASQ | 0.1092 | 0.1145 | 0.1098 | 0.1152 | - | - | - | - |
| DSGTBioasq | 0.1092 | 0.1145 | 0.1098 | 0.1152 | - | - | - | - |
| llama for 14b b | 0.0876 | 0.0525 | 0.0933 | 0.0543 | - | - | - | - |
| IR_J-1 | - | - | - | - | - | - | - | - |
| IR_Y-1 | 0.0500 | 0.0492 | 0.0533 | 0.0506 | - | - | - | - |
| IR_J-2 | 0.1519 | 0.1116 | 0.1553 | 0.1106 | - | - | - | - |
| IR_J-3 | 0.1292 | 0.0978 | 0.1327 | 0.0968 | - | - | - | - |
| IR_Y-2 | 0.0549 | 0.0576 | 0.0540 | 0.0536 | - | - | - | - |
| IR_Y-3 | 0.1884 | 0.1375 | 0.1951 | 0.1406 | - | - | - | - |
| IR_Y-5 | 0.1848 | 0.1386 | 0.1834 | 0.1379 | - | - | - | - |
| IR_Y-4 | 0.1786 | 0.1221 | 0.1908 | 0.1274 | - | - | - | - |
| IR_J-4 | 0.0104 | 0.0055 | 0.0094 | 0.0048 | - | - | - | - |
| IR_J-5 | 0.1652 | 0.1267 | 0.1755 | 0.1338 | - | - | - | - |
| pancras_naive | 0.1762 | 0.0948 | 0.1997 | 0.1084 | - | - | - | - |
| dmiip2024 | 0.1698 | 0.1408 | 0.1719 | 0.1401 | - | - | - | - |
| dmiip2024_1 | 0.1562 | 0.1652 | 0.1540 | 0.1614 | - | - | - | - |
| dmiip2024_2 | 0.1875 | 0.1571 | 0.1905 | 0.1573 | - | - | - | - |
| dmiip2024_3 | 0.1557 | 0.1653 | 0.1574 | 0.1671 | - | - | - | - |
| dmiip2024_4 | 0.1311 | 0.1171 | 0.1430 | 0.1270 | - | - | - | - |
| ASK-Skill1 | 0.2043 | 0.0815 | 0.2217 | 0.0930 | - | - | - | - |
| ASK-Skill2 | 0.2011 | 0.0818 | 0.2173 | 0.0918 | - | - | - | - |
| Fleming-1 | 0.2284 | 0.0944 | 0.2528 | 0.1082 | - | - | - | - |
| pancras_crag | 0.1899 | 0.1048 | 0.2061 | 0.1148 | - | - | - | - |
| bioinfo-4 | 0.2097 | 0.1151 | 0.2203 | 0.1180 | - | - | - | - |
| bioinfo-1 | 0.2122 | 0.1152 | 0.2230 | 0.1194 | - | - | - | - |
| bioinfo-3 | 0.2099 | 0.1220 | 0.2344 | 0.1323 | - | - | - | - |
| MedQA-1 | 0.1887 | 0.1370 | 0.1927 | 0.1416 | - | - | - | - |
| MedQA-2 | 0.1712 | 0.1312 | 0.1811 | 0.1389 | - | - | - | - |
| Gen-Doc | 0.0378 | 0.0276 | 0.0435 | 0.0305 | - | - | - | - |