BioASQ Participants Area
Task Synergy - version 2025: Test Results
Test round 1
Documents
| System | Mean precision | Recall | F-Measure | MAP | GMAP |
|---|---|---|---|---|---|
| Fleming-2 | 0.2238 | 0.3375 | 0.2121 | 0.2828 | 0.0076 |
| Fleming-1 | 0.2426 | 0.3939 | 0.2403 | 0.2172 | 0.0142 |
| dmiip2024 | 0.3275 | 0.4936 | 0.2899 | 0.4060 | 0.0436 |
| dmiip2024_1 | 0.3397 | 0.4944 | 0.2906 | 0.4051 | 0.0497 |
| dmiip2024_2 | 0.3217 | 0.4921 | 0.2848 | 0.3960 | 0.0423 |
| dmiip2024_3 | 0.2410 | 0.4860 | 0.2743 | 0.3969 | 0.0645 |
| dmiip2024_4 | 0.3051 | 0.4687 | 0.2703 | 0.3844 | 0.0309 |
| SCIRE1 Results | 0.3227 | 0.2072 | 0.2249 | 0.1928 | 0.0011 |
Snippets
| System | Mean precision | Recall | F-Measure | MAP | GMAP |
|---|---|---|---|---|---|
| Fleming-2 | 0.1509 | 0.1899 | 0.1388 | 0.2068 | 0.0072 |
| Fleming-1 | 0.1864 | 0.1290 | 0.1297 | 0.1164 | 0.0014 |
| dmiip2024 | 0.2688 | 0.5011 | 0.3003 | 0.5653 | 0.0625 |
| dmiip2024_1 | 0.2803 | 0.5049 | 0.3087 | 0.5880 | 0.0879 |
| dmiip2024_2 | 0.2552 | 0.4682 | 0.2855 | 0.5544 | 0.0590 |
| dmiip2024_3 | 0.2163 | 0.4364 | 0.2489 | 0.5043 | 0.0437 |
| dmiip2024_4 | 0.2471 | 0.4710 | 0.2746 | 0.5636 | 0.0522 |
| SCIRE1 Results | 0.1444 | 0.1095 | 0.1090 | 0.0749 | 0.0003 |
Exact Answers
| System | Yes/No Accuracy | Yes/No F1 Yes | Yes/No F1 No | Yes/No Macro F1 | Factoid Strict Acc. | Factoid Lenient Acc. | Factoid MRR | List Mean Prec. | List Recall | List F-Measure |
|---|---|---|---|---|---|---|---|---|---|---|
| Fleming-2 | 0.4000 | 0.5714 | - | 0.2857 | - | - | - | 0.1429 | 0.0130 | 0.0238 |
| Fleming-1 | 0.4000 | 0.5714 | - | 0.2857 | - | - | - | 0.1429 | 0.0130 | 0.0238 |
| dmiip2024 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.6667 | 0.6667 | 0.6667 | 0.0238 | 0.0130 | 0.0168 |
| dmiip2024_1 | 0.4000 | 0.5714 | - | 0.2857 | - | - | - | - | - | - |
| dmiip2024_2 | 0.8000 | 0.8000 | 0.8000 | 0.8000 | 0.6667 | 0.6667 | 0.6667 | 0.0476 | 0.0476 | 0.0476 |
| dmiip2024_3 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.6667 | 0.6667 | 0.6667 | 0.1667 | 0.0606 | 0.0882 |
| dmiip2024_4 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.6667 | 0.6667 | 0.6667 | 0.0714 | 0.0476 | 0.0571 |
| SCIRE1 Results | 0.8000 | 0.8000 | 0.8000 | 0.8000 | 0.6667 | 0.6667 | 0.6667 | - | - | - |
Ideal Answers
| System | ROUGE-2 (Rec) | ROUGE-2 (F1) | ROUGE-SU4 (Rec) | ROUGE-SU4 (F1) | Manual: Readability | Manual: Recall | Manual: Precision | Manual: Repetition |
|---|---|---|---|---|---|---|---|---|
| Fleming-2 | - | - | - | - | - | - | - | - |
| Fleming-1 | - | - | - | - | - | - | - | - |
| dmiip2024 | 0.1574 | 0.1982 | 0.1604 | 0.2038 | 4.47 | 4.53 | 4.47 | 4.47 |
| dmiip2024_1 | 0.1476 | 0.1880 | 0.1524 | 0.1953 | 4.42 | 4.42 | 4.47 | 4.58 |
| dmiip2024_2 | 0.1336 | 0.1719 | 0.1354 | 0.1760 | 4.37 | 4.37 | 4.16 | 4.58 |
| dmiip2024_3 | 0.1553 | 0.1831 | 0.1583 | 0.1890 | 4.47 | 4.32 | 4.00 | 4.53 |
| dmiip2024_4 | 0.1855 | 0.2211 | 0.1905 | 0.2282 | 4.37 | 4.53 | 4.16 | 4.58 |
| SCIRE1 Results | 0.1423 | 0.1756 | 0.1470 | 0.1810 | 4.16 | 4.32 | 4.16 | 4.53 |
Test round 2
Documents
| System | Mean precision | Recall | F-Measure | MAP | GMAP |
|---|---|---|---|---|---|
| dmiip2024 | 0.1860 | 0.5784 | 0.2549 | 0.3949 | 0.0426 |
| dmiip2024_1 | 0.2000 | 0.6372 | 0.2760 | 0.4125 | 0.0679 |
| dmiip2024_2 | 0.2000 | 0.6372 | 0.2760 | 0.4125 | 0.0679 |
| dmiip2024_3 | 0.1837 | 0.5549 | 0.2519 | 0.3322 | 0.0371 |
| dmiip2024_4 | 0.1930 | 0.5923 | 0.2671 | 0.3393 | 0.0474 |
| SCIRE2 Results | 0.0116 | 0.0116 | 0.0116 | 0.0058 | 0.0000 |
| Fleming-2 | 0.0736 | 0.0491 | 0.0505 | 0.0317 | 0.0000 |
| Q&A based on RAG | 0.0422 | 0.0615 | 0.0446 | 0.0361 | 0.0001 |
| Fleming-1 | 0.0736 | 0.0491 | 0.0505 | 0.0317 | 0.0000 |
| Q&A based on RAG2 | 0.0490 | 0.0629 | 0.0512 | 0.0328 | 0.0001 |
Snippets
| System | Mean precision | Recall | F-Measure | MAP | GMAP |
|---|---|---|---|---|---|
| dmiip2024 | 0.1132 | 0.2028 | 0.1202 | 0.2261 | 0.0016 |
| dmiip2024_1 | 0.1248 | 0.2805 | 0.1414 | 0.2478 | 0.0027 |
| dmiip2024_2 | 0.1248 | 0.2805 | 0.1414 | 0.2478 | 0.0027 |
| dmiip2024_3 | 0.1153 | 0.1923 | 0.1163 | 0.1947 | 0.0009 |
| dmiip2024_4 | 0.1187 | 0.2395 | 0.1282 | 0.1971 | 0.0015 |
| SCIRE2 Results | 0.3422 | 0.3072 | 0.2949 | 0.2740 | 0.0026 |
| Fleming-2 | 0.1211 | 0.0846 | 0.0757 | 0.1190 | 0.0004 |
| Q&A based on RAG | 0.0265 | 0.0225 | 0.0217 | 0.0203 | 0.0000 |
| Fleming-1 | 0.1211 | 0.0846 | 0.0757 | 0.1190 | 0.0004 |
| Q&A based on RAG2 | 0.0290 | 0.0207 | 0.0222 | 0.0180 | 0.0000 |
Exact Answers
| System | Yes/No Accuracy | Yes/No F1 Yes | Yes/No F1 No | Yes/No Macro F1 | Factoid Strict Acc. | Factoid Lenient Acc. | Factoid MRR | List Mean Prec. | List Recall | List F-Measure |
|---|---|---|---|---|---|---|---|---|---|---|
| dmiip2024 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.4286 | 0.4286 | 0.4286 | 0.1000 | 0.1000 | 0.1000 |
| dmiip2024_1 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.4286 | 0.4286 | 0.4286 | 0.2000 | 0.4000 | 0.2467 |
| dmiip2024_2 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.2857 | 0.2857 | 0.2857 | 0.1700 | 0.3000 | 0.2000 |
| dmiip2024_3 | 0.8571 | 0.8000 | 0.8889 | 0.8444 | 0.2857 | 0.2857 | 0.2857 | 0.1000 | 0.1000 | 0.1000 |
| dmiip2024_4 | 0.8571 | 0.8000 | 0.8889 | 0.8444 | 0.2857 | 0.2857 | 0.2857 | 0.1833 | 0.2500 | 0.2000 |
| SCIRE2 Results | 0.8571 | 0.8571 | 0.8571 | 0.8571 | 0.2857 | 0.2857 | 0.2857 | 0.1500 | 0.1500 | 0.1500 |
| Fleming-2 | 0.5714 | 0.6667 | 0.4000 | 0.5333 | 0.1429 | 0.4286 | 0.2857 | 0.2167 | 0.2667 | 0.2333 |
| Q&A based on RAG | 0.5714 | 0.5714 | 0.5714 | 0.5714 | 0.1429 | 0.1429 | 0.1429 | 0.0500 | 0.0500 | 0.0500 |
| Fleming-1 | 0.8571 | 0.8571 | 0.8571 | 0.8571 | 0.2857 | 0.2857 | 0.2857 | 0.2111 | 0.2833 | 0.2100 |
| Q&A based on RAG2 | 0.5714 | 0.6667 | 0.4000 | 0.5333 | 0.1429 | 0.1429 | 0.1429 | 0.1000 | 0.1083 | 0.1036 |
Ideal Answers
| System | ROUGE-2 (Rec) | ROUGE-2 (F1) | ROUGE-SU4 (Rec) | ROUGE-SU4 (F1) | Manual: Readability | Manual: Recall | Manual: Precision | Manual: Repetition |
|---|---|---|---|---|---|---|---|---|
| dmiip2024 | 0.1305 | 0.1546 | 0.1375 | 0.1630 | - | - | - | - |
| dmiip2024_1 | 0.1630 | 0.1866 | 0.1668 | 0.1922 | 4.06 | 3.79 | 3.79 | 4.39 |
| dmiip2024_2 | 0.1312 | 0.1567 | 0.1374 | 0.1648 | - | - | - | - |
| dmiip2024_3 | 0.1281 | 0.1539 | 0.1324 | 0.1597 | 4.12 | 3.52 | 3.58 | 4.39 |
| dmiip2024_4 | 0.1500 | 0.1750 | 0.1536 | 0.1802 | 4.03 | 3.55 | 3.73 | 4.45 |
| SCIRE2 Results | 0.1674 | 0.1767 | 0.1770 | 0.1858 | 2.09 | 2.24 | 2.03 | 2.09 |
| Fleming-2 | 0.2120 | 0.1512 | 0.2341 | 0.1668 | 3.52 | 4.06 | 3.27 | 3.73 |
| Q&A based on RAG | 0.1536 | 0.1193 | 0.1754 | 0.1354 | 3.55 | 3.45 | 3.18 | 3.76 |
| Fleming-1 | 0.1752 | 0.1622 | 0.1896 | 0.1749 | 4.06 | 4.24 | 3.88 | 4.00 |
| Q&A based on RAG2 | 0.1497 | 0.1141 | 0.1733 | 0.1325 | 3.39 | 3.18 | 3.03 | 3.67 |
Test round 3
Documents
| System | Mean precision | Recall | F-Measure | MAP | GMAP |
|---|---|---|---|---|---|
| dmiip2024 | 0.1364 | 0.5078 | 0.1954 | 0.3017 | 0.0179 |
| dmiip2024_1 | 0.1455 | 0.5523 | 0.2095 | 0.3636 | 0.0272 |
| dmiip2024_3 | 0.1303 | 0.4438 | 0.1863 | 0.2657 | 0.0098 |
| dmiip2024_4 | 0.1333 | 0.4595 | 0.1916 | 0.2626 | 0.0104 |
| dmiip2024_2 | 0.1394 | 0.4676 | 0.1985 | 0.2821 | 0.0110 |
| Fleming-1 | - | - | - | - | - |
| SCIRE3 Results | 0.1638 | 0.2670 | 0.1498 | 0.1156 | 0.0009 |
| Fleming-2 | - | - | - | - | - |
| Q&A ClusteredRAG | - | - | - | - | - |
| Q&A - ClusteredRAG | - | - | - | - | - |
| Q&A based on RAG | - | - | - | - | - |
| Q&A based on RAG2 | - | - | - | - | - |
| Fleming-3 | - | - | - | - | - |
Snippets
| System | Mean precision | Recall | F-Measure | MAP | GMAP |
|---|---|---|---|---|---|
| dmiip2024 | 0.0826 | 0.2159 | 0.0986 | 0.1835 | 0.0009 |
| dmiip2024_1 | 0.0922 | 0.2372 | 0.1099 | 0.2504 | 0.0013 |
| dmiip2024_3 | 0.0781 | 0.1650 | 0.0893 | 0.1515 | 0.0005 |
| dmiip2024_4 | 0.0685 | 0.1456 | 0.0793 | 0.1336 | 0.0004 |
| dmiip2024_2 | 0.0809 | 0.1649 | 0.0911 | 0.1564 | 0.0006 |
| Fleming-1 | - | - | - | - | - |
| SCIRE3 Results | 0.1564 | 0.2072 | 0.1453 | 0.2576 | 0.0010 |
| Fleming-2 | - | - | - | - | - |
| Q&A ClusteredRAG | - | - | - | - | - |
| Q&A - ClusteredRAG | - | - | - | - | - |
| Q&A based on RAG | - | - | - | - | - |
| Q&A based on RAG2 | - | - | - | - | - |
| Fleming-3 | - | - | - | - | - |
Exact Answers
| System | Yes/No Accuracy | Yes/No F1 Yes | Yes/No F1 No | Yes/No Macro F1 | Factoid Strict Acc. | Factoid Lenient Acc. | Factoid MRR | List Mean Prec. | List Recall | List F-Measure |
|---|---|---|---|---|---|---|---|---|---|---|
| dmiip2024 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.5000 | 0.5000 | 0.5000 | 0.1547 | 0.4115 | 0.1993 |
| dmiip2024_1 | 0.9000 | 0.9091 | 0.8889 | 0.8990 | 0.5000 | 0.5000 | 0.5000 | 0.1875 | 0.4583 | 0.2304 |
| dmiip2024_3 | 0.9000 | 0.9091 | 0.8889 | 0.8990 | 0.2000 | 0.3000 | 0.2500 | 0.1980 | 0.5265 | 0.2423 |
| dmiip2024_4 | 0.9000 | 0.9091 | 0.8889 | 0.8990 | 0.4000 | 0.6000 | 0.5000 | 0.1980 | 0.5890 | 0.2495 |
| dmiip2024_2 | 0.9000 | 0.9091 | 0.8889 | 0.8990 | 0.2000 | 0.3000 | 0.2500 | 0.2155 | 0.5473 | 0.2593 |
| Fleming-1 | 0.8000 | 0.8333 | 0.7500 | 0.7917 | 0.2000 | 0.3000 | 0.2500 | 0.2177 | 0.5104 | 0.2634 |
| SCIRE3 Results | 0.9000 | 0.9091 | 0.8889 | 0.8990 | 0.3000 | 0.4000 | 0.3250 | 0.1751 | 0.4328 | 0.2080 |
| Fleming-2 | 0.8000 | 0.8333 | 0.7500 | 0.7917 | 0.2000 | 0.3000 | 0.2500 | 0.2177 | 0.5104 | 0.2634 |
| Q&A ClusteredRAG | 0.7000 | 0.5714 | 0.7692 | 0.6703 | 0.1000 | 0.1000 | 0.1000 | 0.0146 | 0.0682 | 0.0227 |
| Q&A - ClusteredRAG | 0.6000 | 0.3333 | 0.7143 | 0.5238 | - | - | - | 0.0246 | 0.0781 | 0.0313 |
| Q&A based on RAG | 0.7000 | 0.5714 | 0.7692 | 0.6703 | 0.1000 | 0.1000 | 0.1000 | 0.1198 | 0.1151 | 0.0751 |
| Q&A based on RAG2 | 0.7000 | 0.5714 | 0.7692 | 0.6703 | 0.1000 | 0.1000 | 0.1000 | 0.0547 | 0.1406 | 0.0712 |
| Fleming-3 | 0.9000 | 0.9091 | 0.8889 | 0.8990 | 0.2000 | 0.3000 | 0.2500 | 0.2177 | 0.5104 | 0.2634 |
Ideal Answers
| System | ROUGE-2 (Rec) | ROUGE-2 (F1) | ROUGE-SU4 (Rec) | ROUGE-SU4 (F1) | Manual: Readability | Manual: Recall | Manual: Precision | Manual: Repetition |
|---|---|---|---|---|---|---|---|---|
| dmiip2024 | 0.1670 | 0.1968 | 0.1700 | 0.2011 | 4.49 | 4.37 | 4.27 | 4.47 |
| dmiip2024_1 | 0.2438 | 0.2668 | 0.2485 | 0.2723 | 4.57 | 4.61 | 4.43 | 4.57 |
| dmiip2024_3 | 0.2019 | 0.2198 | 0.2053 | 0.2241 | 4.47 | 4.47 | 4.31 | 4.45 |
| dmiip2024_4 | 0.2019 | 0.2251 | 0.2066 | 0.2309 | 4.51 | 4.53 | 4.31 | 4.47 |
| dmiip2024_2 | 0.1995 | 0.2173 | 0.2056 | 0.2253 | 4.57 | 4.59 | 4.31 | 4.55 |
| Fleming-1 | 0.3064 | 0.2423 | 0.3178 | 0.2499 | 4.31 | 4.73 | 4.06 | 4.47 |
| SCIRE3 Results | 0.2076 | 0.1724 | 0.2192 | 0.1822 | 3.76 | 3.94 | 3.47 | 3.80 |
| Fleming-2 | 0.3064 | 0.2423 | 0.3178 | 0.2499 | 4.31 | 4.73 | 4.06 | 4.47 |
| Q&A ClusteredRAG | 0.0875 | 0.0665 | 0.1062 | 0.0806 | 3.61 | 3.00 | 2.84 | 3.84 |
| Q&A - ClusteredRAG | 0.1366 | 0.0942 | 0.1602 | 0.1094 | 3.43 | 2.92 | 2.69 | 3.67 |
| Q&A based on RAG | 0.1196 | 0.0828 | 0.1414 | 0.0985 | 3.78 | 3.14 | 2.92 | 3.92 |
| Q&A based on RAG2 | 0.1278 | 0.0961 | 0.1499 | 0.1109 | 3.55 | 2.94 | 2.73 | 3.84 |
| Fleming-3 | 0.3064 | 0.2423 | 0.3178 | 0.2499 | 4.31 | 4.73 | 4.06 | 4.47 |
Test round 4
Documents
| System | Mean precision | Recall | F-Measure | MAP | GMAP |
|---|---|---|---|---|---|
| SCIRE4 Results | 0.0192 | 0.0077 | 0.0110 | 0.0064 | 0.0000 |
| SCIRE4 Results (GPT) | 0.0632 | 0.0751 | 0.0554 | 0.0344 | 0.0000 |
| dmiip2024 | 0.1115 | 0.4287 | 0.1549 | 0.3008 | 0.0043 |
| dmiip2024_1 | 0.1192 | 0.4720 | 0.1662 | 0.3134 | 0.0067 |
| dmiip2024_2 | 0.1462 | 0.7319 | 0.2184 | 0.4716 | 0.0855 |
| dmiip2024_4 | 0.1462 | 0.7326 | 0.2187 | 0.4733 | 0.0855 |
| dmiip2024_3 | 0.1346 | 0.7376 | 0.2066 | 0.4446 | 0.1145 |
| Q&A based on RAG | 0.0353 | 0.0824 | 0.0423 | 0.0440 | 0.0000 |
| sinai_uja_RAG | 0.0417 | 0.1209 | 0.0533 | 0.0824 | 0.0001 |
| Q&A based on RAG2 | 0.0417 | 0.1209 | 0.0533 | 0.0824 | 0.0001 |
| Fleming-4 | - | - | - | - | - |
| Fleming-1 | - | - | - | - | - |
| Fleming-2 | - | - | - | - | - |
| Fleming-3 | - | - | - | - | - |
| retrieval+reranking | 0.1160 | 0.1885 | 0.1383 | 0.1474 | 0.0002 |
Snippets
| System | Mean precision | Recall | F-Measure | MAP | GMAP |
|---|---|---|---|---|---|
| SCIRE4 Results | 0.1490 | 0.1758 | 0.1284 | 0.2819 | 0.0017 |
| SCIRE4 Results (GPT) | 0.1623 | 0.1890 | 0.1467 | 0.2831 | 0.0024 |
| dmiip2024 | 0.0658 | 0.0518 | 0.0489 | 0.0858 | 0.0001 |
| dmiip2024_1 | 0.0623 | 0.0597 | 0.0506 | 0.1012 | 0.0002 |
| dmiip2024_2 | 0.0749 | 0.1038 | 0.0682 | 0.1299 | 0.0004 |
| dmiip2024_4 | 0.0756 | 0.1038 | 0.0684 | 0.1250 | 0.0004 |
| dmiip2024_3 | 0.0751 | 0.0990 | 0.0672 | 0.1138 | 0.0004 |
| Q&A based on RAG | 0.2458 | 0.3780 | 0.2518 | 0.3039 | 0.0143 |
| sinai_uja_RAG | 0.2451 | 0.3737 | 0.2505 | 0.3016 | 0.0142 |
| Q&A based on RAG2 | 0.2451 | 0.3737 | 0.2505 | 0.3016 | 0.0142 |
| Fleming-4 | - | - | - | - | - |
| Fleming-1 | - | - | - | - | - |
| Fleming-2 | - | - | - | - | - |
| Fleming-3 | - | - | - | - | - |
| retrieval+reranking | 0.1135 | 0.0722 | 0.0829 | 0.0978 | 0.0003 |
Exact Answers
| System | Yes/No Accuracy | Yes/No F1 Yes | Yes/No F1 No | Yes/No Macro F1 | Factoid Strict Acc. | Factoid Lenient Acc. | Factoid MRR | List Mean Prec. | List Recall | List F-Measure |
|---|---|---|---|---|---|---|---|---|---|---|
| SCIRE4 Results | 0.9000 | 0.9091 | 0.8889 | 0.8990 | 0.2000 | 0.3000 | 0.2333 | 0.1727 | 0.3807 | 0.1988 |
| SCIRE4 Results (GPT) | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.3000 | 0.3000 | 0.3000 | 0.1564 | 0.4119 | 0.1906 |
| dmiip2024 | 0.9000 | 0.9091 | 0.8889 | 0.8990 | 0.2000 | 0.4000 | 0.3000 | 0.1995 | 0.4640 | 0.2421 |
| dmiip2024_1 | 0.9000 | 0.9091 | 0.8889 | 0.8990 | 0.2000 | 0.3000 | 0.2500 | 0.1571 | 0.4375 | 0.1957 |
| dmiip2024_2 | 0.9000 | 0.9091 | 0.8889 | 0.8990 | 0.4000 | 0.5000 | 0.4500 | 0.1942 | 0.4962 | 0.2369 |
| dmiip2024_4 | 0.9000 | 0.9091 | 0.8889 | 0.8990 | 0.2000 | 0.2000 | 0.2000 | 0.3125 | 0.2448 | 0.2625 |
| dmiip2024_3 | 0.9000 | 0.9091 | 0.8889 | 0.8990 | 0.2000 | 0.3000 | 0.2500 | 0.1512 | 0.4119 | 0.1855 |
| Q&A based on RAG | 0.9000 | 0.9091 | 0.8889 | 0.8990 | 0.3000 | 0.3000 | 0.3000 | 0.2156 | 0.5104 | 0.2624 |
| sinai_uja_RAG | 0.9000 | 0.9091 | 0.8889 | 0.8990 | 0.4000 | 0.4000 | 0.4000 | 0.2331 | 0.4688 | 0.2667 |
| Q&A based on RAG2 | 0.8000 | 0.8333 | 0.7500 | 0.7917 | 0.3000 | 0.3000 | 0.3000 | 0.2200 | 0.4479 | 0.2472 |
| Fleming-4 | 0.8000 | 0.8333 | 0.7500 | 0.7917 | 0.3000 | 0.5000 | 0.4000 | 0.3250 | 0.4536 | 0.3536 |
| Fleming-1 | 0.8000 | 0.8333 | 0.7500 | 0.7917 | 0.3000 | 0.5000 | 0.4000 | 0.3250 | 0.4536 | 0.3536 |
| Fleming-2 | 0.8000 | 0.8333 | 0.7500 | 0.7917 | 0.3000 | 0.5000 | 0.4000 | 0.3250 | 0.4536 | 0.3536 |
| Fleming-3 | 0.8000 | 0.8333 | 0.7500 | 0.7917 | 0.3000 | 0.5000 | 0.4000 | 0.3250 | 0.4536 | 0.3536 |
| retrieval+reranking | 0.9000 | 0.9091 | 0.8889 | 0.8990 | 0.2000 | 0.2000 | 0.2000 | 0.0859 | 0.3125 | 0.1053 |
Ideal Answers
| System | ROUGE-2 (Rec) | ROUGE-2 (F1) | ROUGE-SU4 (Rec) | ROUGE-SU4 (F1) | Manual: Readability | Manual: Recall | Manual: Precision | Manual: Repetition |
|---|---|---|---|---|---|---|---|---|
| SCIRE4 Results | 0.2032 | 0.1558 | 0.2160 | 0.1649 | 3.22 | 3.56 | 2.93 | 3.33 |
| SCIRE4 Results (GPT) | 0.2089 | 0.1494 | 0.2237 | 0.1594 | 3.93 | 4.38 | 3.42 | 3.98 |
| dmiip2024 | 0.2206 | 0.2406 | 0.2228 | 0.2437 | 4.25 | 4.36 | 4.02 | 4.27 |
| dmiip2024_1 | 0.2485 | 0.2709 | 0.2555 | 0.2792 | 4.47 | 4.53 | 4.24 | 4.49 |
| dmiip2024_2 | 0.2572 | 0.2811 | 0.2583 | 0.2828 | 4.40 | 4.44 | 4.09 | 4.38 |
| dmiip2024_4 | 0.1943 | 0.2128 | 0.2010 | 0.2198 | 4.25 | 4.24 | 3.91 | 4.25 |
| dmiip2024_3 | 0.1807 | 0.1971 | 0.1833 | 0.2002 | 4.25 | 4.31 | 3.91 | 4.25 |
| Q&A based on RAG | 0.2093 | 0.2047 | 0.2169 | 0.2126 | 4.15 | 4.22 | 3.91 | 4.16 |
| sinai_uja_RAG | 0.2005 | 0.2005 | 0.2008 | 0.2023 | 4.24 | 4.27 | 3.96 | 4.27 |
| Q&A based on RAG2 | 0.2008 | 0.1954 | 0.2035 | 0.1988 | 4.22 | 4.22 | 3.85 | 4.16 |
| Fleming-4 | 0.2654 | 0.2091 | 0.2780 | 0.2174 | 3.98 | 4.36 | 3.56 | 4.09 |
| Fleming-1 | 0.2638 | 0.2067 | 0.2760 | 0.2143 | 4.05 | 4.51 | 3.69 | 4.09 |
| Fleming-2 | 0.3038 | 0.1984 | 0.3090 | 0.2005 | 3.91 | 4.51 | 3.47 | 3.96 |
| Fleming-3 | 0.2638 | 0.2067 | 0.2760 | 0.2143 | 4.05 | 4.51 | 3.69 | 4.09 |
| retrieval+reranking | 0.1902 | 0.1367 | 0.2058 | 0.1464 | 3.71 | 4.07 | 3.29 | 3.73 |