BioASQ Participants Area
Task 5b: Test Results of Phase B
The test results are presented in separate tables for each type of annotation. Systems are listed under the names given in their "System Description". The evaluation measures used in Task B are presented here.
Warning: For ideal answers, good ROUGE results do not always imply good manual scores.
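For readers who want to reproduce the exact-answer columns below, the following is a minimal sketch of how these measures are commonly computed (yes/no accuracy; factoid strict/lenient accuracy and MRR over a ranked list of up to five candidates; mean precision, recall and F-measure for list questions). It uses plain case-insensitive string matching; the official BioASQ evaluation code applies its own answer normalization and synonym handling, which this sketch does not reproduce.

```python
# Sketch of the Phase B exact-answer measures. Matching is naive
# (lower-cased string equality); the official evaluation differs in detail.

def yesno_accuracy(golden, predicted):
    """Fraction of yes/no questions answered correctly."""
    return sum(g == p for g, p in zip(golden, predicted)) / len(golden)

def factoid_scores(golden, predicted, k=5):
    """golden: one set of accepted answer strings per question.
    predicted: one ranked list of candidate answers per question (top k used)."""
    strict = lenient = rr_sum = 0.0
    for gold, cands in zip(golden, predicted):
        gold = {g.lower() for g in gold}
        cands = [c.lower() for c in cands[:k]]
        hits = [i for i, c in enumerate(cands) if c in gold]
        if hits:
            lenient += 1                     # a correct answer anywhere in the top k
            rr_sum += 1.0 / (hits[0] + 1)    # reciprocal rank of the first correct answer
            if hits[0] == 0:
                strict += 1                  # correct answer ranked first
    n = len(golden)
    return strict / n, lenient / n, rr_sum / n   # strict acc., lenient acc., MRR

def list_scores(golden, predicted):
    """Mean precision, recall and F-measure over list questions."""
    p_sum = r_sum = f_sum = 0.0
    for gold, pred in zip(golden, predicted):
        gold = {g.lower() for g in gold}
        pred = {p.lower() for p in pred}
        tp = len(gold & pred)
        p = tp / len(pred) if pred else 0.0
        r = tp / len(gold) if gold else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        p_sum += p; r_sum += r; f_sum += f
    n = len(golden)
    return p_sum / n, r_sum / n, f_sum / n
```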
Test batch 1
Exact Answers
| System | Yes/No Accuracy | Factoid Strict Acc. | Factoid Lenient Acc. | Factoid MRR | List Mean Precision | List Recall | List F-Measure |
|---|---|---|---|---|---|---|---|
| MQ-1 | 0.8824 | - | - | - | - | - | - |
| MQ-2 | 0.8824 | - | - | - | - | - | - |
| Deep QA (single) | - | 0.5200 | 0.6000 | 0.5600 | 0.3269 | 0.4393 | 0.3365 |
| Deep QA (ensemble) | - | 0.5600 | 0.6800 | 0.6033 | 0.3106 | 0.4164 | 0.3341 |
| auth-qa-1 | 0.8824 | 0.1200 | 0.1200 | 0.1200 | - | - | - |
| sarrouti | 0.7647 | 0.1200 | 0.2800 | 0.1833 | 0.1909 | 0.2539 | 0.2073 |
| Olelo-GS | 0.7647 | 0.0400 | 0.0400 | 0.0400 | 0.0402 | 0.1061 | 0.0477 |
| Olelo | 0.8235 | 0.0400 | 0.0400 | 0.0400 | 0.0193 | 0.0492 | 0.0240 |
| HPI_SRL | 0.8824 | - | - | - | 0.0045 | 0.0027 | 0.0034 |
| limsi-reader-UMLS-r1 | 0.6471 | 0.0800 | 0.2000 | 0.1267 | 0.0523 | 0.0508 | 0.0500 |
| MQ-3 | 0.8824 | - | - | - | - | - | - |
| MQ-4 | 0.8824 | - | - | - | - | - | - |
| Lab Zhu ,Fdan Univer | 0.8824 | 0.4000 | 0.4400 | 0.4200 | 0.1489 | 0.4619 | 0.2068 |
| SemanticRoleLabeling | 0.8824 | - | - | - | 0.1119 | 0.1301 | 0.1158 |
| Lab Zhu,Fudan Univer | 0.8824 | 0.4000 | 0.4400 | 0.4200 | 0.1467 | 0.3782 | 0.1965 |
| BioASQ_Baseline | 0.3529 | 0.2800 | 0.4000 | 0.3333 | 0.2658 | 0.4715 | 0.3103 |
Ideal Answers
| System | Rouge-2 (automatic) | Rouge-SU4 (automatic) | Readability (manual) | Recall (manual) | Precision (manual) | Repetition (manual) |
|---|---|---|---|---|---|---|
| MQ-1 | 0.5470 | 0.5599 | 3.54 | 4.47 | 3.82 | 3.68 |
| MQ-2 | 0.5131 | 0.5221 | 3.42 | 4.51 | 3.82 | 3.57 |
| Deep QA (single) | - | - | - | - | - | - |
| Deep QA (ensemble) | - | - | - | - | - | - |
| auth-qa-1 | - | - | - | - | - | - |
| sarrouti | 0.5087 | 0.5247 | 3.65 | 4.42 | 3.90 | 3.89 |
| Olelo-GS | 0.3081 | 0.3362 | 3.33 | 3.31 | 2.52 | 3.35 |
| Olelo | 0.2536 | 0.3004 | 2.50 | 3.10 | 1.89 | 2.57 |
| HPI_SRL | 0.0608 | 0.0653 | 0.81 | 0.81 | 0.82 | 0.87 |
| limsi-reader-UMLS-r1 | - | - | - | - | - | - |
| MQ-3 | 0.5194 | 0.5334 | 3.42 | 4.42 | 3.45 | 3.61 |
| MQ-4 | 0.4136 | 0.4333 | 3.21 | 3.80 | 2.93 | 3.47 |
| Lab Zhu ,Fdan Univer | 0.3328 | 0.3401 | 4.16 | 4.18 | 4.23 | 4.67 |
| SemanticRoleLabeling | 0.0933 | 0.0974 | 1.03 | 1.04 | 1.02 | 1.10 |
| Lab Zhu,Fudan Univer | 0.3328 | 0.3401 | 4.16 | 4.18 | 4.23 | 4.67 |
| BioASQ_Baseline | - | - | - | - | - | - |
Test batch 2
Exact Answers
| System | Yes/No Accuracy | Factoid Strict Acc. | Factoid Lenient Acc. | Factoid MRR | List Mean Precision | List Recall | List F-Measure |
|---|---|---|---|---|---|---|---|
| auth-qa-1 | 0.9630 | 0.1290 | 0.1935 | 0.1559 | 0.1133 | 0.3989 | 0.1731 |
| Olelo | 0.7037 | 0.0323 | 0.0645 | 0.0430 | 0.0217 | 0.0522 | 0.0287 |
| Olelo-GS | 0.9630 | 0.0323 | 0.0323 | 0.0323 | 0.0222 | 0.0389 | 0.0281 |
| Lab Zhu ,Fdan Univer | 0.9630 | 0.3226 | 0.4194 | 0.3710 | 0.3828 | 0.6478 | 0.4597 |
| sarrouti | 0.7778 | 0.0323 | 0.1935 | 0.0887 | 0.2400 | 0.3922 | 0.2920 |
| Lab Zhu,Fudan Univer | 0.9630 | 0.4516 | 0.5161 | 0.4839 | 0.4178 | 0.6700 | 0.5001 |
| LabZhu,FDU | 0.9630 | 0.2903 | 0.4839 | 0.3570 | 0.2555 | 0.5144 | 0.3249 |
| Deep QA (single) | - | 0.3226 | 0.5161 | 0.4086 | 0.2571 | 0.3756 | 0.2895 |
| Deep QA (ensemble) | - | 0.3871 | 0.5161 | 0.4419 | 0.2530 | 0.2978 | 0.2617 |
| MQ-1 | 0.9630 | - | - | - | - | - | - |
| MQ-2 | 0.9630 | - | - | - | - | - | - |
| MQ-3 | 0.9630 | - | - | - | - | - | - |
| MQ-4 | 0.9630 | - | - | - | - | - | - |
| Oaqa5b-tfidf | - | - | - | - | - | - | - |
| Oaqa5b | - | - | - | - | - | - | - |
| SemanticRoleLabeling | 0.9630 | 0.0000 | 0.0645 | 0.0129 | 0.0952 | 0.1422 | 0.1123 |
| LabZhu_FDU | 0.9630 | 0.3226 | 0.4839 | 0.3839 | 0.3561 | 0.6344 | 0.4396 |
| limsi-reader-UMLS-r1 | 0.5556 | 0.0968 | 0.1290 | 0.1075 | 0.0307 | 0.1133 | 0.0462 |
| Oaqa 5b | - | - | - | - | - | - | - |
| LabZhu-FDU | 0.9630 | 0.2258 | 0.3226 | 0.2688 | 0.2572 | 0.4789 | 0.3211 |
| limsi-reader-UMLS-r2 | 0.5556 | 0.0323 | 0.1290 | 0.0656 | 0.0329 | 0.0933 | 0.0455 |
| BioASQ_Baseline | 0.3704 | 0.1613 | 0.3548 | 0.2215 | 0.2704 | 0.4433 | 0.2931 |
Ideal Answers
| System | Rouge-2 (automatic) | Rouge-SU4 (automatic) | Readability (manual) | Recall (manual) | Precision (manual) | Repetition (manual) |
|---|---|---|---|---|---|---|
| auth-qa-1 | 0.2600 | 0.2736 | 3.50 | 3.29 | 3.16 | 4.11 |
| Olelo | 0.2807 | 0.3036 | 3.29 | 3.25 | 2.30 | 3.43 |
| Olelo-GS | 0.2151 | 0.2590 | 2.68 | 3.17 | 2.14 | 2.89 |
| Lab Zhu ,Fdan Univer | 0.3364 | 0.3372 | 4.03 | 4.28 | 4.24 | 4.58 |
| sarrouti | 0.4823 | 0.4828 | 3.68 | 4.59 | 4.01 | 3.91 |
| Lab Zhu,Fudan Univer | 0.3379 | 0.3393 | 4.04 | 4.31 | 4.28 | 4.57 |
| LabZhu,FDU | 0.3354 | 0.3349 | 4.04 | 4.26 | 4.22 | 4.58 |
| Deep QA (single) | - | - | - | - | - | - |
| Deep QA (ensemble) | - | - | - | - | - | - |
| MQ-1 | 0.5117 | 0.5167 | 3.60 | 4.56 | 3.91 | 3.81 |
| MQ-2 | 0.5351 | 0.5384 | 3.67 | 4.60 | 4.02 | 3.64 |
| MQ-3 | 0.4859 | 0.4993 | 3.53 | 4.35 | 3.54 | 3.75 |
| MQ-4 | 0.3895 | 0.4074 | 3.42 | 3.98 | 3.12 | 3.67 |
| Oaqa5b-tfidf | 0.1332 | 0.1352 | 1.00 | 1.20 | 1.09 | 1.04 |
| Oaqa5b | 0.1939 | 0.1928 | 0.95 | 1.30 | 1.04 | 0.96 |
| SemanticRoleLabeling | 0.0506 | 0.0499 | 0.69 | 0.71 | 0.71 | 0.77 |
| LabZhu_FDU | 0.3438 | 0.3453 | 4.02 | 4.28 | 4.24 | 4.58 |
| limsi-reader-UMLS-r1 | - | - | - | - | - | - |
| Oaqa 5b | 0.1923 | 0.1924 | 0.94 | 1.29 | 1.12 | 0.94 |
| LabZhu-FDU | 0.3362 | 0.3356 | 4.05 | 4.26 | 4.23 | 4.58 |
| limsi-reader-UMLS-r2 | - | - | - | - | - | - |
| BioASQ_Baseline | - | - | - | - | - | - |
Test batch 3
Exact Answers
| System | Yes/No Accuracy | Factoid Strict Acc. | Factoid Lenient Acc. | Factoid MRR | List Mean Precision | List Recall | List F-Measure |
|---|---|---|---|---|---|---|---|
| Olelo | 0.6774 | 0.0000 | 0.0385 | 0.0192 | 0.0410 | 0.1044 | 0.0548 |
| Olelo-GS | 0.7742 | 0.0000 | 0.0385 | 0.0192 | 0.0281 | 0.1178 | 0.0423 |
| auth-qa-1 | 0.8065 | 0.1538 | 0.3077 | 0.2147 | 0.1000 | 0.3435 | 0.1479 |
| MQ-1 | 0.8065 | - | - | - | - | - | - |
| MQ-2 | 0.8065 | - | - | - | - | - | - |
| limsi-reader-UMLS-r2 | 0.5161 | 0.0385 | 0.0769 | 0.0481 | 0.0310 | 0.2222 | 0.0530 |
| limsi-reader-UMLS-r1 | 0.5161 | - | - | - | 0.0067 | 0.0667 | 0.0121 |
| Lab Zhu,Fudan Univer | 0.8065 | 0.3462 | 0.4231 | 0.3846 | 0.3564 | 0.4800 | 0.3874 |
| Lab Zhu ,Fdan Univer | 0.8065 | 0.3077 | 0.3462 | 0.3269 | 0.3535 | 0.4800 | 0.3852 |
| MQ-3 | 0.8065 | - | - | - | - | - | - |
| MQ-4 | 0.8065 | - | - | - | - | - | - |
| sarrouti | 0.8387 | 0.1923 | 0.2692 | 0.2212 | 0.2000 | 0.4117 | 0.2625 |
| SemanticRoleLabeling | 0.8065 | 0.0000 | 0.0385 | 0.0128 | 0.1456 | 0.2524 | 0.1715 |
| LabZhu,FDU | 0.8065 | 0.2308 | 0.3077 | 0.2577 | 0.0944 | 0.2244 | 0.1247 |
| Deep QA (single) | 0.8065 | 0.3077 | 0.5769 | 0.4308 | 0.3643 | 0.5670 | 0.4128 |
| Deep QA (ensemble) | 0.8065 | 0.3077 | 0.6154 | 0.4212 | 0.4467 | 0.5981 | 0.4925 |
| Oaqa5b-tfidf | 0.5806 | 0.1154 | 0.2692 | 0.1923 | 0.1897 | 0.5743 | 0.2623 |
| LabZhu_FDU | 0.8065 | 0.3462 | 0.4231 | 0.3846 | 0.3580 | 0.4800 | 0.3884 |
| LabZhu-FDU | 0.8065 | 0.2308 | 0.3077 | 0.2577 | 0.0889 | 0.2244 | 0.1169 |
| Oaqa-5b | 0.5806 | 0.2692 | 0.3846 | 0.3173 | 0.1547 | 0.6371 | 0.2375 |
| BioASQ_Baseline | 0.5161 | 0.1154 | 0.2692 | 0.1923 | 0.1723 | 0.5632 | 0.2506 |
Ideal Answers
| System | Rouge-2 (automatic) | Rouge-SU4 (automatic) | Readability (manual) | Recall (manual) | Precision (manual) | Repetition (manual) |
|---|---|---|---|---|---|---|
| Olelo | 0.3364 | 0.3534 | 3.55 | 3.43 | 2.67 | 3.47 |
| Olelo-GS | 0.2879 | 0.3239 | 3.04 | 3.44 | 2.37 | 3.25 |
| auth-qa-1 | 0.2788 | 0.2973 | 3.87 | 3.69 | 3.78 | 4.42 |
| MQ-1 | 0.5771 | 0.5813 | 3.83 | 4.57 | 4.05 | 3.85 |
| MQ-2 | 0.6062 | 0.6085 | 3.89 | 4.61 | 4.13 | 3.75 |
| limsi-reader-UMLS-r2 | - | - | - | - | - | - |
| limsi-reader-UMLS-r1 | - | - | - | - | - | - |
| Lab Zhu,Fudan Univer | 0.3279 | 0.3303 | 4.30 | 4.19 | 4.35 | 4.73 |
| Lab Zhu ,Fdan Univer | 0.3279 | 0.3299 | 4.30 | 4.19 | 4.35 | 4.73 |
| MQ-3 | 0.5583 | 0.5767 | 3.84 | 4.50 | 3.70 | 3.78 |
| MQ-4 | 0.5002 | 0.5158 | 3.57 | 4.11 | 3.37 | 3.73 |
| sarrouti | 0.5658 | 0.5729 | 3.91 | 4.64 | 4.07 | 4.00 |
| SemanticRoleLabeling | 0.0482 | 0.0484 | 0.36 | 0.35 | 0.34 | 0.36 |
| LabZhu,FDU | 0.3259 | 0.3261 | 4.30 | 4.17 | 4.36 | 4.76 |
| Deep QA (single) | - | - | - | - | - | - |
| Deep QA (ensemble) | - | - | - | - | - | - |
| Oaqa5b-tfidf | 0.2010 | 0.1993 | 1.06 | 1.34 | 1.16 | 1.03 |
| LabZhu_FDU | 0.3279 | 0.3303 | 4.30 | 4.19 | 4.35 | 4.73 |
| LabZhu-FDU | 0.3259 | 0.3261 | 4.30 | 4.17 | 4.36 | 4.76 |
| Oaqa-5b | 0.2005 | 0.1982 | 1.04 | 1.36 | 1.11 | 1.01 |
| BioASQ_Baseline | - | - | - | - | - | - |
Test batch 4
Exact Answers
| System | Yes/No Accuracy | Factoid Strict Acc. | Factoid Lenient Acc. | Factoid MRR | List Mean Precision | List Recall | List F-Measure |
|---|---|---|---|---|---|---|---|
| auth-qa-1 | 0.5517 | 0.1212 | 0.2424 | 0.1551 | 0.0564 | 0.1333 | 0.0718 |
| Deep QA (single) | 0.5517 | 0.2424 | 0.4242 | 0.3025 | 0.2254 | 0.3564 | 0.2419 |
| Deep QA (ensemble) | 0.5517 | 0.3333 | 0.5455 | 0.4162 | 0.2833 | 0.3436 | 0.2927 |
| MQ-1 | 0.5517 | - | - | - | - | - | - |
| MQ-2 | 0.5517 | - | - | - | - | - | - |
| auth-qa-2 | 0.5517 | 0.0303 | 0.0606 | 0.0404 | - | - | - |
| sarrouti | 0.6207 | 0.0909 | 0.1212 | 0.0970 | 0.1077 | 0.2013 | 0.1369 |
| Lab Zhu ,Fdan Univer | 0.5517 | 0.1212 | 0.2424 | 0.1742 | 0.1143 | 0.3077 | 0.1599 |
| auth-qa-3 | 0.5517 | 0.0303 | 0.0909 | 0.0465 | - | - | - |
| Lab Zhu,Fudan Univer | 0.5517 | 0.1818 | 0.3636 | 0.2601 | 0.3608 | 0.4231 | 0.3752 |
| Olelo | 0.5517 | 0.0000 | 0.0606 | 0.0253 | 0.0513 | 0.0513 | 0.0513 |
| Olelo-GS | 0.5172 | - | - | - | 0.0513 | 0.0513 | 0.0513 |
| limsi-reader-UMLS-r1 | 0.5172 | - | - | - | 0.0192 | 0.0513 | 0.0280 |
| Oaqa 5b | 0.6207 | 0.0909 | 0.1212 | 0.1061 | 0.1165 | 0.4615 | 0.1792 |
| Oaqa-5b | 0.6552 | 0.1515 | 0.2424 | 0.1970 | 0.1252 | 0.5353 | 0.1909 |
| SemanticRoleLabeling | 0.5517 | 0.0303 | 0.0606 | 0.0379 | 0.0846 | 0.1122 | 0.0943 |
| limsi-reader-UMLS-r2 | 0.5172 | 0.0303 | 0.0303 | 0.0303 | 0.0371 | 0.1667 | 0.0504 |
| L2PS - DeepQA | 0.5172 | 0.0000 | 0.0606 | 0.0212 | 0.0207 | 0.2423 | 0.0338 |
| Oaqa5b-tfidf | 0.6207 | 0.0909 | 0.1212 | 0.1061 | 0.1165 | 0.4615 | 0.1792 |
| Oaqa5b | - | - | - | - | - | - | - |
| LabZhu,FDU | 0.5517 | 0.2424 | 0.4242 | 0.3207 | 0.3608 | 0.4231 | 0.3752 |
| MQ-4 | 0.5517 | - | - | - | - | - | - |
| LabZhu_FDU | 0.5517 | 0.2727 | 0.4545 | 0.3510 | 0.3608 | 0.4231 | 0.3752 |
| MQ-3 | 0.5517 | - | - | - | - | - | - |
| Basic QA pipline | 0.5517 | 0.0909 | 0.2424 | 0.1414 | 0.0769 | 0.1462 | 0.0967 |
| LabZhu-FDU | 0.5517 | 0.1212 | 0.2121 | 0.1667 | 0.1239 | 0.3077 | 0.1692 |
| BioASQ_Baseline | 0.4828 | 0.0303 | 0.1212 | 0.0682 | 0.1624 | 0.4276 | 0.2180 |
Ideal Answers
| System | Rouge-2 (automatic) | Rouge-SU4 (automatic) | Readability (manual) | Recall (manual) | Precision (manual) | Repetition (manual) |
|---|---|---|---|---|---|---|
| auth-qa-1 | 0.3473 | 0.3543 | 3.70 | 3.53 | 3.56 | 4.28 |
| Deep QA (single) | - | - | - | - | - | - |
| Deep QA (ensemble) | - | - | - | - | - | - |
| MQ-1 | 0.5617 | 0.5533 | 3.94 | 4.51 | 4.03 | 3.90 |
| MQ-2 | 0.5822 | 0.5742 | 3.82 | 4.55 | 4.07 | 3.69 |
| auth-qa-2 | - | - | - | - | - | - |
| sarrouti | 0.5667 | 0.5640 | 3.86 | 4.51 | 4.02 | 3.95 |
| Lab Zhu ,Fdan Univer | 0.3564 | 0.3535 | 4.02 | 3.86 | 3.93 | 4.49 |
| auth-qa-3 | - | - | - | - | - | - |
| Lab Zhu,Fudan Univer | 0.3693 | 0.3665 | 4.00 | 3.96 | 3.95 | 4.49 |
| Olelo | 0.2434 | 0.2851 | 2.71 | 2.92 | 2.04 | 2.96 |
| Olelo-GS | 0.3398 | 0.3462 | 3.43 | 3.24 | 2.59 | 3.46 |
| limsi-reader-UMLS-r1 | - | - | - | - | - | - |
| Oaqa 5b | 0.5685 | 0.5619 | 2.96 | 4.17 | 3.44 | 3.63 |
| Oaqa-5b | 0.6704 | 0.6631 | 3.66 | 4.67 | 3.78 | 3.39 |
| SemanticRoleLabeling | 0.0804 | 0.0744 | 0.71 | 0.64 | 0.66 | 0.72 |
| limsi-reader-UMLS-r2 | - | - | - | - | - | - |
| L2PS - DeepQA | - | - | - | - | - | - |
| Oaqa5b-tfidf | 0.6766 | 0.6692 | 3.61 | 4.57 | 3.68 | 3.29 |
| Oaqa5b | 0.6726 | 0.6642 | 3.64 | 4.68 | 3.70 | 3.30 |
| LabZhu,FDU | 0.3793 | 0.3765 | 4.03 | 4.03 | 4.01 | 4.55 |
| MQ-4 | 0.4251 | 0.4367 | 3.44 | 3.83 | 2.94 | 3.70 |
| LabZhu_FDU | 0.3851 | 0.3822 | 4.03 | 4.03 | 4.02 | 4.55 |
| MQ-3 | 0.5352 | 0.5313 | 3.69 | 4.28 | 3.65 | 3.79 |
| Basic QA pipline | 0.0803 | 0.0794 | 3.39 | 2.49 | 2.99 | 3.76 |
| LabZhu-FDU | 0.3564 | 0.3535 | 4.02 | 3.86 | 3.93 | 4.49 |
| BioASQ_Baseline | - | - | - | - | - | - |
Test batch 5
Exact Answers
| System | Yes/No Accuracy | Factoid Strict Acc. | Factoid Lenient Acc. | Factoid MRR | List Mean Precision | List Recall | List F-Measure |
|---|---|---|---|---|---|---|---|
| auth-qa-1 | 0.4615 | 0.1429 | 0.2286 | 0.1762 | 0.2000 | 0.4289 | 0.2418 |
| Olelo | 0.5000 | - | - | - | 0.0331 | 0.0826 | 0.0379 |
| Olelo-GS | 0.4615 | - | - | - | 0.0147 | 0.0583 | 0.0202 |
| auth-qa-2 | 0.4615 | - | - | - | - | - | - |
| Deep QA (single) | 0.4615 | 0.3714 | 0.4571 | 0.3924 | 0.3308 | 0.4990 | 0.3606 |
| Deep QA (ensemble) | 0.4615 | 0.2571 | 0.5143 | 0.3510 | 0.3926 | 0.4854 | 0.3911 |
| BioASQ_Baseline | 0.6923 | 0.0571 | 0.2000 | 0.1167 | 0.2898 | 0.5585 | 0.3415 |
| Lab Zhu ,Fdan Univer | 0.4615 | 0.2571 | 0.3714 | 0.3010 | 0.2338 | 0.5289 | 0.2574 |
| limsi-reader-UMLS-r2 | 0.5385 | 0.0286 | 0.0857 | 0.0452 | 0.0275 | 0.2178 | 0.0474 |
| sarrouti | 0.4615 | 0.1714 | 0.2857 | 0.2071 | 0.2182 | 0.3178 | 0.2529 |
| MQ-1 | 0.4615 | - | - | - | - | - | - |
| MQ-2 | 0.4615 | - | - | - | - | - | - |
| MQ-3 | 0.4615 | - | - | - | - | - | - |
| MQ-4 | 0.4615 | - | - | - | - | - | - |
| Oaqa-5b | 0.6154 | 0.1429 | 0.2286 | 0.1810 | 0.1280 | 0.6109 | 0.1970 |
| Oaqa5b | 0.6154 | 0.2000 | 0.3143 | 0.2381 | 0.2095 | 0.6433 | 0.2660 |
| Oaqa 5b | 0.6154 | 0.2000 | 0.3143 | 0.2381 | 0.2095 | 0.6433 | 0.2660 |
| oaqa5b5 | 0.6154 | 0.2000 | 0.3143 | 0.2381 | 0.2095 | 0.6433 | 0.2660 |
| Oaqa5b-tfidf | 0.6154 | 0.1429 | 0.2286 | 0.1810 | 0.1280 | 0.6109 | 0.1970 |
| SemanticRoleLabeling | 0.4615 | 0.0000 | 0.0571 | 0.0286 | 0.2591 | 0.3962 | 0.3051 |
| Lab Zhu,Fudan Univer | 0.4615 | 0.4000 | 0.5143 | 0.4524 | 0.3170 | 0.5801 | 0.3811 |
| Basic QA pipline | 0.4615 | 0.1429 | 0.2571 | 0.1833 | 0.1818 | 0.2500 | 0.2062 |
| LabZhu,FDU | 0.4615 | 0.4000 | 0.5143 | 0.4524 | 0.3619 | 0.6003 | 0.4188 |
| LabZhu_FDU | 0.4615 | - | - | - | 0.0091 | 0.0114 | 0.0101 |
| LabZhu-FDU | 0.4615 | 0.1429 | 0.2000 | 0.1581 | 0.2146 | 0.2395 | 0.1783 |
Ideal Answers
| System | Rouge-2 (automatic) | Rouge-SU4 (automatic) | Readability (manual) | Recall (manual) | Precision (manual) | Repetition (manual) |
|---|---|---|---|---|---|---|
| auth-qa-1 | 0.2629 | 0.2735 | 3.91 | 3.67 | 3.68 | 4.54 |
| Olelo | 0.3417 | 0.3518 | 3.48 | 3.09 | 2.45 | 3.49 |
| Olelo-GS | 0.2058 | 0.2602 | 2.95 | 3.14 | 2.07 | 3.05 |
| auth-qa-2 | - | - | - | - | - | - |
| Deep QA (single) | - | - | - | - | - | - |
| Deep QA (ensemble) | - | - | - | - | - | - |
| BioASQ_Baseline | - | - | - | - | - | - |
| Lab Zhu ,Fdan Univer | 0.3236 | 0.3232 | 4.01 | 3.99 | 3.96 | 4.47 |
| limsi-reader-UMLS-r2 | - | - | - | - | - | - |
| sarrouti | 0.5616 | 0.5595 | 3.82 | 4.53 | 3.91 | 3.90 |
| MQ-1 | 0.5184 | 0.5189 | 3.80 | 4.40 | 3.89 | 3.85 |
| MQ-2 | 0.5802 | 0.5703 | 3.85 | 4.44 | 3.98 | 3.74 |
| MQ-3 | 0.4972 | 0.4933 | 3.75 | 4.21 | 3.61 | 3.70 |
| MQ-4 | 0.4010 | 0.4115 | 3.62 | 3.78 | 3.24 | 3.64 |
| Oaqa-5b | 0.6885 | 0.6803 | 3.46 | 4.63 | 3.55 | 3.14 |
| Oaqa5b | 0.7064 | 0.6962 | 3.51 | 4.63 | 3.56 | 3.16 |
| Oaqa 5b | 0.6771 | 0.6672 | 3.53 | 4.64 | 3.55 | 3.17 |
| oaqa5b5 | 0.6771 | 0.6672 | 3.53 | 4.64 | 3.55 | 3.17 |
| Oaqa5b-tfidf | 0.5773 | 0.5747 | 3.06 | 4.31 | 3.46 | 3.52 |
| SemanticRoleLabeling | 0.0405 | 0.0403 | 0.63 | 0.55 | 0.58 | 0.68 |
| Lab Zhu,Fudan Univer | 0.3369 | 0.3358 | 4.00 | 4.03 | 3.96 | 4.45 |
| Basic QA pipline | 0.0697 | 0.0705 | 2.93 | 2.30 | 2.65 | 3.71 |
| LabZhu,FDU | 0.3369 | 0.3358 | 4.00 | 4.03 | 3.96 | 4.45 |
| LabZhu_FDU | 0.3236 | 0.3232 | 4.00 | 3.99 | 3.98 | 4.48 |
| LabZhu-FDU | 0.3236 | 0.3232 | 4.02 | 3.99 | 3.98 | 4.48 |