BioASQ Participants Area
BioASQ - Task DISTEMIST
For all information, we refer to the DISTEMIST webpage: https://temu.bsc.es/distemist
Send your questions to antoniomiresc[at]gmail.com
What's new in BioASQ 2022, DISTEMIST
- This year the DISTEMIST task focuses on clinical case reports in Spanish.
- Since DISTEMIST has clinical documents, we focus on a clinical terminology, Snomed-CT, and on a particular entity type of clinical relevance, diseases.
- The DISTEMIST task proposes two subtasks: (1) disease recognition in clinical documents and (2) disease linking to Snomed-CT.
Guidelines for Task DISTEMIST
The Task DISTEMIST will be structured into two independent sub-tasks, each taking into account a particularly important use case scenario.
- DISTEMIST-entities subtrack: requires automatically finding disease mentions in published clinical cases.
- DISTEMIST-linking subtrack: requires automatically finding disease mentions in published clinical cases and assigning, to each mention, a Snomed-CT term.


The rest of the guidelines provide the essential information for participating in the Task DISTEMIST of the BioASQ challenge. They are organized in sections; click a section title to see the relevant details.
For more information, we refer to the DISTEMIST webpage: https://temu.bsc.es/distemist.
+ Download Training Dataset
DISTEMIST datasets are available at Zenodo [https://doi.org/10.5281/zenodo.6408476].
For more information, we refer to the DISTEMIST webpage: https://temu.bsc.es/distemist/datasets/ .
+ Training Dataset Format
DISTEMIST-entities. Annotations are stored in a tab-separated file with headers and 6 columns:
- filename: document name
- mark: mention identifier
- label: mention type (ENFERMEDAD)
- off0: starting position of the mention in the document
- off1: ending position of the mention in the document
- span: text span
DISTEMIST-linking. Annotations are stored in a tab-separated file with headers and 8 columns:
- filename: document name
- mark: mention identifier
- label: mention type (ENFERMEDAD)
- off0: starting position of the mention in the document
- off1: ending position of the mention in the document
- span: text span
- codes: list of Snomed-CT concept codes linked to the mention. If more than one code is associated with a mention, the codes are concatenated with the symbol “+”.
- semantic relation: the relationship between the assigned code and the mention. It is EXACT when the code corresponds exactly to the mention, and NARROW when the mention corresponds to a narrower concept than the Snomed-CT code. For instance, the concept “Chorioretinal lacunae” does not exist in Snomed-CT, so the mention is normalized to the broader Snomed-CT ID 302893000 (“Chorioretinal disorder”) with the relation NARROW.
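The column layout above can be read with a few lines of Python. This is a minimal sketch, not an official loader: it reads columns by position (so it does not depend on the exact header spelling in the released files), the `Mention` class and `read_annotations` function are hypothetical names, and the sample rows use made-up file names, identifiers, offsets and placeholder codes.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class Mention:
    filename: str
    mark: str
    label: str
    off0: int
    off1: int
    span: str
    codes: list = field(default_factory=list)  # stays empty for 6-column (entities) rows
    semantic_relation: str = ""                # stays empty for 6-column (entities) rows


def read_annotations(handle):
    """Read a 6-column (entities) or 8-column (linking) annotation TSV."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    next(reader)  # skip the header row
    mentions = []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        mention = Mention(row[0], row[1], row[2], int(row[3]), int(row[4]), row[5])
        if len(row) >= 8:
            # Several codes for one mention are joined with "+"
            mention.codes = row[6].split("+")
            mention.semantic_relation = row[7]
        mentions.append(mention)
    return mentions


# Illustrative data only: names, offsets and the codes "111"/"222" are placeholders.
sample = (
    "filename\tmark\tlabel\toff0\toff1\tspan\tcodes\tsemantic relation\n"
    "case_001\tT1\tENFERMEDAD\t10\t31\tChorioretinal lacunae\t302893000\tNARROW\n"
    "case_001\tT2\tENFERMEDAD\t40\t52\tplaceholder!\t111+222\tEXACT\n"
)

mentions = read_annotations(io.StringIO(sample))
for m in mentions:
    print(m.filename, m.mark, m.span, m.codes, m.semantic_relation)
```

For a real file, pass an open handle instead of the `io.StringIO` wrapper, for example `open(path, encoding="utf-8", newline="")`.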
Statistic | DISTEMIST-entities training set | DISTEMIST-linking training set 1 | DISTEMIST-linking training set 2 |
Number of clinical case reports | 750 | 209 | 375 |
Avg. mentions per case report | 10.75 | 7.23 | 9.67 |
Avg. unique Snomed-CT codes per case report | NA | 6.12 | 8.30 |
For more information, we refer to the DISTEMIST webpage: https://temu.bsc.es/distemist/description-of-the-corpus/
+ Test Datasets
The test dataset consists of non-annotated clinical case reports.
In the DISTEMIST-entities subtrack, participants have to retrieve the disease mentions appearing in the non-annotated clinical case reports.
In the DISTEMIST-linking subtrack, participants have to retrieve the disease mentions appearing in the non-annotated clinical case reports together with their Snomed-CT codes.
+ Submit Test Results
In the section “Submitting/Task DisTEMIST” (http://participants-area.bioasq.org/Tasks/distemist/), you will find the “Submit your results” option. Select the ZIP file containing all your predictions (for both subtasks) and submit it there.
Important: the submission format details are here: https://temu.bsc.es/distemist/submission/
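Bundling prediction files into a single ZIP can be done with the Python standard library. The sketch below only illustrates the archiving step: the file names and the flat archive layout are placeholders, and the required names and folder structure are the ones given on the submission format page linked above.

```python
import tempfile
import zipfile
from pathlib import Path


def build_submission_zip(prediction_files, zip_path):
    """Write the given files into one ZIP archive, stored under their base names."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in map(Path, prediction_files):
            archive.write(path, arcname=path.name)
    return zip_path


# Demo with throwaway files; the file names below are hypothetical placeholders.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    files = []
    for name in ("entities_predictions.tsv", "linking_predictions.tsv"):
        prediction_file = tmp / name
        prediction_file.write_text(
            "filename\tmark\tlabel\toff0\toff1\tspan\n", encoding="utf-8"
        )
        files.append(prediction_file)
    zip_path = build_submission_zip(files, tmp / "submission.zip")
    with zipfile.ZipFile(zip_path) as archive:
        names = sorted(archive.namelist())

print(names)
```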
+ Evaluation
Evaluation will be done by comparing the automatically generated results to the results generated by manual annotation of experts. The primary evaluation metrics for sub-track 1 (DISTEMIST-entities) and sub-track 2 (DISTEMIST-linking) are micro-averaged precision, recall and F1-score.
The evaluation scripts, together with a README file with instructions, are available on GitHub: https://github.com/TeMU-BSC/distemist_evaluation_library
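To make "micro-averaged" concrete, the sketch below pools true positives, false positives and false negatives over all documents before computing the scores. It is an illustration only, and the official scripts remain the reference. It assumes that a prediction counts as correct only when its whole tuple matches a gold annotation exactly; the function name `micro_prf` and the toy offsets are hypothetical.

```python
def micro_prf(gold, pred):
    """Micro-averaged precision, recall and F1.

    gold, pred: dict mapping a document name to a set of annotation tuples,
    e.g. (off0, off1, label) for entities or (off0, off1, code) for linking.
    """
    tp = fp = fn = 0
    for doc in set(gold) | set(pred):
        gold_set = gold.get(doc, set())
        pred_set = pred.get(doc, set())
        tp += len(gold_set & pred_set)  # predicted and in the gold standard
        fp += len(pred_set - gold_set)  # predicted but not in the gold standard
        fn += len(gold_set - pred_set)  # in the gold standard but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy data with made-up offsets.
gold = {
    "doc1": {(10, 31, "ENFERMEDAD"), (50, 60, "ENFERMEDAD")},
    "doc2": {(5, 12, "ENFERMEDAD")},
}
pred = {
    "doc1": {(10, 31, "ENFERMEDAD"), (70, 80, "ENFERMEDAD")},
    "doc2": {(5, 12, "ENFERMEDAD")},
}

precision, recall, f1 = micro_prf(gold, pred)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

Because counts are pooled across documents, long case reports with many mentions weigh more in the final score than short ones, unlike a macro average over documents.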
+ Resources
DISTEMIST gazetteer
It contains main terms and synonyms from the relevant branches of Snomed-CT for the grounding of disease mentions. It is relevant for both NER and entity linking. The Snomed-CT codes not included in this gazetteer will not be used for the official evaluation metrics. Find it on Zenodo (https://doi.org/10.5281/zenodo.6458114).
Word embeddings
Spanish Medical Word Embeddings. Word embeddings generated from Spanish medical corpora. Download them from Zenodo (https://doi.org/10.5281/zenodo.2542721). They can be used as a building block for clinical NLP systems applied to Spanish texts.
Find more resources on the DISTEMIST webpage: https://temu.bsc.es/distemist/resources/