Polish Information Processing Society

Annals of Computer Science and Information Systems, Volume 21

Proceedings of the 2020 Federated Conference on Computer Science and Information Systems

Automatic Generation of Annotated Corpora of Diagnoses with ICD-10 codes based on Open Data and Linked Open Data

DOI: http://dx.doi.org/10.15439/2020F192

Citation: Proceedings of the 2020 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 21, pages 163–167

Abstract. We propose methods for the automatic generation of corpora that contain descriptions of diagnoses in Bulgarian and their associated ICD-10-CM codes (International Classification of Diseases, 10th Revision, Clinical Modification). The approach is based on available open data and Linked Open Data and can be easily adapted to other languages. The resulting corpora for Bulgarian clinical texts consist of about 370,000 pairs of diagnoses and corresponding ICD-10 codes, which is beyond the size that can usually be produced manually; moreover, they were created from scratch and in a relatively short time. The corpora can also be updated whenever new open resources become available or the current ones are revised.


