Polish Information Processing Society

Annals of Computer Science and Information Systems, Volume 18

Proceedings of the 2019 Federated Conference on Computer Science and Information Systems

Medical prescription classification: a NLP-based approach


DOI: http://dx.doi.org/10.15439/2019F197

Citation: Proceedings of the 2019 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 18, pages 605–609


Abstract. Over the last decade, the digitization of healthcare data has become essential for managing the vast amount of data generated by healthcare organizations. Carried out effectively, this process is an enabling resource that improves the provision of healthcare services as well as cutting-edge related applications, ranging from clinical text mining to predictive modelling, survival analysis, patient similarity, genetic data analysis and many others. The application presented in this work concerns the digitization of medical prescriptions, either to authorize healthcare services or to grant reimbursement for medical expenses. The proposed system first extracts text from scanned medical prescriptions; Natural Language Processing and machine learning techniques then classify the prescriptions by exploiting embedded terms and categories concerning patient and doctor personal data, symptoms, pathology, diagnosis and suggested treatments. A RESTful Web Service is introduced, together with the results of prescription classification over a set of more than 800K diagnostic statements.
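The abstract does not specify which models, features or categories the system uses. As a rough illustration of the classification stage only, the following minimal sketch assumes that text has already been extracted from the scan and applies a small multinomial naive Bayes classifier to it. The `NaiveBayes` class, the training statements and the two category names are hypothetical and are not taken from the paper.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.label_counts = Counter(labels)      # label -> number of statements
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokenize(text):
                if tok in self.vocab:  # ignore tokens never seen in training
                    score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data; real diagnostic statements and categories will differ.
TRAIN = [
    ("chest x-ray for persistent cough", "radiology"),
    ("mri of the left knee", "radiology"),
    ("complete blood count and glucose test", "laboratory"),
    ("urine analysis and cholesterol test", "laboratory"),
]
model = NaiveBayes().fit([t for t, _ in TRAIN], [l for _, l in TRAIN])
print(model.predict("Blood glucose test"))  # → laboratory
```

In a full pipeline, the string passed to `predict` would come from the OCR step, and the classifier would sit behind the web service; both are omitted here.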

