Polish Information Processing Society

Annals of Computer Science and Information Systems, Volume 2

Proceedings of the 2014 Federated Conference on Computer Science and Information Systems

Data Cleansing of the Fire & Rescue Text Corpus. The Case Study of Correction of the Misspellings and Segmentation into Sentences.

Karol Kreński, Mateusz Fliszkiewicz

DOI: http://dx.doi.org/10.15439/2014F406

Citation: Proceedings of the 2014 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 2, pages 331–335 (2014)

Abstract. This article presents a case study in applying data cleansing methods and segmentation procedures to correct and improve the structure of a fire service domain corpus. We describe our approach to correcting misspellings and the results obtained, as well as a method for segmenting the corpus into sentences.