Annals of Computer Science and Information Systems, Volume 11

Proceedings of the 2017 Federated Conference on Computer Science and Information Systems

Preprocessing compensation techniques for improved classification of imbalanced medical datasets


DOI: http://dx.doi.org/10.15439/2017F82

Citation: Proceedings of the 2017 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 11, pages 203–211


Abstract. This paper studies the application of classification techniques to medical datasets with class imbalance. The aim is to identify the factors that degrade classification results and to propose actions that improve performance. To alleviate the impact of uneven and complex class distributions, several methods of balancing the datasets are proposed and compared. The experiments were conducted on five datasets: three binary and two multiclass. Each experiment combines one of several data preprocessing methods with one of several classification techniques. The study shows that, for some datasets, a particular combination of preprocessing method and classification technique outperforms the other approaches. For datasets with a complex distribution or too many features, the ratio of correctly predicted labels may remain low regardless of which resampling method and classification technique are applied.
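
To make the idea of balancing a dataset concrete, the following is a minimal sketch of random oversampling, one common resampling technique. The abstract does not name the specific methods compared in the paper, so this is an illustrative assumption rather than the authors' procedure; the function name `random_oversample` and the toy data are hypothetical.

```python
import random
from collections import Counter, defaultdict


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen samples of the smaller classes
    until every class has as many samples as the largest class."""
    rng = random.Random(seed)

    # Group feature vectors by their class label.
    by_class = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(features)

    # Every class is grown to the size of the largest one.
    target = max(len(rows) for rows in by_class.values())

    X_bal, y_bal = [], []
    for label, rows in by_class.items():
        # Draw the missing samples with replacement from the same class.
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for features in rows + extra:
            X_bal.append(features)
            y_bal.append(label)
    return X_bal, y_bal


# Hypothetical toy data: six samples of class 0, two of class 1.
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [0.9], [1.0]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_bal, y_bal = random_oversample(X, y)
print(Counter(y), Counter(y_bal))
```

The sketch works for binary and multiclass labels alike. In an actual experiment, resampling would normally be applied to the training split only, so that duplicated samples do not leak into the evaluation data.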


