Impact of clustering unlabeled data on classification: case study in bipolar disorder
Olga Kamińska, Katarzyna Kaczmarek-Majer, Olgierd Hryniewicz
Citation: Proceedings of the 17th Conference on Computer Science and Intelligence Systems, M. Ganzha, L. Maciaszek, M. Paprzycki, D. Ślęzak (eds). ACSIS, Vol. 30, pages 931–934 (2022)
Abstract. Sensors now make it possible to collect large amounts of data, but these data are often only partially labeled. For example, in smartphone-based monitoring of mental state, far more data are collected from smartphones than from psychiatrists' assessments of the patient's mental state. The approach presented in this paper examines whether unlabeled data can improve classification accuracy in the considered case study of classifying a patient's state. First, unlabeled data are represented by cluster memberships computed with the Fuzzy C-means algorithm, which reflects the uncertainty about the patient's condition in this disease. Second, classification is performed using two well-known algorithms, Random Forest and SVM. The obtained results indicate a minimal improvement in classification quality when cluster memberships are used. These results are promising and also interpretable.
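The first step of the approach, representing each observation by its fuzzy cluster memberships, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the toy two-dimensional data, the number of clusters (2), the fuzzifier `m = 2.0`, and the helper names `fuzzy_c_means`, `memberships`, and `augment` are all assumptions made for the example, since the abstract does not specify them.

```python
import math
import random


def memberships(point, centers, m=2.0):
    """Fuzzy C-means membership of one point in each cluster (sums to 1)."""
    # Clamp distances so a point sitting exactly on a center does not divide by zero.
    dist = [max(math.dist(point, c), 1e-12) for c in centers]
    inv = [d ** (-2.0 / (m - 1.0)) for d in dist]
    total = sum(inv)
    return [v / total for v in inv]


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Plain Fuzzy C-means; returns (centers, membership matrix)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Start from random memberships, with each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(n_iter):
        # Center update: mean of all points weighted by u_ik ** m.
        centers = []
        for k in range(n_clusters):
            w = [u[i][k] ** m for i in range(n)]
            w_sum = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / w_sum for d in range(dim)]
            )
        # Membership update, tracking the largest change for the stopping rule.
        shift = 0.0
        for i in range(n):
            new_row = memberships(points[i], centers, m)
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row
        if shift < tol:
            break
    return centers, u


def augment(points, centers, m=2.0):
    """Append the cluster memberships to each original feature vector."""
    return [list(p) + memberships(p, centers, m) for p in points]


# Toy data (assumption): two well-separated groups of unlabeled observations
# and two labeled observations, one near each group.
data_rng = random.Random(1)
unlabeled = [[data_rng.gauss(0, 0.3), data_rng.gauss(0, 0.3)] for _ in range(30)]
unlabeled += [[data_rng.gauss(3, 0.3), data_rng.gauss(3, 0.3)] for _ in range(30)]
labeled = [[0.1, -0.2], [2.9, 3.1]]

# Cluster labeled and unlabeled data together, then extend only the labeled
# rows with their memberships; these rows are what a classifier would see.
centers, _ = fuzzy_c_means(unlabeled + labeled, n_clusters=2)
augmented = augment(labeled, centers)
for row in augmented:
    print([round(v, 3) for v in row])
```

In a second step, the augmented rows and their labels would be passed to a classifier, for example scikit-learn's `RandomForestClassifier` or `SVC`, and compared against the same classifier trained on the original features alone.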