Quantitative Impact of Label Noise on the Quality of Segmentation of Brain Tumors on MRI scans
Michał Marcinkiewicz, Grzegorz Mrukwa
Citation: Proceedings of the 2019 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 18, pages 61–65 (2019)
Abstract. Over the last few years, deep learning has proven to be an effective solution to many problems, such as image or text classification. Recently, deep learning-based solutions have outperformed humans on selected benchmark datasets, which is promising for scientific and real-world applications. Training deep learning models to reach such performance requires vast amounts of high-quality data. In real-world scenarios, obtaining a large, coherent, and properly labeled dataset is a challenging task. This is especially true in medical applications, where high-quality data and annotations are scarce and the number of expert annotators is limited. In this paper, we investigate the impact of corrupted ground-truth masks on the performance of a neural network for a brain tumor segmentation task. Our findings suggest that a) the performance degrades about 8% less than would be expected from simulations, b) a neural network learns the simulated biases of annotators, and c) biases can be partially mitigated by using an inversely-biased Dice loss function.
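The abstract does not define the inversely-biased Dice loss, so the following is only an illustrative sketch and not the authors' formulation. It shows the standard soft Dice loss next to a Tversky-style asymmetric variant, which is one plausible way to counteract a systematic over- or under-segmentation bias. The function names, the `beta` parameter, and the toy masks are assumptions introduced for this example.

```python
import numpy as np


def soft_dice_loss(pred, target, eps=1e-7):
    """Standard soft Dice loss: 1 - 2|P*T| / (|P| + |T|)."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    intersection = np.sum(pred * target)
    return 1.0 - (2.0 * intersection + eps) / (np.sum(pred) + np.sum(target) + eps)


def asymmetric_dice_loss(pred, target, beta=0.5, eps=1e-7):
    """Tversky-style loss (an assumption, not the paper's definition).

    beta weights false negatives and (1 - beta) weights false positives.
    beta = 0.5 recovers the soft Dice loss; beta < 0.5 penalizes
    over-segmentation more strongly, beta > 0.5 under-segmentation.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    tp = np.sum(pred * target)          # true positives
    fp = np.sum(pred * (1.0 - target))  # false positives
    fn = np.sum((1.0 - pred) * target)  # false negatives
    return 1.0 - (tp + eps) / (tp + (1.0 - beta) * fp + beta * fn + eps)


# Toy 1-D "masks": the prediction over-segments the target by two voxels.
target = np.array([1, 1, 1, 1, 0, 0, 0, 0])
pred_over = np.array([1, 1, 1, 1, 1, 1, 0, 0])

dice = soft_dice_loss(pred_over, target)
biased = asymmetric_dice_loss(pred_over, target, beta=0.3)
print(f"soft Dice loss: {dice:.3f}, asymmetric loss (beta=0.3): {biased:.3f}")
```

If the annotations used for training are known to be systematically too large, choosing `beta` below 0.5 makes the loss penalize over-segmentation more heavily than the plain Dice loss does, pushing the model in the opposite direction of the bias. How strongly to weight each error type would have to be tuned on data with trusted labels.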