Urban scene semantic segmentation using the U-Net model

Marcin Ciecholewski

DOI: http://dx.doi.org/10.15439/2023F3686

Citation: Proceedings of the 18th Conference on Computer Science and Intelligence Systems, M. Ganzha, L. Maciaszek, M. Paprzycki, D. Ślęzak (eds). ACSIS, Vol. 35, pages 907–912 (2023)

Abstract. Vision-based semantic segmentation of complex urban street scenes is a key function in autonomous driving (AD), which is expected to become an important technology in industrialized countries in the near future. Today, advanced driver assistance systems (ADAS) improve traffic safety by detecting objects, recognising road signs, segmenting the road, and performing similar tasks. These functions rely on various classifiers. This publication presents solutions based on convolutional neural networks, namely MobileNet and ResNet50, which were used as encoders in the U-Net model to semantically segment images of complex urban scenes taken from the publicly available Cityscapes dataset. Modifications of the encoder/decoder architecture of the U-Net model are also proposed, and the resulting model is named MU-Net. In tests carried out on 500 images, the MU-Net model produced slightly better segmentation results than the general-purpose MobileNet and ResNet networks, reaching a Jaccard index of 88.85%. The experiments showed that the MobileNet network had the best ratio of accuracy to number of parameters and was also the least sensitive to unusual phenomena occurring in images.
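
For readers unfamiliar with the metric, the Jaccard index (intersection over union) reported above can be sketched as follows. This is a minimal illustration and not the author's evaluation code: the function names `jaccard_index` and `mean_jaccard`, the use of nested lists as label masks, and the unweighted per-class averaging are all assumptions. The abstract does not say how the 88.85% figure was averaged over classes or images.

```python
import math


def jaccard_index(pred, target, class_id):
    """Jaccard index (IoU) of one class between two label masks.

    `pred` and `target` are 2D lists of integer class labels with the
    same shape (hypothetical input format, chosen for illustration).
    """
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            in_pred = p == class_id
            in_target = t == class_id
            if in_pred and in_target:
                intersection += 1
            if in_pred or in_target:
                union += 1
    # The class is absent from both masks: the score is undefined.
    return intersection / union if union else math.nan


def mean_jaccard(pred, target, num_classes):
    """Unweighted mean of per-class Jaccard indices (assumed averaging)."""
    scores = [jaccard_index(pred, target, c) for c in range(num_classes)]
    valid = [s for s in scores if not math.isnan(s)]
    return sum(valid) / len(valid) if valid else math.nan


# Toy 2x3 masks with three classes (0, 1, 2).
pred = [[0, 0, 1],
        [1, 1, 2]]
target = [[0, 1, 1],
          [1, 1, 2]]

print(jaccard_index(pred, target, 0))  # 1 shared pixel out of 2
print(mean_jaccard(pred, target, 3))
```

Classes that appear in neither mask are skipped rather than counted as zero, which is one common convention; other evaluation protocols accumulate intersections and unions over the whole test set before dividing.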
