
Proceedings of the 18th Conference on Computer Science and Intelligence Systems

Annals of Computer Science and Information Systems, Volume 35

Estimation of absolute distance and height of people based on monocular view and deep neural networks for edge devices operating in the visible and thermal spectra


DOI: http://dx.doi.org/10.15439/2023F3560

Citation: Proceedings of the 18th Conference on Computer Science and Intelligence Systems, M. Ganzha, L. Maciaszek, M. Paprzycki, D. Ślęzak (eds). ACSIS, Vol. 35, pages 503–511


Abstract. Accurate estimation of the absolute distance and height of objects in open-area conditions is a significant challenge. In this paper, we address these problems and propose a novel approach that combines classical computer vision algorithms with modern neural-network-based solutions. Our method integrates object detection, monocular depth estimation, and homography-based mapping to achieve precise and efficient estimation of absolute height and distance. The solution is implemented on an edge device, which enables real-time processing of both visual and thermographic data sources. Experimental evaluation on a height estimation dataset prepared by us demonstrates an accuracy of 97.06% and validates the effectiveness of our approach.
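The homography-based mapping mentioned in the abstract can be illustrated with a minimal sketch (not the paper's implementation): a 3×3 homography H, pre-calibrated from known ground-plane correspondences, maps the foot point of a detected person from image pixels to metric ground coordinates, from which distance follows directly. The matrix `H` and the bounding box below are made-up illustrative values.

```python
import numpy as np

def to_ground(H, pixel):
    """Map an image point (u, v) to ground-plane coordinates (X, Y)
    in metres by applying the homography in homogeneous coordinates."""
    u, v = pixel
    x, y, w = H @ np.array([u, v, 1.0])
    return x / w, y / w

def person_distance(H, bbox):
    """Estimate the ground distance to a detected person, taking the
    bottom-centre of the detection box as the foot (ground-contact) point."""
    x1, y1, x2, y2 = bbox
    foot = ((x1 + x2) / 2.0, y2)  # feet are assumed to touch the ground plane
    X, Y = to_ground(H, foot)
    return float(np.hypot(X, Y))  # Euclidean distance on the ground plane

# Toy homography ("1 pixel = 1 cm" on the ground) purely for illustration;
# in practice H would be calibrated, e.g. from four marked ground points.
H = np.array([[0.01, 0.0, 0.0],
              [0.0, 0.01, 0.0],
              [0.0, 0.0, 1.0]])

d = person_distance(H, (300, 100, 340, 400))  # detection bbox in pixels
print(f"estimated distance: {d:.2f} m")       # ~5.12 m for this toy setup
```

In the full pipeline described by the abstract, the bounding box would come from the object detector and the homography from a one-off scene calibration; the monocular depth estimate can then serve as an independent cue for the height computation.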

References

  1. G. Jocher, A. Chaurasia, A. Stoken, J. Borovec, NanoCode012, Y. Kwon, TaoXie, J. Fang, imyhxy, K. Michael, Lorna, A. V, D. Montes, J. Nadar, Laughing, tkianai, yxNONG, P. Skalski, Z. Wang, A. Hogan, C. Fati, L. Mammana, AlexWang1900, D. Patel, D. Yiwei, F. You, J. Hajek, L. Diaconu, and M. T. Minh, “ultralytics/yolov5: v6.1 - TensorRT, TensorFlow Edge TPU and OpenVINO Export and Inference,” Feb. 2022. [Online]. Available: https://doi.org/10.5281/zenodo.6222936
  2. R. Ranftl, A. Bochkovskiy, and V. Koltun, “Vision transformers for dense prediction,” 2021.
  3. B. Graham, A. El-Nouby, H. Touvron, P. Stock, A. Joulin, H. Jégou, and M. Douze, “Levit: a vision transformer in convnet’s clothing for faster inference,” CoRR, vol. abs/2104.01136, 2021. [Online]. Available: https://arxiv.org/abs/2104.01136
  4. S. S. A. Zaidi, M. S. Ansari, A. Aslam, N. Kanwal, M. N. Asghar, and B. Lee, “A survey of modern deep learning based object detection models,” CoRR, vol. abs/2104.11892, 2021. [Online]. Available: https://arxiv.org/abs/2104.11892
  5. J. Gąsienica-Józkowy, M. Knapik, and B. Cyganek, “An ensemble deep learning method with optimized weights for drone-based water rescue and surveillance,” Integrated Computer-Aided Engineering, vol. 28, no. 3, pp. 221–235, 2021.
  6. M. Knapik and B. Cyganek, “Driver’s fatigue recognition based on yawn detection in thermal images,” Neurocomputing, vol. 338, pp. 274–292, 2019. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0925231219302280
  7. R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” 2014.
  8. R. Girshick, “Fast r-cnn,” in Proceedings of the IEEE International Conference on Computer Vision (ICCV), December 2015.
  9. S. Ren, K. He, R. Girshick, and J. Sun, “Faster r-cnn: Towards real-time object detection with region proposal networks,” 2016.
  10. J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,” 2016.
  11. W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, “SSD: Single shot MultiBox detector,” in Computer Vision – ECCV 2016. Springer International Publishing, 2016, pp. 21–37. [Online]. Available: https://doi.org/10.1007/978-3-319-46448-0_2
  12. M. Knapik and B. Cyganek, “Fast eyes detection in thermal images,” Multimedia Tools and Applications, vol. 80, no. 3, pp. 3601–3621, Jan 2021. [Online]. Available: https://doi.org/10.1007/s11042-020-09403-6
  13. J. Redmon and A. Farhadi, “Yolo9000: Better, faster, stronger,” 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6517–6525, 2017.
  14. ——, “Yolov3: An incremental improvement,” ArXiv, vol. abs/1804.02767, 2018.
  15. A. Bochkovskiy, C.-Y. Wang, and H.-Y. M. Liao, “Yolov4: Optimal speed and accuracy of object detection,” 2020.
  16. R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, 2nd ed. New York, NY, USA: Cambridge University Press, 2003.
  17. B. Cyganek and J. P. Siebert, An Introduction to 3D Computer Vision Techniques and Algorithms. Chichester, UK: Wiley, 2009.
  18. R. Szeliski, “Image alignment and stitching: A tutorial,” Found. Trends. Comput. Graph. Vis., vol. 2, no. 1, p. 1–104, jan 2006. [Online]. Available: https://doi.org/10.1561/0600000009
  19. J. Michels, A. Saxena, and A. Y. Ng, “High speed obstacle avoidance using monocular vision and reinforcement learning,” in Proceedings of the 22nd International Conference on Machine Learning, ser. ICML ’05. New York, NY, USA: Association for Computing Machinery, 2005, p. 593–600. [Online]. Available: https://doi.org/10.1145/1102351.1102426
  20. A. Saxena, S. Chung, and A. Ng, “Learning depth from single monocular images,” Advances in neural information processing systems, vol. 18, 2005.
  21. D. Hoiem, A. A. Efros, and M. Hebert, “Automatic photo pop-up,” ACM Trans. Graph., vol. 24, pp. 577–584, 2005.
  22. D. Eigen, C. Puhrsch, and R. Fergus, “Depth map prediction from a single image using a multi-scale deep network,” 2014.
  23. I. Laina, C. Rupprecht, V. Belagiannis, F. Tombari, and N. Navab, “Deeper depth prediction with fully convolutional residual networks,” 2016.
  24. K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” 2015.
  25. J.-H. Lee and C.-S. Kim, “Single-image depth estimation using relative depths,” Journal of Visual Communication and Image Representation, vol. 84, p. 103459, 2022. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1047320322000190
  26. R. Ranftl, K. Lasinger, D. Hafner, K. Schindler, and V. Koltun, “Towards robust monocular depth estimation: Mixing datasets for zero-shot cross-dataset transfer,” 2020.
  27. F. Yin and S. Zhou, “Accurate estimation of body height from a single depth image via a four-stage developing network,” in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 8264–8273.
  28. D.-s. Lee, J.-s. Kim, S. C. Jeong, and S.-k. Kwon, “Human height estimation by color deep learning and depth 3d conversion,” Applied Sciences, vol. 10, no. 16, 2020. [Online]. Available: https://www.mdpi.com/2076-3417/10/16/5531
  29. P. Alphonse and K. Sriharsha, “Depth estimation from a single rgb image using target foreground and background scene variations,” Computers & Electrical Engineering, vol. 94, p. 107349, 2021. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0045790621003207
  30. L. Mou and X. X. Zhu, “Im2height: Height estimation from single monocular imagery via fully residual convolutional-deconvolutional network,” 2018.
  31. MyLED sp. z o.o., “MyLED sp. z o.o.,” 2021, accessed on 05-22-2023. [Online]. Available: https://myled.pl/
  32. C. Zheng, W. Wu, C. Chen, T. Yang, S. Zhu, J. Shen, N. Kehtarnavaz, and M. Shah, “Deep learning-based human pose estimation: A survey,” 2022.
  33. A. M. Hafiz and G. M. Bhat, “A survey on instance segmentation: state of the art,” International Journal of Multimedia Information Retrieval, vol. 9, no. 3, pp. 171–189, Jul 2020. [Online]. Available: https://doi.org/10.1007/s13735-020-00195-x