Geometry-Aware Keypoint Network: Accurate Prediction of Point Features in Challenging Scenario

Tomasz Nowak; Piotr Skrzypczyński

DOI: http://dx.doi.org/10.15439/2022F145

Citation: Proceedings of the 17th Conference on Computer Science and Intelligence Systems, M. Ganzha, L. Maciaszek, M. Paprzycki, D. Ślęzak (eds). ACSIS, Vol. 30, pages 191–200 (2022)

Abstract. In this paper, we consider a challenging scenario of localising a camera with respect to a charging station for electric buses. In this application, we face a number of problems, including a substantial scale change as the bus approaches the station, and the need to detect keypoints on a weakly textured object in a wide range of lighting and weather conditions. Therefore, we use a deep convolutional neural network to detect the features, while retaining a conventional procedure for pose estimation with 2D-to-3D associations. We leverage the backbone of HRNet, a state-of-the-art network used for detection of feature points in human pose recognition, and we further improve the solution by adding constraints that stem from the known scene geometry. We incorporate the reprojection-based geometric priors in a novel loss function for HRNet training and use the object geometry to construct sanity checks in post-processing. Moreover, we demonstrate that our Geometry-Aware Keypoint Network yields feasible estimates of the geometric uncertainty of point features. The proposed architecture and solutions are tested on a large dataset of images and trajectories collected with a real city bus and charging station under varying environmental conditions.
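
The reprojection error underlying the geometric priors mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's loss function, which is not given on this page. It only shows, under an assumed distortion-free pinhole camera model, how detected 2D keypoints can be compared with the projections of known 3D object points. All names and numeric values (`project_points`, `reprojection_errors`, the camera matrix, the pose, the four object points) are hypothetical and chosen for illustration.

```python
import numpy as np


def project_points(points_3d, rotation, translation, camera_matrix):
    """Project Nx3 object points into the image (pinhole model, no distortion)."""
    # Transform points from the object frame to the camera frame.
    points_cam = points_3d @ rotation.T + translation
    # Apply the intrinsic matrix, then divide by depth to get pixel coordinates.
    uvw = points_cam @ camera_matrix.T
    return uvw[:, :2] / uvw[:, 2:3]


def reprojection_errors(keypoints_2d, points_3d, rotation, translation, camera_matrix):
    """Euclidean distance in pixels between detected and reprojected keypoints."""
    projected = project_points(points_3d, rotation, translation, camera_matrix)
    return np.linalg.norm(projected - keypoints_2d, axis=1)


# Hypothetical intrinsics and pose: object 10 m in front of the camera.
camera_matrix = np.array([[800.0, 0.0, 320.0],
                          [0.0, 800.0, 240.0],
                          [0.0, 0.0, 1.0]])
rotation = np.eye(3)
translation = np.array([0.0, 0.0, 10.0])

# Hypothetical 3D keypoints on a planar object (metres, object frame).
points_3d = np.array([[-1.0, -1.0, 0.0],
                      [1.0, -1.0, 0.0],
                      [1.0, 1.0, 0.0],
                      [-1.0, 1.0, 0.0]])

# Simulate detections: exact projections, with the first keypoint shifted by (3, 4) px.
keypoints_2d = project_points(points_3d, rotation, translation, camera_matrix)
keypoints_2d[0] += np.array([3.0, 4.0])

errors = reprojection_errors(keypoints_2d, points_3d, rotation, translation, camera_matrix)
print(errors)
```

In this sketch, a large per-keypoint error flags a detection that is inconsistent with the known object geometry. A residual of this kind could in principle serve either as a training penalty or as a post-processing sanity check.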
