Enhancing YOLOv11 for Real-Time Object Detection: Advanced Architectures and Edge-Optimized Training Pipeline

Sivadi Balakrishna; Shivani Yadao; Vijender Kumar Solanki

DOI: http://dx.doi.org/10.15439/2024R115

Citation: Proceedings of the 2024 Ninth International Conference on Research in Intelligent Computing in Engineering, Vijender Kumar Solanki, Tran Duc Tan, Pradeep Kumar, Manuel Cardona (eds). ACSIS, Vol. 42, pages 89–96 (2024)

Abstract. In this paper, we propose novel enhancements to YOLOv11, leveraging its advanced architectural components such as the C3k2 block, SPPF (Spatial Pyramid Pooling - Fast), and C2PSA (Convolutional Block with Parallel Spatial Attention). These innovations address key challenges in real-time object detection, including feature extraction, attention mechanisms, and computational efficiency. Furthermore, we present a new training pipeline that optimizes YOLOv11 for edge computing while maintaining state-of-the-art accuracy. Experimental results on the COCO dataset demonstrate significant improvements in mean Average Precision (mAP) and latency compared to prior YOLO iterations, establishing YOLOv11 as a benchmark for real-time applications.
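
To make the SPPF component named in the abstract concrete, the following is a minimal sketch of its pooling stage only, assuming the usual formulation in which a single 5 x 5, stride-1 max pooling is applied three times in sequence and the input is concatenated with the three pooled maps. The 1 x 1 convolutions that normally surround the pooling are omitted, the helper names and the sample feature map are hypothetical, and the abstract does not say whether the proposed enhancements alter this block.

```python
from typing import List

Grid = List[List[float]]


def max_pool_same(x: Grid, k: int) -> Grid:
    """Stride-1 max pooling with a k x k window.

    Out-of-bounds cells are ignored (equivalent to padding with -inf),
    so the output has the same height and width as the input.
    """
    r = k // 2
    h, w = len(x), len(x[0])
    out: Grid = []
    for i in range(h):
        row = []
        for j in range(w):
            row.append(max(
                x[a][b]
                for a in range(max(0, i - r), min(h, i + r + 1))
                for b in range(max(0, j - r), min(w, j + r + 1))
            ))
        out.append(row)
    return out


def sppf_pool_branches(x: Grid, k: int = 5) -> List[Grid]:
    """Return the four maps an SPPF-style block would concatenate:
    the input and three successive k x k poolings of it."""
    y1 = max_pool_same(x, k)
    y2 = max_pool_same(y1, k)
    y3 = max_pool_same(y2, k)
    return [x, y1, y2, y3]


# Small deterministic single-channel feature map (illustrative values only).
feature_map = [[float((7 * i + 3 * j) % 11) for j in range(8)] for i in range(8)]
branches = sppf_pool_branches(feature_map)

# Chaining 5 x 5 pools reproduces the larger parallel windows of classic SPP.
same_as_9 = branches[2] == max_pool_same(feature_map, 9)
same_as_13 = branches[3] == max_pool_same(feature_map, 13)
```

The two equality checks illustrate why the sequential form is called "fast": chaining small pools yields the same multi-scale receptive fields (5, 9, 13) as separate large-window pools, while reusing intermediate results.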
