DICKT—Deep Learning-Based Image Captioning using Keras and TensorFlow

Phung Thao Vi; Satyam Mishra; Le Anh Ngoc; Sundaram Mishra; Vu Minh Phuc

DOI: http://dx.doi.org/10.15439/2023R55

Citation: Proceedings of the 2023 Eighth International Conference on Research in Intelligent Computing in Engineering, Pradeep Kumar, Manuel Cardona, Vijender Kumar Solanki, Tran Duc Tan, Abdul Wahid (eds). ACSIS, Vol. 38, pages 105–110 (2023)

Abstract. This study evaluates the performance of an image caption generation model using the BLEU score. The model generates descriptions for images, and each generated caption is compared against either one or two reference captions. The results show high BLEU scores, suggesting that the captions resemble human-written ones. However, BLEU primarily measures linguistic similarity through n-gram overlap and does not capture the full richness of human-generated captions. The findings show the model's potential to convey the essence of an image in text while highlighting the limitations of the BLEU score. TensorFlow and Keras are used for model development; the study acknowledges their widespread use as well as their limitations. The research offers insight into the capabilities of caption generation models and urges a broader view of caption quality beyond quantitative metrics. Although higher BLEU scores are generally preferred, what counts as a "good" score varies with the dataset and context. The study emphasizes the need for a more comprehensive approach to assessing the quality and creativity of machine-generated captions.
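To make the n-gram overlap idea concrete, the following is a minimal sketch of sentence-level BLEU scored against one and then two reference captions. It is an illustration only, not the authors' evaluation code: the function names, the choice of bigram BLEU without smoothing, and the example captions are all assumptions introduced here.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision: a candidate n-gram is credited at most as
    many times as it occurs in the reference where it is most frequent."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    clipped = sum(min(count, max_ref_counts[gram])
                  for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def bleu(candidate, references, max_n=2):
    """Sentence-level BLEU with uniform weights up to max_n, no smoothing."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    # Brevity penalty uses the reference length closest to the candidate's.
    c = len(candidate)
    r = min((len(ref) for ref in references),
            key=lambda length: (abs(length - c), length))
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    log_mean = sum(math.log(p) for p in precisions) / max_n
    return brevity_penalty * math.exp(log_mean)


# Hypothetical captions, invented for illustration.
candidate = "a dog runs on the grass".split()
ref_a = "a dog is running on the grass".split()
ref_b = "a brown dog runs across the field".split()

single_ref_score = bleu(candidate, [ref_a])
dual_ref_score = bleu(candidate, [ref_a, ref_b])
print(f"single reference: {single_ref_score:.3f}")
print(f"dual reference:   {dual_ref_score:.3f}")
```

In this example the dual-reference score is higher than the single-reference score, because the second reference supplies the word "runs" and the bigram "dog runs" that the first reference lacks. This also shows why a "good" score depends on the evaluation setup: the same caption scores differently depending on how many references are available, and a caption with a correct meaning but different wording would still score low.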
