
Proceedings of the 19th Conference on Computer Science and Intelligence Systems (FedCSIS)

Annals of Computer Science and Information Systems, Volume 39

Task-driven single-image super-resolution reconstruction of document scans


DOI: http://dx.doi.org/10.15439/2024F7855

Citation: Proceedings of the 19th Conference on Computer Science and Intelligence Systems (FedCSIS), M. Bolanowski, M. Ganzha, L. Maciaszek, M. Paprzycki, D. Ślęzak (eds). ACSIS, Vol. 39, pages 259–264 (2024)


Abstract. Super-resolution reconstruction aims at generating images of high spatial resolution from low-resolution observations. State-of-the-art super-resolution techniques underpinned by deep learning produce results of outstanding visual quality, but it is seldom verified whether they constitute a valuable source for specific computer vision applications. In this paper, we investigate the possibility of employing super-resolution as a preprocessing step to improve optical character recognition from document scans. To this end, we propose to train deep networks for single-image super-resolution in a task-driven way, making them better adapted to the purpose of text detection. As problems limited to a specific task are heavily ill-posed, we introduce a multi-task loss function that combines components related to text detection with those guided by image similarity. The results reported in this paper are encouraging, and they constitute an important step towards real-world super-resolution of document images.
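The abstract describes a multi-task loss that weighs a text-detection term against an image-similarity term, but the exact components and weights are not given on this page. As an illustration only, such a loss might be sketched as below; the function names (`l1_loss`, `bce_loss`, `multi_task_loss`), the choice of L1 for image similarity, binary cross-entropy on a text mask for the detection term, and the mixing weight `alpha` are all assumptions, not the authors' actual formulation.

```python
import numpy as np

def l1_loss(sr, hr):
    # Image-similarity term: mean absolute error between the
    # super-resolved image and the high-resolution reference.
    return float(np.mean(np.abs(sr - hr)))

def bce_loss(pred_mask, gt_mask, eps=1e-7):
    # Task-driven term (assumed): binary cross-entropy between a
    # predicted text-region mask and the ground-truth mask.
    p = np.clip(pred_mask, eps, 1.0 - eps)
    return float(-np.mean(gt_mask * np.log(p) + (1.0 - gt_mask) * np.log(1.0 - p)))

def multi_task_loss(sr, hr, pred_mask, gt_mask, alpha=0.5):
    # Weighted combination of the image-similarity and
    # text-detection components; alpha balances the two terms.
    return alpha * l1_loss(sr, hr) + (1.0 - alpha) * bce_loss(pred_mask, gt_mask)
```

With `alpha = 1` the loss reduces to plain image fidelity, while smaller values push the network toward reconstructions that the text detector can exploit, mirroring the task-driven training idea.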
