Task-driven single-image super-resolution reconstruction of document scan
Maciej Zyrek, Michal Kawulok
DOI: http://dx.doi.org/10.15439/2024F7855
Citation: Proceedings of the 19th Conference on Computer Science and Intelligence Systems (FedCSIS), M. Bolanowski, M. Ganzha, L. Maciaszek, M. Paprzycki, D. Ślęzak (eds). ACSIS, Vol. 39, pages 259–264 (2024)
Abstract. Super-resolution reconstruction aims to generate images of high spatial resolution from low-resolution observations. State-of-the-art super-resolution techniques underpinned with deep learning produce results of outstanding visual quality, but it is seldom verified whether they constitute a valuable source for specific computer vision applications. In this paper, we investigate the possibility of employing super-resolution as a preprocessing step to improve optical character recognition from document scans. To achieve that, we propose to train deep networks for single-image super-resolution in a task-driven way, making them better adapted to the purpose of text detection. As such task-specific problems are heavily ill-posed, we introduce a multi-task loss function that combines components related to text detection with those guided by image similarity. The results reported in this paper are encouraging and constitute an important step towards real-world super-resolution of document images.
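The multi-task loss described above can be sketched as a weighted combination of an image-similarity term and a task-driven text-detection term. The sketch below is a minimal illustration, assuming a simple mean-squared-error image term and a placeholder detection-consistency term; the function names, the weighting factor `alpha`, and the scalar detector scores are all hypothetical and do not reproduce the authors' actual implementation.

```python
# Minimal sketch of a multi-task loss: alpha * L_image + (1 - alpha) * L_task.
# All names and the scalar detector-score abstraction are illustrative
# assumptions, not the paper's actual loss components.

def image_loss(sr, hr):
    """Mean squared error between super-resolved and reference pixel values."""
    n = len(sr)
    return sum((a - b) ** 2 for a, b in zip(sr, hr)) / n

def detection_loss(sr_score, hr_score):
    """Placeholder task term: penalizes divergence between text-detector
    responses on the super-resolved and the high-resolution image."""
    return abs(sr_score - hr_score)

def multi_task_loss(sr, hr, sr_score, hr_score, alpha=0.5):
    """Weighted combination of the image-similarity and task-driven terms."""
    return alpha * image_loss(sr, hr) + (1 - alpha) * detection_loss(sr_score, hr_score)
```

In an actual training loop, the image term would compare SR and HR images pixel-wise (or perceptually), while the task term would be back-propagated through a pretrained text-detection network kept frozen during super-resolution training.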