Polish Information Processing Society

Annals of Computer Science and Information Systems, Volume 16

Position Papers of the 2018 Federated Conference on Computer Science and Information Systems

Robotic Process Automation of Unstructured Data with Machine Learning

DOI: http://dx.doi.org/10.15439/2018F373

Citation: Position Papers of the 2018 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 16, pages 916

Abstract. In this paper we present our work in progress on building an artificial intelligence system dedicated to tasks involving the processing of formal documents used in various kinds of business procedures. The main challenge is to build machine learning (ML) models that improve the quality and efficiency of business processes involving image processing, optical character recognition (OCR), text mining, and information extraction. We introduce the research and application field, describe common techniques used in this area, and present our preliminary results and conclusions.
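The pipeline named in the abstract (image processing → OCR → text classification → information extraction) can be illustrated at its text-classification stage. The sketch below is not the authors' system: it is a minimal, standard-library-only bag-of-words naive Bayes classifier that assigns a document type (e.g. invoice vs. contract) to text that would, in practice, come out of an OCR engine; the sample documents and labels are invented for illustration.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Crude whitespace tokenizer; real OCR output would need more cleanup.
    return text.lower().split()

class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # per-class token counts
        self.class_counts = Counter()            # documents per class
        self.vocab = set()

    def fit(self, docs, labels):
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)

    def predict(self, doc):
        tokens = tokenize(doc)
        total_docs = sum(self.class_counts.values())
        best_label, best_logprob = None, float("-inf")
        for label in self.class_counts:
            # log P(class) + sum of log P(token | class)
            logprob = math.log(self.class_counts[label] / total_docs)
            n_tokens = sum(self.word_counts[label].values())
            v = len(self.vocab)
            for t in tokens:
                logprob += math.log(
                    (self.word_counts[label][t] + 1) / (n_tokens + v)
                )
            if logprob > best_logprob:
                best_label, best_logprob = label, logprob
        return best_label

# Hypothetical OCR output snippets and their document types.
train_docs = [
    "invoice number total amount due payment",
    "invoice vat total net gross amount",
    "contract parties agree terms termination clause",
    "contract agreement signed parties obligations",
]
train_labels = ["invoice", "invoice", "contract", "contract"]

clf = NaiveBayes()
clf.fit(train_docs, train_labels)
print(clf.predict("total amount due on invoice"))  # -> invoice
```

Production systems in this area typically replace this toy model with the CNN-, SVM-, or boosting-based classifiers discussed in the paper, but the interface (fit on labeled OCR text, predict a document type) stays the same.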
