Business Intelligence Platform for Big Data based on Scalable Distributed Two-Layer Data Store

Adam Krechowicz; Stanisław Deniziak

DOI: http://dx.doi.org/10.15439/2017F195

Citation: Communication Papers of the 2017 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 13, pages 177–182 (2017)

Abstract. Data mining is one of the main business intelligence techniques. The volume of Big Data is growing so quickly that classical Business Intelligence methods need to be redefined. First, organizing large data sets requires new distributed database architectures. Second, it is necessary to develop distributed data processing models that provide a high degree of scalability. In this paper we introduce a fully scalable BI platform that is suitable for the most common data processing tasks. The platform is based on our scalable distributed two-layer data store, which is competitive with existing NoSQL distributed database systems. We present examples and experimental results that demonstrate the advantages of our approach.
