Data Structures for Markov Chain Transition Matrices on Intel Xeon Phi

Beata Bylina, Joanna Potiopa

DOI: http://dx.doi.org/10.15439/2016F345

Citation: Proceedings of the 2016 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 8, pages 665–668 (2016)

Full text

Abstract. We employ Intel Xeon Phi as a high-performance coprocessor to solve Markov chains. Matrices arising from Markov models are very sparse with short rows. In this paper the authors research two storage formats of Markov chain transition matrices on Intel Xeon Phi. In this work are studies CSR and HYB (modification ELL) format for such matrices. Using these formats, sparse matrix-vector multiplication (SpMV) operation is implemented and then is employed to the explicit fourth- order Runge-Kutta method. Numerical experiments results for transition matrices of Markov chains from wireless networks and call-center models show that HYB format in offload version is more effective than CSR format. The obtained performance for HYB format is even 1.45 times better in comparison to multi- threaded CPU (dual Intel Xeon E5-2670) with the use of the CSR format (SpMV from the MKL library on CPU).

References

ELLPACK, 2014. http://www.cs.purdue.edu/ellpack.
N. Bell and M. Garland. Efficient sparse matrix-vector multiplication on CUDA. Technical report, NVIDIA, 2008. Tech. Report No. NVR-2008-004.
G. Bianchi. Performance analysis of the IEEE 802.11 distributed coordination function. IEEE Journal on Selected Areas in Communications, 18(3):535–547, March 2000.
B. Bylina, J. Bylina, and M. Karwacki. Computational aspects of GPU-accelerated sparse matrix-vector multiplication for solving Markov models. Theoretical and Applied Informatics, 23(2):127–145, 2011.
J. Bylina and B. Bylina. A Markovian queuing model of a WLAN node. Communications in Computer and Information Science, 160:80–86, 2011.
J. Bylina, B. Bylina, and M. Karwacki. A Markovian model of a network of two wireless devices. Communications in Computer and Information Science, 291:411–420, 2012.
Jarosław Bylina, Beata Bylina, and Marek Karwacki. An efficient representation on GPU for transition rate matrices for Markov chains. In Parallel Processing and Applied Mathematics, pages 663–672. Springer, 2013.
Tim Cramer, Dirk Schmidl, Michael Klemm, and Dieter an Mey. OpenMP programming on Intel Xeon Phi coprocessors: An early performance comparison. In Proc. of the Many-core Applications Research Community Symposium at RWTH Aachen University, pages 38–44, 2012.
N. Gans, G. Koole, and A. Mandelbaum. Telephone call centers: tutorial. Review and Research Prospects, Manufact. and Service Oper. Manag., 5:79–141, 2003.
N. A. Hamilton, K. Burrage, and A. Bustamam. Fast parallel markov clustering in bioinformatics using massively parallel computing on gpu with cuda and ellpack-r sparse format. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 9(3):679–692, 2012.
Intel. Intel Math Kernel Library (MKL). http://software.intel.com/en-us/intel-mkl, 2014.
Xing Liu, Mikhail Smelyanskiy, Edmond Chow, and Pradeep Dubey. Efficient sparse matrix-vector multiplication on x86-based many-core processors. In Proceedings of the 27th International ACM Conference on International Conference on Supercomputing, ICS ’13, pages 273–282, New York, NY, USA, 2013. ACM.
Erik Saule, Kamer Kaya, and Ümit V. Çatalyürek. Performance evaluation of sparse matrix multiplication kernels on Intel Xeon Phi. CoRR, abs/1302.1078, 2013.
W. J. Stewart. An Introduction to the Numerical Solution of Markov Chains. Princeton University Press, Princeton, NJ, 1994.