A Parallel MPI I/O Solution Supported by Byte-addressable Non-volatile RAM Distributed Cache

Artur Malinowski, Paweł Czarnul, Piotr Dorożyński, Krzysztof Czuryło, Łukasz Dorau, Maciej Maciejewski, Paweł Skowron

DOI: http://dx.doi.org/10.15439/2016F52

Citation: Position Papers of the 2016 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 9, pages 133–140 (2016)

Abstract. Because many large-scale scientific applications are data-intensive, fast and efficient I/O operations have become critically important in HPC environments. We propose an MPI I/O extension based on an in-system distributed cache with data located in Non-volatile Random Access Memory (NVRAM). The presented architecture makes effective use of NVRAM properties such as persistence and byte-level access. The proposed solution also makes development of parallel applications easier: a programmer only needs to use the well-known MPI I/O data model and API, and efficient file access is provided automatically, without application-level optimizations such as avoiding frequent operations on small amounts of data. Results of experiments with three different applications suggest that the extension significantly reduces file access time, especially for small I/O operations. By locating cache facilities on the computing nodes, the extension decreases the load on file system servers and makes I/O scalable.
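
The benefit claimed for small I/O operations can be illustrated with a minimal sketch. It is not the authors' implementation and uses neither MPI nor NVRAM: it is a single-process Python model in which a byte-addressable cache absorbs many small writes and sends them to the file system as one bulk request. The names BackingFile, ByteCache, and run are hypothetical and chosen only for this illustration.

```python
class BackingFile:
    """Stand-in for a file on a remote file system; counts write requests."""

    def __init__(self, size):
        self.data = bytearray(size)
        self.requests = 0

    def write_at(self, offset, payload):
        # Every call models one request sent to a file system server.
        self.requests += 1
        self.data[offset:offset + len(payload)] = payload


class ByteCache:
    """Hypothetical node-local cache over one contiguous region of a file."""

    def __init__(self, backing, start, length):
        self.backing = backing
        self.start = start
        # Initial load of the region (not counted as a request in this sketch).
        self.buf = bytearray(backing.data[start:start + length])
        self.dirty = False

    def write_at(self, offset, payload):
        # Byte-level update inside the cache; no file system request is issued.
        rel = offset - self.start
        if rel < 0 or rel + len(payload) > len(self.buf):
            raise ValueError("write outside the cached region")
        self.buf[rel:rel + len(payload)] = payload
        self.dirty = True

    def flush(self):
        # A single bulk request writes the whole cached region back.
        if self.dirty:
            self.backing.write_at(self.start, bytes(self.buf))
            self.dirty = False


def run(use_cache, n_writes=1000, width=4):
    """Issue n_writes small writes; return (request count, final file bytes)."""
    f = BackingFile(n_writes * width)
    target = ByteCache(f, 0, n_writes * width) if use_cache else f
    for i in range(n_writes):
        target.write_at(i * width, i.to_bytes(width, "little"))
    if use_cache:
        target.flush()
    return f.requests, bytes(f.data)
```

Calling run(False) and run(True) yields the same file contents, but the cached variant issues one write request where the direct variant issues one per small write. In the proposed extension the application code would keep using the standard MPI I/O calls, and the cached data would reside in NVRAM distributed across the computing nodes; this sketch models neither of those aspects.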