Polish Information Processing Society

Annals of Computer Science and Information Systems, Volume 21

Proceedings of the 2020 Federated Conference on Computer Science and Information Systems

Simulator of a Supercomputer Job Management System as a Scientific Service


DOI: http://dx.doi.org/10.15439/2020F208

Citation: Proceedings of the 2020 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 21, pages 413–416


Abstract. A job management system (JMS) is an important part of any supercomputer: it creates the schedule for launching the jobs of different users. Real job management systems are complex software systems with many settings, and these settings have a significant impact on JMS metrics such as supercomputer resource utilization and the mean waiting time of a job in the queue. JMS simulators are widely used to study how JMS settings or modifications, new scheduling algorithms, job input stream parameters, or the available computing resources affect JMS efficiency metrics. The article presents a comparative analysis of current JMS simulators (Alea, ScSF, Batsim, AccaSim, Slurm simulator) and their application areas. The authors consider new ways to use a JMS simulator as a scientific service for researchers. With such a service, researchers can test various hypotheses about JMS efficiency, algorithms, or parameters. This gives the following benefits: (1) research is performed on the service side around the clock, (2) the simulator accuracy or adequacy is provided by the service, and (3) the reproducibility of research results is ensured. The simulator-as-a-service thus becomes a single entry point for researchers.
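To make the two metrics named in the abstract concrete, the following is a minimal sketch of a toy simulation, not the authors' simulator or any of the tools listed above. It assumes a strict first-come-first-served policy without backfilling on a homogeneous machine; the names `Job` and `simulate_fcfs` are hypothetical and introduced only for illustration. It replays a job input stream and reports the mean waiting time in the queue and the resource utilization.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Job:
    submit: float   # time the job enters the queue
    nodes: int      # number of nodes requested
    runtime: float  # actual execution time


def simulate_fcfs(jobs, total_nodes):
    """Replay jobs under strict FCFS (assumption: no backfilling).

    Returns (mean_wait, utilization), where utilization is the used
    node-time divided by total node-time between the first submission
    and the last job completion.
    """
    if not jobs:
        return 0.0, 0.0
    ordered = sorted(jobs, key=lambda j: j.submit)
    free = total_nodes
    running = []            # min-heap of (finish_time, nodes)
    clock = ordered[0].submit
    total_wait = 0.0
    used_node_time = 0.0
    last_finish = clock

    for job in ordered:
        if job.nodes > total_nodes:
            raise ValueError("job requests more nodes than the machine has")
        clock = max(clock, job.submit)
        # Release nodes of jobs that have already finished.
        while running and running[0][0] <= clock:
            _, n = heapq.heappop(running)
            free += n
        # Head of the queue blocks until enough nodes are free.
        while free < job.nodes:
            finish, n = heapq.heappop(running)
            clock = max(clock, finish)
            free += n
        total_wait += clock - job.submit
        free -= job.nodes
        finish = clock + job.runtime
        heapq.heappush(running, (finish, job.nodes))
        used_node_time += job.nodes * job.runtime
        last_finish = max(last_finish, finish)

    span = last_finish - ordered[0].submit
    utilization = used_node_time / (total_nodes * span) if span > 0 else 0.0
    return total_wait / len(ordered), utilization


if __name__ == "__main__":
    stream = [Job(0, 4, 10), Job(1, 2, 5), Job(2, 2, 5)]
    mean_wait, util = simulate_fcfs(stream, total_nodes=4)
    print(f"mean wait: {mean_wait:.2f}, utilization: {util:.2f}")
```

Changing the policy, the job stream, or `total_nodes` and rerunning is the kind of hypothesis test the abstract describes, here at toy scale.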
