Whose Fault is It? Correctly Attributing Outages in Cloud Services
Maurizio Naldi, Matteo Adriani
DOI: http://dx.doi.org/10.15439/2019F59
Citation: Proceedings of the 2019 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 18, pages 433–440 (2019)
Abstract. Cloud availability is a major performance parameter in cloud Service Level Agreements (SLAs). Evaluating it correctly is essential to SLA enforcement and to any litigation that may follow. Current methods fail to locate faults correctly because they include the network's contribution. We propose a procedure that identifies the failures actually due to the cloud itself and so provides a correct measure of cloud availability. The procedure employs freely available tools, namely traceroute and whois, and arrives at the availability measure by first identifying the boundaries of the cloud. We evaluate the procedure by testing it on three major cloud providers: Google Cloud, Amazon AWS, and Rackspace. The results show that the procedure identifies the fault location correctly in 95% of cases. The cloud availability obtained in the test after correct identification lies between three and four nines (99.9% to 99.99%) for the three platforms under test.
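As an illustration of the idea in the abstract, the following is a minimal Python sketch, not the authors' implementation. It assumes that each traceroute hop has already been annotated with the owning organization obtained from whois (`None` for hops that did not answer), and it uses a deliberately simple rule: a failed probe is attributed to the cloud only if the last responding hop is at or beyond the first hop owned by the provider. The function names, the provider label `ExampleCloud`, and the attribution rule are all assumptions made for this example.

```python
import math
from typing import List, Optional


def cloud_boundary(hop_owners: List[Optional[str]], provider: str) -> Optional[int]:
    """Return the index of the first hop owned by the provider, or None.

    hop_owners holds one whois-derived owner name per traceroute hop
    (hypothetical input format; None marks an anonymous hop).
    """
    for index, owner in enumerate(hop_owners):
        if owner == provider:
            return index
    return None


def attribute_outage(boundary: Optional[int], last_responding_hop: int) -> str:
    """Attribute a failed probe to 'cloud', 'network', or 'unknown'.

    Simplifying assumption: if the probe got at least as far as the
    cloud boundary before failing, the fault lies inside the cloud.
    """
    if boundary is None:
        return "unknown"
    return "cloud" if last_responding_hop >= boundary else "network"


def availability(successful_probes: int, total_probes: int) -> float:
    """Fraction of probes for which the service was up."""
    return successful_probes / total_probes


def nines(avail: float) -> float:
    """Express availability as a number of nines, e.g. 0.999 is about 3."""
    if avail >= 1.0:
        return math.inf
    return -math.log10(1.0 - avail)


if __name__ == "__main__":
    hops = ["ISP-A", "ISP-A", None, "Transit-B", "ExampleCloud", "ExampleCloud"]
    boundary = cloud_boundary(hops, "ExampleCloud")
    print(boundary, attribute_outage(boundary, last_responding_hop=2))
    print(round(nines(availability(9995, 10000)), 2))
```

Under these assumptions, only the probes attributed to the cloud would count as cloud downtime when computing availability; failures attributed to the network would be excluded from the measure.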