Diagnosing Machine Learning Problems in Federated Learning Systems: A Case Study
Karolina Bogacka, Anastasiya Danilenka, Katarzyna Wasielewska-Michniewska
DOI: http://dx.doi.org/10.15439/2023F722
Citation: Proceedings of the 18th Conference on Computer Science and Intelligence Systems, M. Ganzha, L. Maciaszek, M. Paprzycki, D. Ślęzak (eds). ACSIS, Vol. 35, pages 871–876 (2023)
Abstract. The proliferation of digital artifacts with various computing capabilities, together with the emergence of edge computing, offers new possibilities for developing Machine Learning solutions. These possibilities have driven the popularity of Federated Learning (FL). Although many existing works address various aspects of the FL process, effective problem diagnosis in FL systems remains largely unexplored. In this work, we simulate the training process of four selected approaches to FL topology and compare their resulting performance. After observing concerning disturbances throughout training, we identified their source as the exploding gradient problem. We then modified the model structure and analyzed the new results. Finally, we propose continuous monitoring of the FL training process through the local computation of a selected metric.
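The abstract does not name the metric computed locally during monitoring. As a minimal sketch, assume the metric is the global L2 norm of a client's gradients, since a sudden jump in this norm is a typical symptom of exploding gradients. Each client could then compare the current round's norm against the median of its earlier rounds and raise an alarm on a large jump or a non-finite value. The class `GradNormMonitor`, its `factor` and `warmup` parameters, and the threshold rule are hypothetical illustrations, not the authors' method.

```python
import math
import statistics
from dataclasses import dataclass, field
from typing import List, Sequence


def global_grad_norm(grads: Sequence[Sequence[float]]) -> float:
    """L2 norm over all parameter gradients, flattened together."""
    return math.sqrt(sum(g * g for layer in grads for g in layer))


@dataclass
class GradNormMonitor:
    """Hypothetical client-side monitor (an assumption, not the paper's code).

    Flags a round whose gradient norm is non-finite, or exceeds `factor`
    times the median norm of previous rounds once `warmup` rounds are seen.
    """
    factor: float = 10.0
    warmup: int = 3
    history: List[float] = field(default_factory=list)

    def update(self, grads: Sequence[Sequence[float]]) -> bool:
        norm = global_grad_norm(grads)
        if not math.isfinite(norm):
            # NaN/inf gradients: always alarm, and keep them out of the baseline.
            return True
        alarm = False
        if len(self.history) >= self.warmup:
            baseline = statistics.median(self.history)
            alarm = norm > self.factor * baseline
        self.history.append(norm)
        return alarm


# Usage: per-layer gradients from five local training rounds (made-up values).
monitor = GradNormMonitor(factor=10.0, warmup=3)
rounds = [
    [[0.10, 0.20], [0.05]],
    [[0.12, 0.18], [0.04]],
    [[0.09, 0.21], [0.06]],
    [[0.11, 0.19], [0.05]],
    [[30.0, -45.0], [12.0]],  # sudden spike
]
flags = [monitor.update(g) for g in rounds]
print(flags)  # -> [False, False, False, False, True]
```

Using the median rather than the mean keeps a single earlier spike from inflating the baseline. Because the check runs on each client, it only requires reporting a boolean (or the norm itself) to the coordinator rather than raw data.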