Performance analysis of scalable algorithms for 3D linear transforms

Ivan Lirkov, Marcin Paprzycki, Maria Ganzha, Stanislav Sedukhin, Paweł Gepner

DOI: http://dx.doi.org/10.15439/2014F374

Citation: Proceedings of the 2014 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 2, pages 613–622 (2014)

Full text

Abstract. The 3D forward/inverse separable discrete transforms, such as Fourier transform, cosine/sine transform, etc. are frequently the principal limiters that prevent many practical applications from scaling to the large number of processors. Existing approaches, which are based on 1D or 2D data decomposition, do not allow the 3D transforms to be scaled to the maximum possible number of computer nodes. Based on the proposed 3D decomposition of the data, highly scalable parallel algorithms of any forward/inverse 3D transform on the torus network of computer nodes was recently designed. The designed algorithms require compute-and-roll time-steps, where each step consists an execution of multiple GEMM operations and concurrent movement of cubical data between nearest-neighbor nodes. The proposed 3D orbital algorithms gracefully avoid a required 3D data transposition and can be extremely scaled up to the number of nodes which is equal to the size of the initial data.