Overlapping communications and computations in GPU-based iterative linear solvers
Krylov subspace methods such as Conjugate Gradient and Biconjugate Gradient Stabilized methods are well known approaches for solving symmetric and asymmetric systems of linear algebraic equations, such as systems usually arising from partial differential equations in computational mathematics problems, like Navier-Stokes equations in fluid dynamics. With increasing sizes of meshes and numbers of computational nodes, network communications time may become an issue: stalls during global reductions have increasing duration, preventing useful computations. This happens because, in original formulations of methods, computing a dot product requires a global reduce operation, and its value is required on the next step, so each process has to stop until all others reach this point, like in a barrier synchronization. We research alternative formulations of conjugate gradient methods (Preconditioned Conjugate Gradient and BiCGStab) for GPU-based iterative linear system solvers. They allow to have an overlap of parallel computations and communications, at the cost of increased amount of computations and memory requirements. We describe an implementation of our approach for GPU-accelerated hybrid systems in OpenFOAM, an open source framework for computational fluid dynamics. Asynchronous collective communications from MPI-3 parallel programming API are used to avoid full barrier synchronization and reduce latency. Experimental results on 2 and 4 million cases from standard OpenFOAM problems are presented.
Proceedings of the Institute for System Programming, vol. 28, issue 1, 2016, pp. 81-92.
ISSN 2220-6426 (Online), ISSN 2079-8156 (Print).
DOI: 10.15514/ISPRAS-2016-28(1)-5Full text of the paper in pdf (in Russian) Back to the contents of the volume