Loops software pipelining on ARM platform.
This article describes improvements we made to the implementation of swing modulo scheduling (SMS), a well-known software pipelining technique, in the GNU Compiler Collection (GCC) for the ARM platform. The prior GCC implementation required a pipelined loop to conform to the do-loop pattern, which relies on a special hardware instruction. ARM has no such instruction. We first implemented a “fake” do-loop instruction in the ARM backend, which let us check whether the GCC SMS implementation is profitable on ARM. We then designed and implemented support for loops whose counter varies as an arithmetic progression. In do-loops the loop counter may be used only in the control part of the loop, whereas we allow other loop instructions to read the loop counter register. For such loops we improved the algorithm that creates the prologue and epilogue, and implemented a considerably more complex algorithm for checking the conditions under which the performance-optimized version of the loop is entered. We also made the changes to the data dependency graph that are needed to generate correct code. When the dependency graph is built, we create additional anti-dependencies between instructions that use the flag register. The resulting performance improvement is 3-4% for selected test applications on the ARM platform. On the x86-64 platform, performance results are mostly neutral, with the exception of a 2-3% improvement on matrix multiplication tests.
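As an illustration only, the shape of a software-pipelined loop (guard condition, prologue, kernel, epilogue) can be sketched by hand in C. This is not code produced by GCC or taken from the paper; the function names scale_ref and scale_pipelined, the two-stage split, and the guard threshold are all assumptions chosen for the example. The loop counter i advances as an arithmetic progression and is also read by the loop body as an array index, which is the kind of loop the abstract refers to.

```c
#include <stddef.h>

/* Reference loop. The counter i is an arithmetic progression
   (start 0, step 1) and is read by the body as an index.
   Assumption: dst and src do not partially overlap. */
void scale_ref(int *dst, const int *src, size_t n, int k)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k;
}

/* Hand-written two-stage pipeline (hypothetical illustration):
   stage 0 loads src[i], stage 1 multiplies and stores.
   The kernel overlaps stage 0 of iteration i with stage 1 of
   iteration i - 1. */
void scale_pipelined(int *dst, const int *src, size_t n, int k)
{
    /* Guard: with fewer iterations than pipeline stages, the
       pipelined version cannot be entered, so fall back. */
    if (n < 2) {
        scale_ref(dst, src, n, k);
        return;
    }

    /* Prologue: stage 0 of iteration 0. */
    int loaded = src[0];

    /* Kernel: runs n - 1 times. */
    for (size_t i = 1; i < n; i++) {
        int next = src[i];        /* stage 0 of iteration i     */
        dst[i - 1] = loaded * k;  /* stage 1 of iteration i - 1 */
        loaded = next;
    }

    /* Epilogue: stage 1 of the last iteration. */
    dst[n - 1] = loaded * k;
}
```

Both functions compute the same result for non-overlapping arrays. The guard at the top plays the role of the entry check mentioned in the abstract: the pipelined version is taken only when the trip count is large enough to fill the pipeline.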
Proceedings of the Institute for System Programming, vol. 22, 2012, pp. 33-48.
ISSN 2220-6426 (Online), ISSN 2079-8156 (Print).
DOI: 10.15514/ISPRAS-2012-22-3