WG211/M10Kelly
Using DSLs to expand the parallel code synthesis design space by Paul Kelly
What is the right code to generate for a given hardware platform, and how does that choice change as problem parameters change? This talk presents recent work in progress in the finite-element fluid dynamics domain and shows some of the fruits of our attempt to map out the design space. Our goal is to build tools that automatically synthesise the optimal implementation. By getting the abstraction right, we can capture design choices far beyond the reach of a conventional compiler. For example, we show that in low-order finite-element formulations, assembling the global sparse system matrix is efficient on CPUs, but on GPUs the balance shifts to favour a different, local assembly algorithm with a better memory access pattern. In high-order problems, the local assembly algorithm turns out to be attractive on CPUs as well. The choice of high or low order is itself a tunable parameter, giving a rich space of implementation alternatives with different accuracy-performance characteristics on different hardware.
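To make the two assembly strategies concrete, here is a minimal sketch in Python. It is not the code from the talk. It assumes that "local assembly" means applying each element matrix directly (gather, multiply, scatter) without ever forming the global sparse matrix, and it uses a hypothetical 1D Poisson problem with linear elements on a uniform mesh. All function names are illustrative.

```python
# Illustrative 1D Poisson problem with linear (P1) elements on a uniform mesh.
# Every name below is hypothetical and chosen for this sketch only.

def element_matrix(h):
    """Stiffness matrix of one linear element of length h."""
    return [[1.0 / h, -1.0 / h], [-1.0 / h, 1.0 / h]]

def assemble_global(num_elements, h):
    """Global assembly: accumulate every element matrix into one sparse
    matrix, stored here as a dict mapping (row, col) to value."""
    A = {}
    for e in range(num_elements):
        Ke = element_matrix(h)
        nodes = (e, e + 1)  # global node numbers of element e
        for a in range(2):
            for b in range(2):
                key = (nodes[a], nodes[b])
                A[key] = A.get(key, 0.0) + Ke[a][b]
    return A

def matvec_global(A, x):
    """y = A x using the assembled sparse matrix."""
    y = [0.0] * len(x)
    for (i, j), v in A.items():
        y[i] += v * x[j]
    return y

def matvec_local(num_elements, h, x):
    """Assumed local variant: the global matrix is never formed. Each
    element gathers its nodal values, multiplies by its own small dense
    matrix, and scatters the result back."""
    y = [0.0] * len(x)
    for e in range(num_elements):
        Ke = element_matrix(h)
        nodes = (e, e + 1)
        xe = [x[n] for n in nodes]          # gather
        for a in range(2):
            y[nodes[a]] += Ke[a][0] * xe[0] + Ke[a][1] * xe[1]  # scatter
    return y

num_elements = 4
h = 1.0 / num_elements
x = [float(i * i) for i in range(num_elements + 1)]

A = assemble_global(num_elements, h)
y_global = matvec_global(A, x)
y_local = matvec_local(num_elements, h, x)
```

Both routes compute the same product, so the choice between them is purely one of implementation. The global route pays for irregular sparse-matrix accesses, while the local route does more arithmetic on small dense blocks; this sketch does not measure either cost, it only shows the two structures side by side.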