WG211/M13Romph
Commercial and open-source database systems consist of millions of lines of highly optimized C code. Yet their performance on individual queries falls 10x or 100x short of what a hand-written, specialized implementation of the same query can achieve.
In a recent joint project at Oracle Labs and the DATA Lab at EPFL, we have implemented a database query engine in Scala. With only about 3000 lines of Scala code, our prototype supports the full TPC-H benchmark suite and runs queries several times as fast as highly tuned commercial systems (> 10x peak speedup).
The key ingredient is program generation using the LMS (Lightweight Modular Staging) framework, which enables us to mechanically derive query compilers by specializing rather naive query interpreters.
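To make the idea of specializing an interpreter concrete, here is a minimal sketch in plain Scala. It does not use the LMS API and is not the project's code: the names `Pred`, `Gt`, `And`, `eval`, `gen`, and `compile` are hypothetical, and the "generated code" is just source text. The interpreter `eval` re-inspects the query for every row, while `compile` walks the same query once and emits a residual program in which the query structure has disappeared.

```scala
// Illustrative only: a hypothetical mini predicate language, not LMS and not the authors' engine.
object StagingSketch {
  type Row = Map[String, Int]

  sealed trait Pred
  final case class Gt(field: String, value: Int) extends Pred
  final case class And(left: Pred, right: Pred) extends Pred

  // Naive interpreter: pattern-matches on the predicate tree again for every row.
  def eval(p: Pred, row: Row): Boolean = p match {
    case Gt(f, v)  => row(f) > v
    case And(l, r) => eval(l, row) && eval(r, row)
  }

  // Same traversal, but it runs once per query and returns source text
  // instead of a Boolean. `row` is the name of the row variable in the output.
  def gen(p: Pred, row: String): String = p match {
    case Gt(f, v)  => s"""($row("$f") > $v)"""
    case And(l, r) => s"(${gen(l, row)} && ${gen(r, row)})"
  }

  // Wraps the generated expression in a function literal specialized to one query.
  def compile(p: Pred): String =
    s"(row: Map[String, Int]) => ${gen(p, "row")}"
}
```

Note that `gen` has the same shape as `eval`, which is the point of the approach: the compiler is obtained from the interpreter by changing what each case produces. In this sketch that change is done by hand with strings; as described above, LMS is what lets the derivation happen mechanically.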