Dixie is a query planner, optimizer, and executor for web applications on partitioned databases.
Abstract
Optimizing Web Application Queries for a Partitioned Database
Abstract
Partitioning data is an attractive way to increase storage
server throughput for web-like workloads, but two key
challenges arise: (1) web workloads often do not have one
clear partitioning and (2) it is challenging for the web
developer to determine how to efficiently execute queries over
partitioned tables. These two challenges can lead to
per-query overhead, rather than useful work, dominating total
throughput.
Dixie is a SQL query planner, optimizer, and
executor for databases horizontally partitioned over multiple
servers. Dixie automates the exploitation of tables with
multiple copies partitioned in different ways, in order to
increase throughput by expanding the set of queries that need
not be sent to all servers. Central to Dixie's design are a
cost model and plan generator that are mindful of queries
small enough that query overhead may dominate the cost.
We evaluate Dixie on a database and query stream taken from
Wikipedia. The database is partitioned across ten MySQL
servers. By adding one copy of a 13 MB table and using Dixie's
query optimizer, we achieve a throughput improvement of 3.2X
over a single optimized partitioning of each table and 8.5X
over a single server. On specific queries Dixie with table
copies increases throughput linearly with the number of
servers, while the best single-table-copy partitioning
achieves little scaling. For a large class of joins, which
traditional wisdom suggests require tables partitioned on the
join keys, Dixie can find higher-performance plans using other
partitionings
Papers
Neha Narula and Robert Morris. Executing Web Application Queries on a Partitioned Database. USENIX Webapps, 2012.Neha Narula. Distributed Query Execution on a Replicated and Partitioned Database. Master's Thesis, 2010.
Software
We have implemented Dixie as a query planner in Java. The code is pretty rough. There is also Python code to set up and issue queries over a partitioned database.
- Dixie code
- This has the java query planner in /planner and Wikipedia benchmarks and tests in /wikipedia.