Dixie
Neha Narula, Robert T. Morris

Dixie is a query planner, optimizer, and executor for web applications on partitioned databases.

Abstract

Optimizing Web Application Queries for a Partitioned Database Abstract

Partitioning data is an attractive way to increase storage server throughput for web-like workloads, but two key challenges arise: (1) web workloads often do not have one clear partitioning and (2) it is challenging for the web developer to determine how to efficiently execute queries over partitioned tables. These two challenges can lead to per-query overhead, rather than useful work, dominating total throughput.

Dixie is a SQL query planner, optimizer, and executor for databases horizontally partitioned over multiple servers. Dixie automates the exploitation of tables with multiple copies partitioned in different ways, in order to increase throughput by expanding the set of queries that need not be sent to all servers. Central to Dixie's design are a cost model and plan generator that are mindful of queries small enough that query overhead may dominate the cost.

We evaluate Dixie on a database and query stream taken from Wikipedia. The database is partitioned across ten MySQL servers. By adding one copy of a 13 MB table and using Dixie's query optimizer, we achieve a throughput improvement of 3.2X over a single optimized partitioning of each table and 8.5X over a single server. On specific queries Dixie with table copies increases throughput linearly with the number of servers, while the best single-table-copy partitioning achieves little scaling. For a large class of joins, which traditional wisdom suggests require tables partitioned on the join keys, Dixie can find higher-performance plans using other partitionings

Papers

Neha Narula and Robert Morris. Executing Web Application Queries on a Partitioned Database. USENIX Webapps, 2012.

Neha Narula. Distributed Query Execution on a Replicated and Partitioned Database. Master's Thesis, 2010.

Software

We have implemented Dixie as a query planner in Java. The code is pretty rough. There is also Python code to set up and issue queries over a partitioned database.

Dixie code
This has the java query planner in /planner and Wikipedia benchmarks and tests in /wikipedia.

MIT CSAIL Parallel & Disributed Operating Systems Group