Parallel & Distributed Operating Systems Group

Soup: Web application databases

Project overview

Soup is an attempt at designing a database specifically tailored for web applications, providing automatic caching, safe and effortless schema migrations, and native support for reactive use.

Soup observes that if developers declare in advance the set of queries their application will make, the database can be smarter about how it executes them. In particular, it can choose to pre-compute and incrementally maintain the results of those queries. This allows Soup to answer them quickly, and essentially obviates the need for application-level caches.
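To make the idea of a pre-computed, incrementally maintained result concrete, here is a minimal sketch in Python. It is not Soup's API: the class VoteCountView, its methods, and the "votes per article" query are all hypothetical names chosen for illustration.

```python
from collections import defaultdict


class VoteCountView:
    """Hypothetical materialized view: number of votes per article."""

    def __init__(self):
        # The pre-computed query result, kept up to date on every write.
        self.counts = defaultdict(int)

    def on_insert(self, article_id):
        # Incremental maintenance: adjust only the affected entry.
        self.counts[article_id] += 1

    def on_delete(self, article_id):
        self.counts[article_id] -= 1

    def read(self, article_id):
        # A read is a single lookup; no scan over the base data is needed.
        return self.counts.get(article_id, 0)


view = VoteCountView()
for article_id in ["a", "b", "a"]:
    view.on_insert(article_id)
view.on_delete("b")
```

Reads stay cheap because the work of counting is paid for on each write, which is the role an application cache would otherwise play.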

Project components

Safe schema migrations

By reusing the materialized state it already maintains for queries, Soup can also provide fast schema migrations. Because the raw data log is always kept, migrations can be undone easily, and pre-migration queries can still be satisfied. Furthermore, since users specify only queries, not the underlying schema, the system is free to implement those queries internally using whatever schema it deems most efficient.
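The sketch below illustrates why keeping the raw log makes migrations safe. It is an assumption-laden simplification: the record layout, materialize, old_query, and new_query are hypothetical, and it rebuilds each view by naively replaying the whole log, whereas the text above describes reusing existing materialized state.

```python
# Hypothetical append-only log of raw records; it is never rewritten.
log = [
    {"user": "ann", "email": "ann@example.com"},
    {"user": "bob", "email": "bob@example.com"},
]


def materialize(records, query):
    """Derive a view by applying a query to every logged record."""
    return [query(record) for record in records]


def old_query(record):
    # Pre-migration query: user names only.
    return (record["user"],)


def new_query(record):
    # Post-migration query: user name plus the email domain.
    return (record["user"], record["email"].split("@")[1])


# Both views are derived from the same log, so they can coexist.
old_view = materialize(log, old_query)
new_view = materialize(log, new_query)
```

Undoing the migration in this sketch amounts to discarding new_view; the log and old_view are untouched.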

Streaming data model

Soup is built from the ground up as a streaming data system based on data-flow. This allows web applications to observe a stream of changes to the result sets of the queries they are interested in, which fits well with the newer reactive-style web applications inspired by Meteor.
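As a rough illustration of observing changes to a query's result set, the following Python sketch pushes "+" and "-" deltas to subscribers. The class FilterView, the (sign, row) delta format, and the example predicate are assumptions made for this sketch, not part of Soup.

```python
class FilterView:
    """Hypothetical view over rows that satisfy a predicate."""

    def __init__(self, predicate):
        self.predicate = predicate
        self.rows = set()          # current result set
        self.subscribers = []      # callbacks receiving (sign, row)

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def _emit(self, sign, row):
        for callback in self.subscribers:
            callback(sign, row)

    def insert(self, row):
        # Only rows that enter the result set produce a change.
        if self.predicate(row) and row not in self.rows:
            self.rows.add(row)
            self._emit("+", row)

    def delete(self, row):
        # Only rows that were in the result set produce a change.
        if row in self.rows:
            self.rows.remove(row)
            self._emit("-", row)


changes = []
popular = FilterView(lambda row: row[1] >= 10)  # (post_id, score) rows
popular.subscribe(lambda sign, row: changes.append((sign, row)))
popular.insert(("post-1", 12))
popular.insert(("post-2", 3))    # filtered out, so no change is emitted
popular.delete(("post-1", 12))
```

A reactive client would apply these deltas to its local copy of the result instead of re-running the query.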

Distribution and scaling

The data-flow computation model used in Soup enables efficient multi-core and cross-machine implementations of the application’s set of queries. By carefully analyzing the data-flow graph, Soup can make strategic choices about which operators to place on which machines, what state to materialize, and how to shard and partition the data for availability and performance.
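One simple way to partition data across machines is to hash each row's key. The sketch below shows that technique only; how Soup actually chooses a sharding scheme is not described here, and shard_for, partition, and the example rows are hypothetical.

```python
import zlib


def shard_for(key, num_shards):
    """Map a string key to a shard index using a stable hash."""
    # zlib.crc32 is used because Python's built-in hash() of strings
    # varies between processes, which would break cross-machine agreement.
    return zlib.crc32(key.encode("utf-8")) % num_shards


def partition(rows, num_shards):
    """Split (key, value) rows into num_shards lists by key."""
    shards = [[] for _ in range(num_shards)]
    for key, value in rows:
        shards[shard_for(key, num_shards)].append((key, value))
    return shards


rows = [("ann", 1), ("bob", 2), ("ann", 3), ("cy", 4)]
shards = partition(rows, 3)
```

Because all rows with the same key land on the same shard, an operator that groups or joins by that key can run on each shard independently.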