July 7th, 2020

The WARP storage engine beta: columnar storage for MySQL 8 with automatic bitmap indexing

Oracle MySQL needs a columnar storage engine for analytics workloads. A columnar engine (or column store) stores data vertically: all of the data associated with a column is stored together. This differs from the traditional RDBMS method of storing entire rows together, either in an index-organized manner, like InnoDB, or in a heap, like MyISAM.
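To make the difference concrete, here is a toy sketch that pivots a few row tuples into one list per column. It is only an illustration of the two layouts. The table, column names, and the helper `to_column_store` are hypothetical and do not reflect how WARP lays out data on disk.

```python
# Toy illustration of row-oriented vs column-oriented storage.
# The table and its values are made up for this example.

# Row store: each tuple keeps all of a row's values together.
rows = [
    (1, "alice", 30),
    (2, "bob", 25),
    (3, "carol", 35),
]
columns = ["id", "name", "age"]

def to_column_store(rows, columns):
    """Pivot a list of row tuples into one list of values per column."""
    store = {name: [] for name in columns}
    for row in rows:
        for name, value in zip(columns, row):
            store[name].append(value)
    return store

# Column store: all values of a column are kept together.
column_store = to_column_store(rows, columns)
print(column_store["age"])  # [30, 25, 35]
```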

Columnar storage reduces IO when a query accesses only a subset of the columns in a row, because only the data for the accessed columns must be read from disk (or cache) instead of entire rows. Most columnar stores do not support indexes, but WARP does.
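A rough back-of-the-envelope sketch of the IO savings follows. It assumes fixed-width, uncompressed columns and a full scan. The column names and widths are hypothetical. Real engines, WARP included, also compress data and cache pages, so actual numbers will differ.

```python
# Estimate bytes read for a full scan, assuming fixed-width columns
# and no compression (both are simplifying assumptions).

column_widths = {"id": 8, "ts": 8, "user_id": 8, "payload": 200}  # bytes, hypothetical
num_rows = 1_000_000

def row_store_bytes(widths, n):
    # A row store reads whole rows, so every column is read.
    return n * sum(widths.values())

def column_store_bytes(widths, n, accessed):
    # A column store reads only the columns the query touches.
    return n * sum(widths[c] for c in accessed)

accessed = ["ts", "user_id"]  # e.g. SELECT ts, user_id FROM t
print(row_store_bytes(column_widths, num_rows))               # 224000000
print(column_store_bytes(column_widths, num_rows, accessed))  # 16000000
```

Under these assumptions, the query skips the wide `payload` column entirely and reads about 7% of the bytes a row store would.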

WARP is open source

You can find the WARP source code release on GitHub.  Binaries can be provided upon request.  Simply open an issue for your desired Linux distribution, and I will make them available as soon as I can.

WARP is beta quality

WARP is based on FastBit, currently at version 2. FastBit is used in production in a number of large scientific applications, such as grid computing analysis of particle accelerator data and the processing of genomic data.

WARP has been tested by a variety of alpha users. There are likely still bugs or missing features in its MySQL storage engine interface, so WARP is not recommended for production-critical data. To check correctness, it is recommended to load the same data into another storage engine and compare query results.

Bugs and feature requests can be reported on the GitHub issues page, at the GitHub link provided above.  

Support and consulting for WARP implementations is available through Swanhart Technical Services, as well as generic MySQL training and consulting services.  I will provide information about those options in another blog post.

Bitmap Indexing

While columnar storage is uncommon in open source SQL RDBMSs, bitmap indexing is not available in them at all. Bitmap indexes have characteristics that make them ideal for queries that traditional btree indexes cannot answer efficiently (or at all). However, bitmap indexes are not sorted, so they do not provide all of the capabilities of btree indexes, such as the ability to return rows in a pre-calculated sort order.
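A minimal sketch of the idea behind a bitmap index follows. Each distinct value of a column maps to a bitmap with one bit per row, and predicates on different columns are combined with bitwise AND/OR. This kind of ad hoc multi-column filter is awkward for btrees. Python integers stand in for bitmaps here, and the sketch uses no compression. FastBit's compressed bitmaps work differently internally, so treat this as an illustration of the technique only. The column data and helper names are hypothetical.

```python
def build_bitmap_index(values):
    """Map each distinct value to a bitmap (a Python int) of row positions."""
    index = {}
    for row_id, value in enumerate(values):
        index[value] = index.get(value, 0) | (1 << row_id)
    return index

def rows_from_bitmap(bitmap):
    """Decode a bitmap into the sorted list of matching row ids."""
    row_ids = []
    row_id = 0
    while bitmap:
        if bitmap & 1:
            row_ids.append(row_id)
        bitmap >>= 1
        row_id += 1
    return row_ids

# Two hypothetical columns of the same six-row table.
color = ["red", "blue", "red", "green", "red", "blue"]
size = ["S", "M", "L", "S", "S", "S"]

color_idx = build_bitmap_index(color)
size_idx = build_bitmap_index(size)

# WHERE color = 'red' AND size = 'S': intersect bitmaps with bitwise AND.
match = color_idx["red"] & size_idx["S"]
print(rows_from_bitmap(match))  # [0, 4]

# WHERE color = 'blue' OR size = 'L': union bitmaps with bitwise OR.
print(rows_from_bitmap(color_idx["blue"] | size_idx["L"]))  # [1, 2, 5]
```

Note that the bitmaps encode row positions, not value order, which is why a bitmap index cannot supply a sort order the way a btree can.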

WARP provides both columnar storage and automatic bitmap indexing of columns used in filters.  The end user doesn't have to pick which specific columns to index.  Compressed bitmap indexes are automatically created to support the queries run against the database.  It is possible to exclude columns from automatic indexing if desired.  
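The following is a hypothetical sketch of what "automatic" indexing could look like. A bitmap index is built lazily the first time a column is used in an equality filter, and any column on an exclusion list falls back to a plain scan. The `AutoIndexer` class and its methods are invented for illustration. This is not WARP's actual mechanism or configuration interface.

```python
class AutoIndexer:
    """Hypothetical sketch: build a bitmap index for a column the first
    time it is used in a filter, unless the column is excluded."""

    def __init__(self, table, excluded=()):
        self.table = table             # dict: column name -> list of values
        self.excluded = set(excluded)  # columns never to index
        self.indexes = {}              # column name -> {value: bitmap}

    def _index_for(self, column):
        if column in self.excluded:
            return None
        if column not in self.indexes:
            # First filter on this column: build its bitmap index now.
            index = {}
            for row_id, value in enumerate(self.table[column]):
                index[value] = index.get(value, 0) | (1 << row_id)
            self.indexes[column] = index
        return self.indexes[column]

    def filter_eq(self, column, value):
        """Return the row ids where column == value."""
        index = self._index_for(column)
        if index is None:
            # Excluded column: fall back to scanning the column.
            return [i for i, v in enumerate(self.table[column]) if v == value]
        bitmap = index.get(value, 0)
        return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

table = {"color": ["red", "blue", "red"], "note": ["a", "b", "a"]}
ai = AutoIndexer(table, excluded=["note"])
print(ai.filter_eq("color", "red"))  # [0, 2]
print(ai.filter_eq("note", "a"))     # [0, 2]
print(sorted(ai.indexes))            # ['color']
```

Building indexes on demand means only columns that actually appear in filters pay the indexing cost, which matches the goal of not requiring the user to pick indexes up front.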
