The WARP storage engine beta: columnar storage for MySQL 8 with automatic bitmap indexing

Oracle MySQL is in need of a columnar storage engine for analytics workloads.  A columnar engine (or column store) stores data vertically, that is, it stores all the data associated with a column together, instead of the traditional RDBMS storage method of storing entire rows together, either in an index-organized manner, like InnoDB, or in a heap, like MyISAM.  

Columnar storage has the benefit of reducing IO when only a subset of the row is accessed in a query, because only the data for the accessed columns must be read from disk (or cache) instead of having to read entire rows.  Most columnar stores do not support indexes, but WARP does.

WARP is open source

You can find the WARP source code release on GitHub.  Binaries can be provided upon request.  Simply open an issue for your desired Linux distribution, and I will make them available as soon as I can.

WARP is beta quality

WARP is based on Fastbit (currently at version 2), which is used in production in a number of large scientific applications, such as grid-computing analysis of particle accelerator data and processing of genomic data.  

WARP has been tested by a variety of alpha users, but it is likely that there are still bugs or missing features in the MySQL storage engine interface, so WARP is not recommended for production-critical data.  It is a good idea to verify WARP results against the same data stored in another storage engine to test for correctness.  

Bugs and feature requests can be reported on the GitHub issues page at the link provided above.  

Support and consulting for WARP implementations is available through Swanhart Technical Services, as well as generic MySQL training and consulting services.  I will provide information about those options in another blog post.

Bitmap Indexing

While columnar storage is uncommon in open source SQL RDBMSs, bitmap indexing is not available in any of them.  Bitmap indexes have characteristics that make them ideal for queries that traditional btree indexes can not answer efficiently (or at all), but they are not sorted, so they do not provide all of the capabilities of btree indexes, such as pre-calculated ordering.  

WARP provides both columnar storage and automatic bitmap indexing of columns used in filters.  The end user doesn't have to pick which specific columns to index.  Compressed bitmap indexes are automatically created to support the queries run against the database.  It is possible to exclude columns from automatic indexing if desired.  
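
As a rough sketch of what this looks like in practice (the table, columns, and filter below are made up for illustration, and I am assuming WARP accepts these column types; only the ENGINE=WARP clause is WARP specific):

CREATE TABLE fact_sales (
  sale_date   INT NOT NULL,
  customer_id INT NOT NULL,
  amount_cents INT NOT NULL
) ENGINE=WARP;

-- Filtering on sale_date should cause WARP to build a compressed bitmap
-- index on that column automatically; no CREATE INDEX statement is needed.
SELECT COUNT(*), SUM(amount_cents)
  FROM fact_sales
 WHERE sale_date BETWEEN 19940101 AND 19941231;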


Interesting MySQL bugs

I like bugs.  Probably not as much as Valerii Kravchuk (http://mysqlentomologist.blogspot.com/), a former colleague of mine who has been writing about MySQL bugs for a long time, but I still like bugs.  I like finding bugs, I like fixing bugs, and I like analyzing bugs.  

While I work on the WARP storage engine, I am filing bugs and feature requests for issues I find in MySQL.  I also like to comment on bugs in the MySQL bug database to help the MySQL engineers find root causes, to help end users with workarounds, and to explain why something that looks like a bug may not actually be one.  So here are a few interesting bugs I have encountered recently.

Optimizer gives bad estimates for ICP (index condition pushdown) query plans, resulting in a full table scan (FTS):
https://bugs.mysql.com/bug.php?id=100471


WARP storage engine updated for MySQL 8.0.21

MySQL 8.0.21 is out, and the WARP storage engine has been updated to support the changes made in the new version.

There are no significant functionality changes to the storage engine for MySQL 8.0.21, only the minor changes required to compile in the new version.  

I will be posting benchmarks of WARP vs InnoDB and Column Store using the Star Schema Benchmark, DBT3, and some other comparison tests.

I will also be releasing a Docker image to make testing WARP easy.  As always, binaries are available upon request.  Keep in mind that WARP is a pluggable storage engine, and you can request just the plugin, or the whole MySQL 8 release, for version 8.0.19, 8.0.20, or 8.0.21.

WARP storage engine updated to support 8.0.20 and improve speed of scans

Engine condition pushdown changed in MySQL 8.0.20 to use a new handler::engine_push method, instead of handler::cond_push.  The WARP engine now implements handler::engine_push, so ECP and automatic index creation now work properly in 8.0.20.

I also improved the performance of deleted RID checks, which are now 3x faster than they were before.

I will start posting benchmarks later today comparing the performance of WARP to other storage engines on the Star Schema Benchmark and DBT3 (the open source version of TPC-H).


MyRocks has some strange performance issues for index scans

The details on this issue are here:
https://github.com/facebook/mysql-5.6/issues/369

This test is very simple. I loaded the SSB (star schema benchmark) data for scale factor 20 (12GB raw data), added indexes, and tried to count the rows in the table.

After loading data and creating indexes, the .rocksdb data directory is 17GB in size.

A full table scan "count(*)" query takes less than four minutes, sometimes reading over 1M rows per second, but when scanning the index to accomplish the same count, the database can only scan around 2000 rows per second. At that rate, the count that takes four minutes via the full table scan would take an estimated 1000 minutes via the index, a 250x difference.

I have ruled out the CRC32 implementation (SSE vs non-SSE) as a factor by patching the code to force the hardware SSE function.

There seem to be problems with any queries that use the indexes:
mysql> explain select count(*) from lineorder where LO_OrderDateKey = 19921201;
+----+-------------+-----------+------+-----------------+-----------------+---------+-------+------+-------------+
| id | select_type | table     | type | possible_keys   | key             | key_len | ref   | rows | Extra       |
+----+-------------+-----------+------+-----------------+-----------------+---------+-------+------+-------------+
|  1 | SIMPLE      | lineorder | ref  | LO_OrderDateKey | LO_OrderDateKey | 4       | const | 6613 | Using index |
+----+-------------+-----------+------+-----------------+-----------------+---------+-------+------+-------------+
1 row in set (0.00 sec)

mysql> select count(*) from lineorder where LO_OrderDateKey = 19921201;
+----------+
| count(*) |
+----------+
|    49519 |
+----------+
1 row in set (11.43 sec)
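
For reference, this is roughly how the two access paths can be compared on the same table (a sketch; it assumes the relevant secondary index is LO_OrderDateKey, as in the EXPLAIN above, and that all LO_OrderDateKey values are positive date keys):

-- full table scan: ignore the secondary index
select count(*) from lineorder ignore index (LO_OrderDateKey);

-- index-only scan over the secondary index
select count(*) from lineorder force index (LO_OrderDateKey)
 where LO_OrderDateKey > 0;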


I'm going to make a follow-up blog post comparing InnoDB performance using the same data and queries.

RocksDB doesn't support large transactions very well

So I tried to do my first set of benchmarks and testing on RocksDB today, but I ran into a problem and had to file a bug:
https://github.com/facebook/mysql-5.6/issues/365

MySQL @ Facebook RocksDB appears to store at least 2x the size of the volume of changes in a transaction. I don't know how much space the row plus overhead takes in each transaction, so I'm just going to approximate it as 2x the raw size of the data changed in the transaction. I am not sure how this works for updates either, that is, whether both old and new row information is maintained. If both old and new row data are kept, then for a pure update workload you would need 4x the RAM for the given transactional changes. My bulk load was 12GB of raw data, so it failed, as I have only 12GB of RAM in my test system.

The workaround (as suggested in the bug) is to set two configuration variables:
set rocksdb_bulk_load=1; 
set rocksdb_commit_in_the_middle=1;
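
A bulk import session using this workaround would then look roughly like the sketch below (the table and file names are hypothetical; my understanding is that the bulk-loaded data is finalized when rocksdb_bulk_load is switched back off):

set rocksdb_bulk_load=1;
set rocksdb_commit_in_the_middle=1;

LOAD DATA INFILE '/data/lineorder.tbl'
  INTO TABLE lineorder
  FIELDS TERMINATED BY '|';

set rocksdb_bulk_load=0;
set rocksdb_commit_in_the_middle=0;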


Note that if you use these settings and there is a crash (I am not sure about client abort), the table will be left partially loaded. This is fine for a one-time import of a large data set, but it doesn't address the need to bulk load data in a nightly job, for example. In that case, you definitely don't want to end up with half a table loaded after a crash or query error.

For now I would stay away from RocksDB if you need to regularly load data that is more than half the size of available RAM. If you have concurrent read queries, that number will probably be even smaller.

It would be ideal if the database didn't crash, and instead the load rolled back when this situation is encountered. It isn't very user friendly to crash the database due to large transactions, and I'm sure a stored procedure could be written to crash the database simply by inserting rows in a loop, which is a serious potential DoS.

PS_history 2.0 was released last week with MySQL 8 support and bundled sys_history

PS_history is a tool which collects historical snapshots of the PERFORMANCE_SCHEMA (P_S). This allows you to trend P_S values over time; for example, it is possible to look at the 95th percentile response time of a query over time.

PS_history is stored procedure and event based, and thus it resides entirely inside of the database with no external dependencies. It uses a clever technique to capture all of the P_S data in one consistent snapshot. This ensures that all of the sys_history views (bundled now with PS_history) have a consistent set of data.

By default, as long as the event scheduler is enabled, PS_history will collect data every 30 seconds. If a snapshot takes 30 seconds, there will be a 30 second delay before the next snapshot starts. The interval can be changed by calling `ps_history`.`set_collection_interval`(N), where N is the number of seconds between samples.
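
For example, to collect a snapshot every 60 seconds instead of the default 30:

CALL ps_history.set_collection_interval(60);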

The `sys_history` schema is automatically created by rewriting queries in the `sys` schema, excluding the views that access INFORMATION_SCHEMA.

The tables in the `ps_history` schema are now indexed identically to the P_S indexes added in MySQL 8. These indexes support the sys_history views and can be used to quickly look up data in the history tables as well. You can of course add any additional indexes that you like to any PS_history table.
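
As a sketch of the kind of trending query this enables (this assumes PS_history keeps snapshot copies of the P_S tables under the same names with an added snapshot timestamp column; the `ts` column name here is my assumption):

-- 95th percentile latency for matching statement digests, across snapshots
SELECT ts,
       DIGEST_TEXT,
       QUANTILE_95
  FROM ps_history.events_statements_summary_by_digest
 WHERE DIGEST_TEXT LIKE 'SELECT%lineorder%'
 ORDER BY ts;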

To set up PS_history you can download the git repository, or the zip file here:
https://github.com/greenlion/ps_history

Then execute the setup script:
\. setup.sql

or

mysql -f < setup.sql

WarpSQL now has SQL shim plugins

I made some improvements to the 'proxy' inside of MySQL 5.7 that I've created for WarpSQL (Shard-Query 3).  I've made the MySQL proxy/shim pluggable and I moved the SQLClient to sql/sql_client.cc.  I've merged these changes into 'master' in my fork.

Now you can create "SQL shim" plugins (SHOW PASSWORD is implemented in plugin/sql_shim) and install them in the server like regular plugins:
-- command doesn't work
mysql> show password;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'password' at line 1

-- install the example sql_shim plugin:
mysql> install plugin sql_shim soname 'sql_shim.so';
Query OK, 0 rows affected (0.00 sec)

-- now the command works
mysql> show password;
+--+
|  |
+--+
|  |
+--+
1 row in set (0.00 sec)


There can be only one!
Only ONE "SQL shim" plugin may run in the server at a time, and all "SQL shim" plugins must use the plugin name "sql_shim", which prevents more than one from being loaded at once.  This is by design, because the MySQL plugin interface makes no provisions for plugin execution order.  If you install a SQL shim plugin under another name, it won't cause harm, but it will not be used.

SQL injection in the MySQL server! (of the proxy kind)

[this is a repost of my http://shardquery.com blog post, because it did not syndicate to planet.mysql.com]

As work on WarpSQL (Shard-Query 3) progresses, it has outgrown MySQL proxy.  MySQL proxy is a very useful tool, but it requires Lua scripting, and it is an external daemon that needs to be maintained.  The MySQL proxy module for Shard-Query works well, but to make WarpSQL into a real distributed transaction coordinator, moving the proxy logic inside of the server makes more sense.

The main benefit of MySQL proxy is that it allows a script to "inject" queries between the client and server, intercepting the results and possibly sending back new results to the client.  I would like similar functionality, but inside of the server.

For example, I would like to implement new SHOW commands, and these commands do not need to be implemented as actual MySQL SHOW commands under the covers.

For this blog post, I made a new example command called "SHOW PASSWORD".

Example "injection" which adds SHOW PASSWORD functionality to the server

mysql> select user();
+----------------+
| user()         |
+----------------+
| root@localhost |
+----------------+
1 row in set (0.00 sec)

-- THIS COMMAND DOES NOT EXIST
mysql> show password;
+-------------------------------------------+
| password_hash                             |
+-------------------------------------------+
| *00A51F3F48415C7D4E8908980D443C29C69B60C9 |
+-------------------------------------------+
1 row in set (0.00 sec)


Important - This isn't a MySQL proxy plugin.  There is C++ code in the SERVER to answer that query, but it isn't the normal SHOW command code.  This "plugin" (I put it in quotes because my plan is for a pluggable interface but it isn't added to the server yet) doesn't access the mysql.user table using normal internal access methods. It runs actual SQL inside of the server, on the same THD as the client connection, in the same transaction as the client connection, to get the answer!

Problem #1 - Running SQL in the server

The MySQL C client API doesn't have any methods for connecting to the server from inside of the server, except to connect to the normally available socket interfaces, authenticate, and then issue queries like a normal client.  While it is perfectly possible to connect to the server as a client in this manner, it is sub-optimal for a number of reasons.  First, it requires a second connection to the server; second, it requires that you authenticate again (which requires that you have the user's password); and lastly, any work done in the second connection is not party to transactional changes in the first, and vice-versa.

The problem is communication between the client and server, which uses a mechanism called VIO.  There was work done a long time ago for external stored procedures that would have alleviated this problem by implementing an in-server VIO layer and making the parser re-entrant, but it never made it into the main server.  That work was done against MySQL 5.1, though.

It is possible to run queries without using VIO, though.  You simply can't get results back, except to know whether the query succeeded or not.  This makes it perfectly acceptable for any command that doesn't need a resultset, basically anything other than SELECT.  There is a loophole, however: any changes made to the THD stay made to that THD.  Thus, if the SQL executed sets any user variables, those variables are of course visible after query execution.

Solution - encapsulate arbitrary SQL resultsets through a user variable

Since user variables are visible after query execution, the goal is to get the complete results of a query into a user variable, so that the resultset can be accessed from the server.  To accomplish this, first a method to get the results into the variable must be established, and then a data format for communication that is amenable to that method has to be decided upon, so that the resultset can be accessed conveniently.

With a little elbow grease, MySQL can convert any SELECT statement into a CSV resultset.  To do so, the following are used:


  1. SELECT ... INTO @user_variable

  2. A subquery in the FROM clause (for the original query)

  3. CONCAT, REPLACE, IFNULL, GROUP_CONCAT (to encode the resultset data)

Here is the SQL that the SHOW PASSWORD command uses to get the correct password:

select authentication_string as pw,
       user 
  from mysql.user 
 where concat(user,'@',host) = USER() 
    or user = USER() 
LIMIT 1

Here is the "injected" SQL that the database generates to encapsulate the SQL resultset as CSV:

select 
  group_concat( 
    concat('"',
           IFNULL(REPLACE(REPLACE(`pw`,'"','\\"'),"\n","\\n"),"\N"),
           '"|"',
           IFNULL(REPLACE(REPLACE(`user`,'"','\\"'),"\n","\\n"),"\N"),
           '"'
    ) 
  separator "\n"
  ) 
from 
  ( select authentication_string as pw,
           user 
      from mysql.user 
      where concat(user,'@',host) = USER() 
        OR user = USER() 
    LIMIT 1
  ) the_query 
into @sql_resultset ;
Query OK, 1 row affected (0.00 sec)

Here is the actual encapsulated resultset.  If there were more than one row, the rows would be newline separated.

mysql> select @sql_resultset;
+----------------+
| @sql_resultset |
+----------------+
| ""|"root"      |
+----------------+
1 row in set (0.00 sec)

Injecting SQL in the server

With the ability to encapsulate resultsets into CSV in user variables, it is possible to create a cursor over the resultset data and access it in the server.  The MySQL 5.7 pre-parse rewrite plugins, however,  still run inside the parser.  The THD is not "clean" with respect to being able to run a second query.  The parser is not re-entrant.  Because I desire to run (perhaps many) queries between the time a user enters a query and the server actually answers the query (perhaps with a different query than the user entered!) the MySQL 5.7 pre-parse rewrite plugin infrastructure doesn't work for me.

I modified the server, instead, so that there is a hook in do_command() for query injections.  I conveniently called it query_injection_point(), and the goal is to make it a new plugin type, but I haven't written that code yet.  Here is the current signature for query_injection_point():


bool query_injection_point(
  THD* thd, COM_DATA *com_data, enum enum_server_command command,
  COM_DATA* new_com_data, enum enum_server_command* new_command );

It has essentially the same signature as dispatch_command(), but it provides the ability to replace the command, or keep it as is.  It returns true when the command has been replaced.

Because it is not yet pluggable, here is the code that I placed in the injection point:


/* TODO: make this pluggable */
bool query_injection_point(THD* thd, COM_DATA *com_data, enum enum_server_command command,
  COM_DATA* new_com_data, enum enum_server_command* new_command)
{
  /* example rewrite rule for SHOW PASSWORD */
  if(command != COM_QUERY)
  { return false; }

  /* convert query to upper case */
  std::locale loc;
  std::string old_query(com_data->com_query.query, com_data->com_query.length);
  for(unsigned int i = 0; i < com_data->com_query.length; ++i) {
    old_query[i] = std::toupper(old_query[i], loc);
  }

  if(old_query == "SHOW PASSWORD")
  {
    std::string new_query;
    SQLClient conn(thd);
    SQLCursor* stmt = NULL;
    SQLRow* row = NULL;

    if(conn.query("pw,user",
      "select authentication_string as pw,user from mysql.user " \
      "where concat(user,'@',host) = USER() or user = USER() LIMIT 1", &stmt))
    {
      if(stmt != NULL)
      {
        if((row = stmt->next()))
        {
          new_query = "SELECT '" + row->at(0) + "' as password_hash";
        }
      } else
      {
        return false;
      }
    } else {
      return false;
    }

    /* replace the command sent to the server */
    if(new_query != "")
    {
      Protocol_classic *protocol= thd->get_protocol_classic();
      protocol->create_command(
        new_com_data, COM_QUERY,
        (uchar *) strdup(new_query.c_str()),
        new_query.length()
      );
      *new_command = COM_QUERY;
    } else {
      if(stmt) delete stmt;
      return false;
    }
    if(stmt) delete stmt;
    return true;
  }

  /* don't replace command */
  return false;
}

SQLClient

You will notice that the code accesses the mysql.user table using SQL, through the SQLClient, SQLCursor, and SQLRow objects.  These are the objects that wrap the work of encapsulating the SQL into a CSV resultset and of actually accessing the result set.  The interface is very simple, as you can see from the example.  You create a SQLClient for a THD (one that is NOT running a query already!) and then you simply run queries and access the results.

The SQLClient uses a stored procedure to methodically encapsulate the SQL into CSV and then provides objects to access and iterate over the data that is buffered in the user variable.  Because MySQL 5.7 comes with the sys schema, I placed the stored procedure into it, as there is no other available default database that allows the creation of stored procedures.  I called it sys.sql_client().

Because the resultset is stored as text data, the SQLRow object returns all column values as std::string.

What's next?

I need to add a proper plugin type for "SQL injection plugins".  Then I need to work on a plugin for parallel queries.  Most of the work for that is already done, actually, at least to get it into an alpha quality state.  There is still quite a bit of work to be done though.

You can find the code in the internal_client branch of my fork of MySQL 5.7:

http://github.com/greenlion/warpsql-server