My SQL Dump

MySQL musings by a self-professed MySQL Geek

Announcing a new MySQL fork: TroySQL
swanhart
First, I believe that Oracle MySQL, Percona MySQL and SkySQL MySQL (MariaDB) are all great databases with their own merits. You should probably try all of them and see which one works best for you. Personally I like Percona Server, but all of them are good.

So why do we need ANOTHER fork when there are already three perfectly acceptable commercial forks?
a) Experimentation
b) Community Involvement
c) Serious feature/functionality divergence/disparity
d) Extensibility
e) Storage engines
f) OLAP focus

Experimentation:
There are a lot of interesting and useful things in the database world that are not possible in MySQL. Many of these features have been implemented as middleware or as third-party add-on tools, but they don't get wide adoption because they are not "built-in" to the database. Two good examples are Flexviews and Shard-Query. Other experiments and worklogs have had significant work checked into other branches and forks, but that work has never been accepted upstream. Good examples are "perl stored procedures" and "external stored procedures". Another is allowing plugins to add variables to the THD without modifying the source. These are good contributions that are in danger of being lost. They should not languish in a tree somewhere. I think there should be an active fork to test these ideas.

Community Involvement:
If the fork encourages experimentation, then community involvement is more likely. If the fork is more accepting of code contributions (such as those under alternate licenses like the two- and three-clause BSD licenses), this too encourages community involvement. Both are true of TroySQL. Experimentation is encouraged, and you can contribute code under any license deemed compatible with GPL V2 (LGPLv2, BSD2, BSD3). This does not mean that every feature submitted for inclusion will make it into a release or that it will work exactly as submitted, but if you want a cool new feature like user defined types, or you want to really make foreign keys work for all engines, then this would be a great place to try it out.

Serious feature/functionality divergence/disparity:
While TroySQL does not plan to diverge so far from MySQL as to be unrecognizable, there is a high price to be paid for guaranteeing compatibility with regular MySQL. This price may be acceptable in commercial forks, but I'd rather say "not guaranteed compatible" and just be done with it. So TroySQL will maintain compatibility with regular MySQL where possible, but when necessary it will diverge. Given the desire for experimentation, divergence is necessary and should not be considered a negative quality.

Extensibility:
Instead of plugins with a C interface, I want to start moving to TRIGGERS. Binlog event triggers, user logon/logoff triggers, statement triggers, commit triggers, etc. This makes writing extensions in any language possible, as a trigger can be written in any language that the database supports. Many current plugins can be rewritten to use the trigger interface instead. There will still be a use for C plugins, particularly for storage engines.
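To make the idea concrete, here is a minimal sketch of event triggers as a dispatch table, written in Python purely for illustration. The event names, the TriggerRegistry class and the context dictionaries are all assumptions made up for this example; none of them is TroySQL syntax or an existing MySQL API.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical event names, loosely based on the list above.
EVENTS = {"logon", "logoff", "statement", "commit", "binlog_event"}


class TriggerRegistry:
    """Maps server events to handlers, standing in for a compiled C plugin hook."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def create_trigger(self, event: str, handler: Callable[[dict], None]) -> None:
        """Attach a handler to an event; unknown events are rejected."""
        if event not in EVENTS:
            raise ValueError(f"unknown event: {event}")
        self._handlers[event].append(handler)

    def fire(self, event: str, context: dict) -> int:
        """Run every handler registered for the event and return how many ran."""
        handlers = self._handlers.get(event, [])
        for handler in handlers:
            handler(context)
        return len(handlers)


# Usage: an audit log built from two triggers instead of an audit plugin.
audit_log: List[str] = []
registry = TriggerRegistry()
registry.create_trigger("logon", lambda ctx: audit_log.append(f"logon:{ctx['user']}"))
registry.create_trigger("commit", lambda ctx: audit_log.append(f"commit:{ctx['trx_id']}"))

registry.fire("logon", {"user": "app"})
registry.fire("commit", {"trx_id": 42})
```

The point of the sketch is only that a handler is ordinary code in whatever language the server can run, so nothing has to be compiled against server headers.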

Storage engines:
A column store is needed, and a compressing one. Index plugins are needed too, so that bitmap and text indexes can be implemented in an engine agnostic way. Proper triggers for row changes that can invoke C code will of course be useful as well, and can lead to proper FK support for all engines.

I would like to take the CONNECT storage engine, and implement proper SQL/MED data wrapper syntax on top of it instead of using it as a different storage engine.

OLAP focus:
None of the current forks plan on adding intra-query parallelism any time soon. This feature is necessary for good performance when queries must examine large volumes of data. TroySQL will support query parallelism, starting with SELECT queries. Users have been asking for window functions and common table expressions. These will be added too, as well as table functions.
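As a rough illustration of what intra-query parallelism means for a SELECT, the sketch below splits the rows into chunks, computes a partial GROUP BY on each chunk in a worker, and merges the partial results. It is plain Python, not server code, and the function names are made up for this example. Python threads are used only to show the shape of the plan; they do not give real CPU parallelism the way server worker threads would.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Sequence, Tuple

Row = Tuple[str, int]            # (group key, amount)
Partial = Dict[str, Tuple[int, int]]  # key -> (SUM(amount), COUNT(*))


def partial_aggregate(rows: Sequence[Row]) -> Partial:
    """Worker step: SUM and COUNT grouped by key over one chunk of rows."""
    acc: Partial = {}
    for key, amount in rows:
        total, count = acc.get(key, (0, 0))
        acc[key] = (total + amount, count + 1)
    return acc


def merge_partials(partials: List[Partial]) -> Partial:
    """Coordinator step: combine per-chunk results into the final answer."""
    merged: Partial = {}
    for partial in partials:
        for key, (total, count) in partial.items():
            t, c = merged.get(key, (0, 0))
            merged[key] = (t + total, c + count)
    return merged


def parallel_group_by(rows: Sequence[Row], workers: int = 4) -> Partial:
    """Roughly: SELECT key, SUM(amount), COUNT(*) FROM rows GROUP BY key."""
    chunk = max(1, -(-len(rows) // workers))  # ceiling division
    chunks = [rows[i:i + chunk] for i in range(0, len(rows), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_aggregate, chunks))
    return merge_partials(partials)


sample = [("a", 1), ("b", 2), ("a", 3), ("b", 4), ("c", 5)]
result = parallel_group_by(sample, workers=2)
```

SUM and COUNT merge cleanly because they are distributive. An aggregate like AVG has to be carried as a (sum, count) pair and divided at the end, which is why the partial result keeps both.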

Materialized views, bitmap indexes, a column store, star join optimization and query rewrite are needed too. An optimizer that can use more than one join type and that has a hash join that can spill to disk would be nice too, as would sort/merge joins.
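For readers who have not seen a hash join that spills, here is a small sketch of the partitioned ("grace") variant in Python. Both inputs are hashed into partition files on disk, and then each pair of partitions is joined with an in-memory hash table, so only one partition of the build side has to fit in memory at a time. This sketch always spills, to keep it short; a real implementation would spill only when the build side exceeds its memory budget. All names here are invented for the example.

```python
import os
import pickle
import tempfile
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Tuple[int, str]  # (join key, payload)


def _partition_to_disk(rows: Iterable[Row], num_partitions: int,
                       directory: str, prefix: str) -> List[str]:
    """Write each row to the partition file chosen by hashing its join key."""
    paths = [os.path.join(directory, f"{prefix}_{i}.bin") for i in range(num_partitions)]
    files = [open(path, "wb") for path in paths]
    try:
        for row in rows:
            pickle.dump(row, files[hash(row[0]) % num_partitions])
    finally:
        for f in files:
            f.close()
    return paths


def _read_partition(path: str) -> Iterator[Row]:
    """Stream the rows of one partition file back from disk."""
    with open(path, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                return


def grace_hash_join(build: Iterable[Row], probe: Iterable[Row],
                    num_partitions: int = 4) -> List[Tuple[int, str, str]]:
    """Inner equi-join on the key; returns (key, build payload, probe payload)."""
    results: List[Tuple[int, str, str]] = []
    with tempfile.TemporaryDirectory() as tmp:
        build_paths = _partition_to_disk(build, num_partitions, tmp, "build")
        probe_paths = _partition_to_disk(probe, num_partitions, tmp, "probe")
        # Matching keys always land in the same partition number on both sides.
        for build_path, probe_path in zip(build_paths, probe_paths):
            table: Dict[int, List[str]] = {}
            for key, payload in _read_partition(build_path):
                table.setdefault(key, []).append(payload)
            for key, payload in _read_partition(probe_path):
                for match in table.get(key, []):
                    results.append((key, match, payload))
    return results


joined = grace_hash_join(
    build=[(1, "a"), (2, "b"), (2, "c")],
    probe=[(2, "x"), (3, "y"), (1, "z")],
)
```

The output order depends on the partitioning, so callers that need an order must sort the result.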

Release plan:
TroySQL will be released as "preview releases". Each preview release will have one or two new features. Some features might have a lot of depth, others might be simple. Each preview release will be named, and there may be multiple preview releases for a feature. So, for example, the goal of the first preview release is query rewrite, parallel query, php stored procedures and perl stored procedures. It will be called "TroySQL 5.6.X Green Swan PR1". Then, as development progresses, "Green Swan PR2" will be released. Once everything is well tested and feature complete, it will become "TroySQL 5.6.X Green Swan GA1". Finally, in a GA release, point release improvements (no new features) are labeled "TroySQL 5.6.X Green Swan GA1.1", "GA1.2", etc.

This will identify a feature set with a name. The next release will include everything in Green Swan and add new features. Thus, it is easy to identify when a test release is being used, and because the major MySQL release number is always in the name, it is clear which name is newer than the others.
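The naming scheme is regular enough to parse mechanically. The Python sketch below is one possible reading of it, not an official tool: the regular expression, the Release tuple and the ordering rule (PR before GA, then by number, then by point release) are assumptions drawn from the examples above, and it only orders releases that share a code name.

```python
import re
from typing import NamedTuple


class Release(NamedTuple):
    mysql_version: str  # e.g. "5.6.X"
    codename: str       # e.g. "Green Swan"
    stage: str          # "PR" (preview release) or "GA"
    number: int         # the 1 in PR1 or GA1
    point: int          # the 2 in GA1.2, or 0 when absent


_PATTERN = re.compile(
    r"^TroySQL (?P<version>\S+) (?P<codename>.+) "
    r"(?P<stage>PR|GA)(?P<number>\d+)(?:\.(?P<point>\d+))?$"
)


def parse_release(name: str) -> Release:
    """Split a release name into its parts, or raise ValueError."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a recognized release name: {name!r}")
    return Release(
        mysql_version=match["version"],
        codename=match["codename"],
        stage=match["stage"],
        number=int(match["number"]),
        point=int(match["point"] or 0),
    )


def stage_order(release: Release) -> tuple:
    """Sort key within one code name: previews first, then GA and point releases."""
    return (0 if release.stage == "PR" else 1, release.number, release.point)


names = [
    "TroySQL 5.6.X Green Swan GA1.1",
    "TroySQL 5.6.X Green Swan PR2",
    "TroySQL 5.6.X Green Swan GA1",
    "TroySQL 5.6.X Green Swan PR1",
]
ordered = sorted(names, key=lambda n: stage_order(parse_release(n)))
```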

Why Green Swan? Well, Black Swan would of course be apropos, but my favorite color is Green, and my name is Swanhart. And a Black Swan is something unexpected, thus Green Swan is even more unexpected. Or something like that :)

Where to find it:
https://github.com/greenlion/troysql-5.6/

Sounds cool. Perhaps you can also add the Gearman UDF and the memcached UDF.

I would like to add Gearman as a daemon plugin and add the UDF into the server. If you want to make it a priority, fork a copy on GitHub and I'll merge the changes in.

what about starting off with drizzle?

(Anonymous)

2013-11-14 05:49 am (UTC)

It seems that it would fit most of your requirements: a cleaned-up codebase, more/better extensibility points, a focus on community

Re: what about starting off with drizzle?

swanhart

2013-11-14 07:57 am (UTC)

Drizzle has diverged very far from MySQL. It removed a lot. I don't want to remove a lot, I want to add a lot. It would take a lot of effort to add back many of the things I want that were removed in Drizzle. And backporting new features from MySQL 5.6, etc., would be very difficult. Backporting the multiple trigger support from 5.7 will be important, whereas Drizzle removed triggers completely.

Edited at 2013-11-14 07:58 am (UTC)
