?

Log in

No account? Create an account
ss - oscar

My SQL Dump

MySQL musings by a self professed MySQL Geek

PHP-SQL-Parser updated again
ss - ernie
swanhart
I have bumped the minor version to 4.1.2 with this release which incorporates various pull requests from contributors. Of note is support for ALTER statements in PHPSQLCreator, which is the components of PHP-SQL-Parser responsible for turning a parse tree back into an executable SQL statement, basically an "unparser".

PHP-SQL-Parser updated with pull requests
ss - oscar
swanhart
I took some time this morning to check out and merge some existing pull requests for PHP-SQL-Parser, the most popular SQL parser for MySQL and PHP.

I'm thinking about adding a Validate($conn,$sql) method to the parser to validate the syntax of the SQL against the given MySQL database.

MyRocks has some strange performance issues for index scans
ss - oscar
swanhart
The details on this issue are here:
https://github.com/facebook/mysql-5.6/issues/369

This test is very simple. I loaded the SSB (star schema benchmark) data for scale factor 20 (12GB raw data), added indexes, and tried to count the rows in the table.

After loading data and creating indexes, the .rocksdb data directory is 17GB in size.

A full table scan "count(*)" query takes less than four minutes, sometimes reading over 1M rows per second, but when scanning the index to accomplish the same count, the database can only scan around 2000 rows per second. The four minute query would take an estimated 1000 minutes, a 250x difference.

I have eliminated the type of CRC32 function (SSE vs non-SSE) by forcing the hardware SSE function by patching the code.

There seem to be problems with any queries that use the indexes:
mysql> explain select count(*) from lineorder where LO_OrderDateKey = 19921201;
+----+-------------+-----------+------+-----------------+-----------------+---------+-------+------+-------------+
| id | select_type | table     | type | possible_keys   | key             | key_len | ref   | rows | Extra       |
+----+-------------+-----------+------+-----------------+-----------------+---------+-------+------+-------------+
|  1 | SIMPLE      | lineorder | ref  | LO_OrderDateKey | LO_OrderDateKey | 4       | const | 6613 | Using index |
+----+-------------+-----------+------+-----------------+-----------------+---------+-------+------+-------------+
1 row in set (0.00 sec)

mysql> select count(*) from lineorder where LO_OrderDateKey = 19921201;
+----------+
| count(*) |
+----------+
|    49519 |
+----------+
1 row in set (11.43 sec)


I'm going to make a follow-up blog post comparing InnoDB performance using the same data and queries.

RocksDB doesn't support large transactions very well
ss - oscar
swanhart
So I tried to do my first set of benchmarks and testing on RocksDB today, but I ran into a problem and had to file a bug:
https://github.com/facebook/mysql-5.6/issues/365

MySQL @ Facebook RocksDB appears to store at least 2x the size of the volume of changes in a transaction. I don't know how much space for the row + overhead there is in each transcation, so I'm just going to say 2x the raw size of the data changed in the transaction, as approximation. I am not sure how this works for updates either, that is whether old/new row information is maintained. If old/row data is maintained, then a pure update workload you would need 4x the ram for the given transactional changes. My bulk load was 12GB of raw data, so it failed as I have only 12GB of RAM in my test system.

The workaround (as suggested in the bug) is to set two configuration variables:
set rocksdb_bulk_load=1; 
set rocksdb_commit_in_the_middle=1


Note that if you set these settings and there is a crash (not sure about client abort) then the table will be partially loaded. This is fine for an import of a large data set, but it doesn't address the need to load bulk amounts of data in a nightly job for example. In that case, you definitely don't want to end up with half a table loaded after a crash or query error.

For now I would stay away from RocksDB if you need to regularly load data that is more than 1/2 the size of available ram. If you have concurrent read queries, then obviously that number will probably be smaller.

It would be ideal if the database didn't crash, and instead the load rolled back when this situation was encountered. It isn't very user friendly to crash the database due to large transactions, and I'm sure a stored procedure could be created to crash the database by simply inserting rows in a loop, which is a serious potential DOS.

PS_history 2.0 was released last week with MySQL 8 support and bundled sys_history
ss - oscar
swanhart
PS_history is a tool which collects historical snapshots of the PERFORMANCE_SCHEMA (P_S). This allows you to trend P_S values over time, for example, it is possible to look at the 95 th percentile response time for a query over time.

PS_history is stored procedure and event based, and thus it resides entirely inside of the database with no external dependencies. It uses a clever technique to capture all of the P_S data in one consistent snapshot. This ensures that all of the sys_history views (bundled now with PS_history) have a consistent set of data.

By default, as long as the event_schedule is enabled, PS_history will collect data every 30 seconds. If a snapshot takes 30 seconds, there will be a 30 second delay before the next snapshot starts. This value can be changed by calling the `ps_history`.`set_collection_interval`(N) where N is the number of seconds between samples.

The `sys_history` schema is automatically created by rewriting queries in the `sys` schema, excluding the views that access INFORMATION_SCHEMA.

The tables in the `ps_history` schema are now indexed identically to the P_S indexes added in MySQL 8. These indexes support the sys_history views and can be used to quickly look up data in the history tables as well. You can of course add any additional indexes that you like to any PS_history table.

To setup ps_history you can download the git repostory, or the zip file here:
https://github.com/greenlion/ps_history

Then execute the setup script:
\. setup.sql

or

mysql -f < setup.sql

Some thoughts on recent events (repost)
ss - oscar
swanhart
[something happened to this post and I am reposting it]
It was suggested by Monty that the posts I've made about MariaDB are for publicity. This simply isn't true. I would have much preferred a different outcome in my interactions with MariaDB. I figured that they would end up giving me a hard time, and I'd be stubborn and we'd both hate each other for as long as I could keep from leaving. A quick separation actually seems much better in such context. Regardless, I would have preferred to speak amicably to the MariaDB Corporation about switching the license back, or at least moving to the new license at the time of the notification of the community, ie, changing the license so that bug fixes for 1.3.4 were not mingled with new 2.0 features.

It could have been easily possible to have a 1.3.5 release that fixes the major bugs in 1.3 and then release a new set of features as 2.0. This would have been at least reasonable, but by mixing 1.3 and 2.0 fixes in with the new license.

But really, the license change should be in 2.1. It should have been announced that that there were plans to change the license. By doing what they did with 2.0, they effectively extort at least some (in their minds 1%) of their users for bug fixes.

I think that 2.0 should be relicensed GPL, and that the 2.1 line (with some new features not in 2.0) made BSL. This is only fair to the community and existing users. Existing users could choose to stay with 2.0, with the promise of major bug fixes in the 2.0 and 2.1 lines until 2.2 is released and 2.0 becomes GPL.

OPEN SOURCE IS ABOUT BEING OPEN NOT JUST ABOUT SOURCE CODE.
MariaDB makes changes to the storage engine API without even consulting the mailing list. In fact, some MariaDB replication features don't work with TokuDB because a new storage engine API call is required, and this call requires functionality that might not exist in all engines, that is, the engine must be able to convert a transaction id into a THD object. No API is provided by the SE to make this translation. MariaDB must communicate changes to the storage API to the public so that the public can ensure that tools made to work with MariaDB continue to work with MariaDB. It doesn't matter that TokuDB is made by a competitor. In a foreign language article, Kaj recently said MariaDB is more open than Oracle because it includes many storage engines. This is a crappy argument. MariaDB is trying to say they are better than Oracle because they profit more from including code they didn't actually write, that mostly only half works anyway? The Cassandra SE is effectively dead, CONNECT is externally maintained, MROONGA isn't maintained as far as I can tell, and it doesn't really work right, Spider has issues as well. I don't think shipping a bunch of broken stuff that you didn't write, and that you won't help remain compatible being more open...

Fired for supporting open source
ss - oscar
swanhart
I have been fired for speaking out about the GPL and MariaDB actions that have caused great harm to our ecosystem.

It has been pointed out that I have a non-compete agreement. None of my tools compete with MariaDB and I have no non-public knowledge of MariaDB technology. GPLScale remains free software under the GNU GPL license and it is my right to fork a github repo. I am not paid to work on GPLScale and I don't intend to get paid to maintain it by anyone. All my projects are labors of love.


Who wants to hire me? I'm dedicated, honest, open, and I have integrity. I'm willing to risk everything for what I believe in.

Email me at:
greenlion at gmail dot com


----


I have been a proponent of GPL for a long time, and I don't need publicity.
http://swanhart.livejournal.com/128586.html

https://libraries.io/github/greenlion


https://blog.jcole.us/2008/07/23/on-mysql-forks-and-mysqls-non-open-source-documentation/
Justin Swanhart on July 23, 2008 at 12:09
Brian,

The “thing is” that we shouldn’t have to write it. MySQL may be ‘open source’, but it isn’t ‘open software’. With all the talk recently of proprietary extensions, the documentation licensing issues, the whole ‘enterprise’ vs ‘community’ debacle, well, MySQL has totally shown that they are not ‘Open Software’ company.

WHEN I WAS SEVENTEEN I BROKE THE LAW AND DROVE FROM PITTSBURGH PA TO BOSTON MA FOR THE FIRST FREE SOFTWARE FOUNDATION CONFERENCE. I sat down and had breakfast with STALLMAN, TORVALDS, AND RAYMOND! My whole life is open source. I've made my living off it and I intend to make sure others have the same opportunities that I have.


Posted via m.livejournal.com.

Tags: ,

BSL BS: How do you backport fixes from MaxScale 2.0 to MaxScale 1.0?
ss - oscar
swanhart

I have created a GitHub fork of MaxScale which does not include the 2.0 branch.  The repository is called GPLScale.

There is a big potential problem with this.  Take for example the file server/core/utils.c, which now has a new license:
https://github.com/mariadb-corporation/MaxScale/blob/2.0/server/core/utils.c

This file WAS covered under GPL:
  https://github.com/mariadb-corporation/MaxScale/blob/cb3213af6382c380df9fc47347764f087ae7ecb4/server/core/utils.c

When there is a bugfix in 2.0 for server code that existed in 1.0, can that code be backported directly into GPLscale?

a) util.c in 2.0 is clearly derivative of 1.0, and is therefore a derivative work of 1.0 and therefore still covered by GPL and b) there is a violation of GPL, because 1.0 code is being linked against code covered by an incompatible license.

Almost all of the files in MaxScale 2.0 derive from 1.0 so this problem is not just in one file of course.


The FSF says that the owner of the software can't legally violate the GPL they can morally violate the GPL.  MariaDB Corporation has shown that they do not have very good ethics and morals.  The MariaDB executives knowingly released MariaDB Corporation ColumnStore alpha 1 with bundled included libraries which included 10+ year old vulnerabilities.  The bundled net-snmp library could have been updated before releasing (in fact, I removed it almost immediately after the release) but they would not hold up the alpha release deadline, which was a meaningless deadline.  It still doesn't even compile correctly in the master branch (the released alpha only compiles on a specially configured VM).  Releasing code that can't compile is also not against the letter of the GPL, but it is certainly against the spirit of it.

MariaDB has shown that they do not care about open source and free software.  This is not FUD.


Basically Shitty License
ss - oscar
swanhart

Monty announced that he has created a new non-open source license called the "Business Source License" or BSL.  I think it should have a different name...

You see, Monty has fundamentally crafted a straw man to stand in for the general Open Source model by applying his experience in the dog-eat-dog world of forked software, in particular, the "ecosystem" of MySQL.  The software that MariaDB draws the majority of their income from is MariaDB, which is a fork of MySQL.  If you don't know the history, well, you see, SUN bought MySQL, Oracle bought Sun, and Monty, in an environment of nearly Biblical levels of FUD, forked MySQL into MariaDB (both products are named after his daughters).

While MariaDB was originally envisioned as a "drop in/drop out" replacement, it has diverged so far from the Oracle product that it is no longer even "drop in" with the latest versions of MySQL. Oracle is adding amazing new features, like a real data dictionary (using innodb tables), improved partition management, group replication, etc, that MariaDB simply can not compete with. Forking MySQL was a good business move for a time, but unfortunately, that time has passed.  MariaDB is now obviously trying to compete with Oracle in different areas than MySQL server innovation.

MariaDB corporation's fork of InfiniDB (aka MariaDB Corporation ColumnStore [not to be confused with Microsoft ColumnStore indexes]) is one area where they are obviously trying to differentiate themselves, as well as MaxScale.  I should note though, that even though MariaDB Corporation ColumnStore is GPL, MariaDB still says you must agree to another evaluation agreement to download the binaries.  MaxScale is now BSL which creates problems for the "ecosystem".  The problem is that the "ecosystem" is toxic.  A community can not work when one of the members is actively poisoning the well.  Generally when a software is forked, the communities split, but due to the nature of MySQL that never happened.  This is a toxic "open source" environment, where the "freedom" of open source is just a token gesture and only acts as a vehicle to more restrictive licenses (hosted tools in the case of Percona, non-open source software in the case of MariaDB).  Part of the nature of the "ecosystem" is that the consulting companies (Oracle, MariaDB, Percona) could each support the "full stack" of software.  If you bought Percona support, and there was a bug in MaxScale, Percona could fix it. Now, if you use more than three production instances you have to pay MariaDB a special fee to support just that part of your stack, and if you want Percona support for the rest, you have to pay for that too.  That is harmful to the ecosystem.  Monty just doesn't like fair competition. Surely forking MariaDB from MySQL was leveraging the freedom of open source, but now he eschews open source for (ahem) greener pastures.

What is production?
I define a production system as one that has passed through the beta phase.  The beta phase is a phase of indeterminate length where the system as a whole, and all subsystems, are tested for correctness in all manners possible.  Such a system does not have to be labled as beta to the public, but only assigned such designation internally, since the public at large may find the term beta "offputting", especially considering that new companies often face significant techincal challenges when trying to scale a system, and such problems can not be identified when at low scale.  Since any such system is subject to unexpected downtime, any site that declares itself beta should declare an uptime SLA which is less than or equal to "two nines".

Only when a system has been thoroughly publically stressed at the scale it is intended to operate at, can a system be declared production.  Once that happens, purchasing of licenses can happen.


WarpSQL now has SQL shim plugins
swanhart
I made some improvements to the 'proxy' inside of MySQL 5.7 that I've created for WarpSQL (Shard-Query 3).  I've made the MySQL proxy/shim pluggable and I moved the SQLClient to sql/sql_client.cc.  I've merged these changes into 'master' in my fork.

Now you can create "SQL shim" plugins (SHOW PASSWORD is implemented in plugin/sql_shim) and install them in the server like regular plugins:
-- command doesn't work
mysql> show password;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'password' at line 1

-- install the example sql_shim plugin:
mysql> install plugin sql_shim soname 'sql_shim.so';                                                                 Query OK, 0 rows affected (0.00 sec)

-- now the command works
mysql> show password;
+--+
|  |
+--+
|  |
+--+
1 row in set (0.00 sec)


There can be only one!
There may only be ONE "SQL shim" plugin running in the server at one time.  All "SQL shim" plugins must use the plugin name "sql_shim".  This prevents more than one "SQL shim" plugin from being utilized at once.  This is by design, because the MySQL plugin interface doesn't make any provisions for plugin execution order.  If you install a SQL shim plugin by another name, it won't cause harm, but it will not be utilized.