Interesting MySQL bugs
I like bugs. Probably not as much as Valerii Kravchuk (http://mysqlentomologist.blogspot.com/) a former colleague of mine who posted about MySQL bugs for a long time, but I still like bugs. I like finding bugs, I like fixing bugs, and I like analyzing bugs.
While I work on the WARP storage engine, I am filing bugs and feature requests for issues I find with MySQL. I also like to comment on MySQL bugs in the bug database to help the MySQL engineers find root causes, and to help end users with workarounds, or explaining misconceptions as to why something may not be a bug. So here are a few interesting bugs I have encountered recently.
Optimizer gives bad estimates for ICP query plans resulting in FTS:
MySQL appears to provide estimates for ICP access that stop at the first range scan in an index, and does not consider that ICP will (likely) return less rows to the storage engine for evaluation due to ICP. ICP prevents the database from having to fetch the entire row by primary key and evaluate the condition at the SQL layer. So if ICP can reduce the number of rows evaluated (or pushed up as it were) to the SQL layer, it will speed up the query quite a bit. MySQL will switch to a full table scan (FTS) if a certain threshold of rows are expected to be seen by the SQL layer, because scanning the table is generally faster than too many O(log(n)) operations to fetch the data by primary key. In the query/data for the bug, MySQL optimizer thinks that 30% or more of the table will be evaluated at the SQL layer, so it decides to do FTS, when in reality, with ICP, only 2% of the values will be evaluated and it is definitely faster (by 6x) to choose to use the index with ICP. FORCE INDEX can be used to ensure the database uses ICP.
ECP does not provide a way to feed into SQL join cost evaluations
This bug is similar to the above bug, but instead of not getting proper estimates for ICP, this bug involves not getting proper estimates for ECP, actually not getting estimates AT ALL. If ECP filters a table to a small number of rows, MySQL will not join the tables in the order for best performance, because table scan cost (and ECP filters table scans) is set BEFORE ECP evaluates conditions, and there is no mechanism to feed the cost back into the plan and reorder tables appropriately. WARP is highly dependent on ECP and thus query plans may end up being bad on WARP unless the STRAIGHT_JOIN hint is used to force table join order. This is of course inconvenient. I will be implementing a fix for this in WarpSQL and will submit the fix to Oracle who may or may not decide to use my fix. If they don't I will have to update WarpSQL to use their method once they release it, but who knows when that will be. Many "performance" bugs stay open for a very long time and are never fixed.
Row comparators on string columns are much slower with IN than constructing complex WHERE clauses
This is an interesting one. MySQL supports using "row comparisons" for IN expressions. This is where you say "select * from table where (a,b) in (1,1)" instead of "select * from table where a=1 and b=1. Unfortunately, internally MySQL does not handle these two evaluations in the same way. It seems that a Field_Item is constructed for the second version, and that compares strings efficiently, but for the row comparator, a different function called Field::cmp is called on each field of the table, and this appears to copy the data from the row constructor (and maybe the field? I haven't looked at the source) to do the comparison. The row comparator is MUCH slower because of this. The temporary workaround is to use latin1 instead of utf8mb4 for the fields, so long as of course the fields do not actually contain utf8 data. I used the 'perf' tool on Linux to determine what the problem is. Perf is one of my favorite tools, and I use it a lot when I am trying to figure out why something that is CPU bound is slow.
MySQL does not report a table has a discarded tablespace for all queries
It's not a bug, it is a feature! In this case this is true. The MySQL storage engine interface scans whatever data the database asks it to scan, and the SQL layer filters out rows that do not match the scan (the exceptions are ICP and ECP of course). When an "impossible WHERE clause" is detected like 1=0 or WHERE NULL or WHERE 0, then MySQL doesn't try to scan any rows, because if it did, the storage engine would have to return all the rows and MySQL would filter them out. This means that InnoDB (or any other storage engine) never tries to scan a table upon which an impossible WHERE clause has been placed. Since InnoDB detects the tablespace is missing when it tries to scan the table, no error is returned if a query uses an impossible WHERE clause on an empty table. Note that LIMIT 0 is essentially the same. If LIMIT 0 is specified in a query MySQL will read no rows, but it is an easy way to copy table structure to an empty table without using CREATE TABLE LIKE if the copy of the table doesn't need the indexes, constraints, etc, of the original table.
SQL_MODE=ONLY_FULL_GROUP_BY doesn't return error when column is not in GROUP BY
Interestingly this bug is reported because ClickHouse reports an exception with GROUP BY attributes but MySQL doesn't, and the user thinks ClickHouse is right. But in reality, MySQL is smarter than ClickHouse and detects a functional dependency in the query https://dev.mysql.com/doc/refman/5.7/en/group-by-functional-dependence.html and returns correct results, and ClickHouse actually wrongly reports an error, so it is a bug in ClickHouse (or a limitation) and not a problem with MySQL at all. So again, this bug is not a bug.
Wrong results with COUNT(DISTINCT ...) over multiple expressions
I haven't analyzed this bug deeply, but it appears to be a problem with BIT columns, which are not used much in MySQL when used with COUNT(DISTINCT ...) over more than one expression (or column) something else that I don't see much utilization of, in fact, I kind of forgot that you can use COUNT(DISTINCT) in that way, and I thought I was good at SQL :)
Skip scan can return wrong results (empty set)
MySQL 8.0 can now do skip scans for some queries. Skip scans allow MySQL to utilize a multi-column index even when the prefix on the index is not specified in a query. That is, if index (a,b,c) exists on a table and there WHERE clause uses that index as a covering index, and the WHERE clause specifies b=X, then MySQL will still be able to use the index to get results faster than scanning the table. But there appears to be a row visibility problem with skip scan after DELETE is executed against the table, and the query returns empty results.
That is all for now about MySQL bugs. I will post now and then about bugs I find interesting and bugs I find while working on WARP. I hope you enjoyed. If you would like to support WarpSQL and WARP engine development, need consulting help, or just want to support me in my quest to make MySQL 8 an even better database than it already is, please consider becoming a patron.