My SQL Dump

MySQL musings by a self-professed MySQL Geek

Announcing a new MySQL fork: TroySQL
swanhart
First, I believe that Oracle MySQL, Percona MySQL and SkySQL MySQL (MariaDB) are all great databases with their own merits. You should probably try all of them and see which one works best for you. Personally I like Percona Server, but all of them are good.

So why do we need ANOTHER fork when there are already three perfectly acceptable commercial forks?
a) Experimentation
b) Community Involvement
c) Serious feature/functionality divergence/disparity
d) Extensibility
e) Storage engines
f) OLAP focus

Experimentation:
There are a lot of interesting and useful things in the database world that are not possible in MySQL. Many of these features have been implemented as middleware or third-party add-ons, but they don't get wide adoption because they are not "built-in" to the database. Two good examples are Flexviews and Shard-Query. Other experiments and worklogs have had significant work checked into other branches and forks, but that work has never been accepted upstream. Good examples of that are "perl stored procedures" (or "external stored procedures"). Another is allowing plugins to add variables to the THD without modifying the source. These are good contributions that are in danger of being lost. They should not languish in a tree somewhere. I think there should be an active fork to test these ideas.

Community Involvement:
If the fork encourages experimentation, then community involvement is more likely. If the fork is more accepting of code contributions (such as those under alternate licenses like the two- and three-clause BSD licenses), this too encourages community involvement. Both are true of TroySQL. Experimentation is encouraged, and you can contribute code under any license deemed compatible with GPLv2 (LGPLv2, BSD-2-Clause, BSD-3-Clause). This does not mean that every feature submitted for inclusion will make it into a release or that it will work exactly as submitted, but if you want a cool new feature like user-defined types, or you want to make foreign keys really work for all engines, then this would be a great place to try it out.

Serious feature/functionality divergence/disparity:
While TroySQL does not plan to diverge so far from MySQL as to be unrecognizable, there is a high price to be paid for guaranteeing compatibility with regular MySQL. That price may be acceptable for commercial forks, but I'd rather say "not guaranteed compatible" and be done with it. So TroySQL will maintain compatibility with regular MySQL where possible, but it will diverge when necessary. Given the desire for experimentation, divergence is necessary and should not be considered a negative quality.

Extensibility:
Instead of plugins with a C interface, I want to start moving to TRIGGERS. Binlog event triggers, user logon/logoff triggers, statement triggers, commit triggers, etc. This makes writing extensions in any language possible, as a trigger can be written in any language that the database supports. Many current plugins can be rewritten to use the trigger interface instead. There will still be a use for C plugins, particularly for storage engines.
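
For comparison, a trigger in today's MySQL can only fire for row changes (DML) on a single table. A minimal sketch of that existing interface is below, with illustrative table and column names (audit_log, t1, id are not from any existing TroySQL code); the idea is to extend the same mechanism to server-wide events such as logons, statements, commits and binlog events:

-- Today, triggers are limited to row-level DML on one table:
CREATE TABLE audit_log (changed_at DATETIME, changed_by VARCHAR(128), new_id BIGINT);

CREATE TRIGGER t1_audit AFTER INSERT ON t1
FOR EACH ROW
  INSERT INTO audit_log VALUES (NOW(), CURRENT_USER(), NEW.id);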

Storage engines:
A column store is needed, and a compressing one. Index plugins are needed too, so that bitmap and text indexes can be implemented in an engine-agnostic way. Proper triggers for row changes that can invoke C code will of course be useful as well, and can lead to proper foreign key support for all engines.

I would like to take the CONNECT storage engine, and implement proper SQL/MED data wrapper syntax on top of it instead of using it as a different storage engine.

OLAP focus:
None of the current forks plan on adding intra-query parallelism any time soon. This feature is necessary for good performance when queries must examine large volumes of data. TroySQL will support query parallelism, starting with SELECT queries. Users have been asking for window functions and common table expressions. These will be added too, as well as table functions.

Materialized views, bitmap indexes, a column store, star join optimization and query rewrite are needed too. An optimizer that can use more than one join type and that has a hash join that can spill to disk would be nice too, as would sort/merge joins.
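
To make the wish list concrete, here is the kind of query that window functions and common table expressions would allow. It is standard SQL written against the star schema lineorder table used elsewhere on this blog, not something MySQL can run today:

WITH customer_revenue AS (
  SELECT LO_CustKey, SUM(LO_Revenue) AS total_revenue
    FROM lineorder
   GROUP BY LO_CustKey
)
SELECT LO_CustKey,
       total_revenue,
       RANK() OVER (ORDER BY total_revenue DESC) AS revenue_rank
  FROM customer_revenue;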

Release plan:
TroySQL will be released as "preview releases". Each preview release will have one or two new features. Some features might have a lot of depth, others might be simple. Each preview release will be named, and there may be multiple preview releases for a feature. So, for example, the goal of the first preview release is query rewrite, parallel query, PHP stored procedures and Perl stored procedures. It will be called "TroySQL 5.6.X Green Swan PR1". Then, as development progresses, "Green Swan PR2" will be released. Once everything is well tested and feature complete, it will become "TroySQL 5.6.X Green Swan GA1". Finally, in a GA release, point release improvements (no new features) are labeled "TroySQL 5.6.X Green Swan GA1.1", "GA1.2", and so on.

This will identify a feature set with a name. The next release will include everything in Green Swan and add new features. Thus, it is easy to identify when a test release is being used, and because the major MySQL release number is always in the name, it is clear which name is newer than the others.

Why Green Swan? Well, Black Swan would of course be apropos, but my favorite color is Green, and my name is Swanhart. And a Black Swan is something unexpected, thus Green Swan is even more unexpected. Or something like that :)

Where to find it:
https://github.com/greenlion/troysql-5.6/

Working with comma separated list MySQL options
swanhart
Over time, some options have crept into the MySQL server that are comma-separated lists of options. These include SQL_MODE, optimizer_switch, optimizer_trace and a few other variables.

Optimizer_switch is particularly problematic to work with as it contains many options and is hard to interpret. This is what you usually see when you examine optimizer_switch:

index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,engine_condition_pushdown=on,index_condition_pushdown=on,mrr=on,mrr_cost_based=on,block_nested_loop=on,batched_key_access=off,materialization=on,semijoin=on,loosescan=on,firstmatch=on,subquery_materialization_cost_based=on,use_index_extensions=on



As you can see, picking out whether a given option is on or off is rather difficult. You can use the REPLACE function to make this easier:
mysql> select replace(@@optimizer_switch, ',','\n')\G
*************************** 1. row ***************************
replace(@@optimizer_switch, ',','\n'): index_merge=on
index_merge_union=on
index_merge_sort_union=on
index_merge_intersection=on
engine_condition_pushdown=on
index_condition_pushdown=on
mrr=on
mrr_cost_based=on
block_nested_loop=on
batched_key_access=off
materialization=on
semijoin=on
loosescan=on
firstmatch=on
subquery_materialization_cost_based=on
use_index_extensions=on
1 row in set (0.00 sec)


This way you can use the \P mysql monitor option to search for key/value pairs:
mysql> \P grep mrr
PAGER set to 'grep mrr'
mysql> select replace(@@optimizer_switch, ',','\n')\G
mrr=on
mrr_cost_based=on
1 row in set (0.00 sec)
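
A related convenience when working with these variables, shown as a quick sketch: when setting optimizer_switch you only have to name the flags you want to change, and every other flag keeps its current value. For SQL_MODE, a common trick is to append with CONCAT (this assumes the current mode is non-empty; otherwise trim the leading comma):

SET SESSION optimizer_switch='mrr=off,mrr_cost_based=off';

SET SESSION sql_mode=CONCAT(@@sql_mode, ',STRICT_ALL_TABLES');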

A new big data structure for streaming counters - bit length encoding
swanhart
One of the challenges of big data is that it is, well, big. Computers are optimized for math on 64 bits or less. Any bigger, and extra steps have to be taken to work with the data, which is very expensive. This is why a BIGINT is 64 bits.  In MySQL, DECIMAL can store more than 64 bits of data exactly using fixed precision, and even larger numbers can use FLOAT, but floating point is lossy.

DECIMAL is an expensive encoding. Fixed precision math is costly, and you eventually run out of precision, at which point you can't store any more data, right?

What happens when you want to store a counter that is bigger than the maximum DECIMAL?  FLOAT is lossy.  What if you need an /exact/ count of a very big number without using very much space?

I've developed an encoding method that allows you to store very large counters in a very small amount of space. It takes advantage of the fact that counters only ever increase, combined with choosing an arbitrary bit size in which to do the math.  If this sounds confusing, the example below should be simple to understand.

Decide on a large counter (the total number of bits ever downloaded from Wikipedia, for example).  There are three goals that have to be met if the counter is to be manageable.

  1. The data must be stored in a compact way so that it doesn't take a long time to read or write the counter value.

  2. The data must be stored in such a way that a counter with a million digits does not have to print a million digits on the screen, but can instead use an alternate notation.  You see this with floating point all the time, which removes decimal places by using powers of ten.  That idea can be extended to any counter using POWERS OF TWO notation without ever losing precision.

  3. The data must be stored in a way that allows further computation by the machine in the new representation.

Storing the data in the database:


  1. Define a counter.  We'll call it bits_on_the_internet.

  2. In the database use two columns.  One is an INT column called COUNTER_DATA and the other a DECIMAL(60,0) column called COUNTER_DATA_OVERFLOWS.

  3. Use 64 bit math to add the size (in bits) of each document to the counter and store it in memory as NEW_COUNTER_DATA.

  4. The new value stored in 64 bits may overflow 32 bits.

  5. If it does, then:

     while (NEW_COUNTER_DATA >= 2^32) {
         increment COUNTER_DATA_OVERFLOWS by one;
         subtract 2^32 from NEW_COUNTER_DATA;
     }
     write NEW_COUNTER_DATA to COUNTER_DATA;

  6. If there was no overflow you don't need to do anything special.  Just store NEW_COUNTER_DATA in COUNTER_DATA.  (A SQL sketch of this update step follows the list.)
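
Here is a minimal SQL sketch of the update step.  The table and variable names (bits_on_the_internet, @doc_bits) are illustrative rather than taken from the post, and 4294967296 is simply 2^32:

CREATE TABLE bits_on_the_internet (
  counter_data INT UNSIGNED NOT NULL DEFAULT 0,             -- current value modulo 2^32
  counter_data_overflows DECIMAL(60,0) NOT NULL DEFAULT 0   -- number of times 2^32 was subtracted
);
INSERT INTO bits_on_the_internet VALUES (0, 0);

-- Add the size (in bits) of one document to the counter.
SET @doc_bits = 123456789;
UPDATE bits_on_the_internet
   SET counter_data_overflows = counter_data_overflows + ((counter_data + @doc_bits) DIV 4294967296),
       counter_data           = (counter_data + @doc_bits) MOD 4294967296;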


Decoding the stored data:

  1. Read COUNTER_DATA

  2. Read COUNTER_DATA_OVERFLOWS

  3. Print the following on the screen: COUNTER_DATA + (2^32 * COUNTER_DATA_OVERFLOWS).  (A query sketch of this appears after the list.)

  4. The computer can still do math on the encoded format. I just showed you how to add to the encoded format above.

  5. Using this method VERY large counters can be stored efficiently.

  6. If one DECIMAL(60,0) column of overflows is not enough, add another column, COUNTER_DATA_OVERFLOWS2, and overflow COUNTER_DATA_OVERFLOWS into COUNTER_DATA_OVERFLOWS2 in the same way.
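
A sketch of the decode step as a query, using the same illustrative table as in the update sketch above.  DECIMAL math keeps this exact as long as the total fits in MySQL's 65-digit DECIMAL limit:

SELECT CAST((counter_data_overflows * 4294967296) + counter_data AS DECIMAL(65,0)) AS total_bits
  FROM bits_on_the_internet;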


I call this "bit-length encoding".

I also have a working non-database example written in PHP here:
https://www.facebook.com/notes/justin-swanhart/php-script-to-bit-length-encode-streaming-counters/10151445872671472

Why it works:
Algebra says that if you take something away from one side of an equation, you have to add it back on the other.  This method just keeps track of how many times a constant number has been taken away.  To un-encode the number you just need to add the constant back N times (that is, add N times the constant).  The method implemented is called "bag algebra".

The most important thing about this counter is that it easily shows human readable numbers without really converting from the machine representation.  That is, you can work with this "compressed" (encoded, really) data directly, even adding to or subtracting from it, as long as you use the rules above.   You can do this without having to decompress it into as many bits as would be needed for two's complement notation, so very large numbers don't take a long time to print or a lot of space when they are transmitted over the network for aggregation.

This method also easily allows the counter to be stored on more than one machine.  Just add the counters from each machine using 64 bit math, add the overflow counters, and then print the total representation; counter aggregation is thus embarrassingly parallel.


The hex datatype?
swanhart
Anders posted a blog post about adding a new JSON data type to MariaDB. This made me remember my very first MySQL User conference, well, that and the next couple of conferences, actually. Why? Because at every conference (for a while) I remember Brian Aker getting everybody into a room to discuss how to add a data type to MySQL. It was considered pretty much the quintessential example of how to start developing for MySQL because it touches a lot of the basic guts of the server.

The example that Brian always gave was a "hex" data type, so that you can store MD5 numbers using 128 bits of storage instead of 256. A human readable MD5 is a 256 bit string (32 characters) representing 128 bits of actual data. So storing MD5 values in the database (a very common thing) requires twice as much space as it should.

Now, the question I've always asked is, "Why did nobody actually implement the type?". It seems really useful. And it was always said to be trivial. So why doesn't it exist?

Perhaps because you can use a BINARY(16) column and store the raw value, converting between hex and binary? Or maybe just use a UDF for storage and retrieval?
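
For what it's worth, the BINARY(16) workaround is simple enough; a quick sketch (the table and column names are just examples):

CREATE TABLE md5_store (digest BINARY(16) PRIMARY KEY);

INSERT INTO md5_store VALUES (UNHEX(MD5('some document text')));

SELECT HEX(digest) FROM md5_store;  -- back to the 32-character human readable form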

Should a hex data type get added after all this time? How about a UUID data type? Especially with UUID being common, it could save a lot of space.

Archive vs Percona XtraDB vs Tokutek TokuDB LOAD DATA performance
swanhart
Stewart blogged about the archive storage engine and he asked the Internet to test how fast ARCHIVE is compared with other MySQL storage engines like Percona XtraDB and Tokutek TokuDB. Since I've been curious about this subject as well, I decided to give it a quick test.

Not very compressible data


I took a star schema benchmark "lineorder" file and grabbed the first 15M rows. To each row I added a TEXT field called "extra_data". This field contains 10 random words from /usr/share/dict/words separated by spaces. This adds up to just about 3GB of raw input data.
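
The tables were populated with LOAD DATA INFILE, something along these lines; the file name and delimiters here are assumptions rather than the exact command used:

LOAD DATA INFILE '/data/lineorder_15m_extra.txt'
INTO TABLE insert_test
FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\n';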

Insert performance with no indexes (best to worst)


TokuDB: 187K rows/sec
Query OK, 15000000 rows affected (1 min 20.25 sec)


XtraDB (uncompressed): 119K rows/sec:
Query OK, 15000000 rows affected (2 min 5.40 sec)


Archive: 61K rows/sec:
Query OK, 15000000 rows affected (4 min 3.15 sec)


XtraDB Compressed key_block_size=8: 6K rows/sec:
I cancelled after 6 minutes, at which point only 2.2M of the 15M rows had been inserted.
---TRANSACTION 514, ACTIVE 365 sec inserting
1 lock struct(s), heap size 376, 0 row lock(s), undo log entries 2233162


Test with indexes


I added the following keys to the table:
alter table insert_test add key(LO_CustKey), add key(LO_OrderDateKey),
add key(LO_SuppKey), add key(LO_PartKey);


I then loaded the table as before to see how multiple indexes affect insert performance.

TokuDB: 84K rows/sec:
Query OK, 15000000 rows affected (2 min 58.42 sec)


InnoDB: 53K rows/sec:
Query OK, 15000000 rows affected (4 min 40.05 sec)


Insert performance with more highly compressible data


This time I took the first 5M rows, but instead of just adding 10 random words, I added the words in a repeating "phrase". For example: "word1 word2 ... word10 word1 word2 ... word10 word1 ...". This generated about the same amount of data.

TokuDB (no indexes): 92K rows/sec:
Query OK, 5000000 rows affected (45.34 sec)


TokuDB (with indexes): 66K rows/sec:
Query OK, 5000000 rows affected (1 min 14.89 sec)


XtraDB (no indexes): 65K rows/sec:
Query OK, 5000000 rows affected (1 min 16.75 sec)


Archive: 43K rows/sec:
Query OK, 5000000 rows affected (1 min 55.24 sec)


XtraDB (with indexes): 36.6K rows/sec:
Query OK, 5000000 rows affected (2 min 16.23 sec)


XtraDB Compressed key_block_size=8 (no indexes): 19.25K rows/sec:
Query OK, 5000000 rows affected (4 min 20.12 sec)


XtraDB Compressed key_block_size=8 (with indexes): 7.59K rows/sec:
Query OK, 5000000 rows affected (10 min 58.43 sec)


Building index on copy of the 15M row table (best to worst)


alter table sortbuild_test  add key(LO_CustKey), add key(LO_OrderDateKey),  add key(LO_SuppKey), add key(LO_PartKey);

[edit: Fixed order to put TokuDB first]
TokuDB:
Query OK, 0 rows affected (1 min 40.55 sec)


XtraDB:
Query OK, 0 rows affected (2 min 21.93 sec)




Building index on copy of the 5M row table (InnoDB 8k only)



XtraDB Compressed key_block_size=8:
Query OK, 0 rows affected (1 min 14.05 sec)



Data size comparison (smallest disk footprint to largest)


[edit]
These numbers were calculated using INFORMATION_SCHEMA.TABLES which reports the UNCOMPRESSED size for TokuDB tables. I hope they fix this in a new release. See the comments for a look at the tokudb data directory contents.

+------------------------------+-------------------+---------+---------+----------+
| table_schema                 | table_name        | data_mb | idx_mb  | sum_size |
+------------------------------+-------------------+---------+---------+----------+
| archive_phrase               | insert_test       |  529.78 |    0.00 |   529.78 |
| archive                      | insert_test       | 1386.51 |    0.00 |  1386.51 |
| innodb_compressed_phrase     | insert_test_8k    | 1601.00 |    0.00 |  1601.00 |
| innodb_compressed_phrase_idx | sortbuild_test_8k | 1601.00 |  151.30 |  1752.30 |
| innodb_compressed_phrase_idx | insert_test_8k    | 1601.00 |  433.83 |  2034.83 |
| tokudb                       | insert_test       |    X    |    X    |     X    |
| tokudb_phrase                | insert_test       |    X    |    X    |     X    |
| tokudb_phrase_idx            | insert_test       |    X    |    X    |     X    |
| innodb                       | insert_test       | 3112.00 |    0.00 |  3112.00 |
| innodb_phrase                | insert_test       | 3202.00 |    0.00 |  3202.00 |
| tokudb_idx                   | insert_test       |    X    |    X    |     X    |
| tokudb_idx                   | sortbuild_test    |    X    |    X    |     X    |
| innodb_phrase_idx            | insert_test       | 3202.00 |  565.03 |  3767.03 |
| innodb_idx                   | sortbuild_test    | 3112.00 |  899.19 |  4011.19 |
| innodb_idx                   | insert_test       | 3112.00 | 1519.00 |  4631.00 |
+------------------------------+-------------------+---------+---------+----------+


[edit]
The TokuDB test table is 1800MB (15M row table, 3G orig size)
The TokuDB test2 table is 2171MB (keys built with ALTER)
The tokudb_idx database is 2202MB (keys built with INSERT)
The tokudb_phrase database is 681MB (5M row, 3G orig size)
The tokudb_phrase_idx database is 806MB (5M row, keys built with INSERT)
[/edit]

Here is the CREATE TABLE for the data:
CREATE TABLE IF NOT EXISTS insert_test
(
LO_OrderKey bigint not null,
LO_LineNumber tinyint not null,
LO_CustKey int not null,
LO_PartKey int not null,
LO_SuppKey int not null,
LO_OrderDateKey int not null,
LO_OrderPriority varchar(15),
LO_ShipPriority char(1),
LO_Quantity tinyint,
LO_ExtendedPrice decimal,
LO_OrdTotalPrice decimal,
LO_Discount decimal,
LO_Revenue decimal,
LO_SupplyCost decimal,
LO_Tax tinyint,
LO_CommitDateKey int not null,
LO_ShipMode varchar(10),
extra_data TEXT
);

For compression:
ROW_FORMAT=compressed, KEY_BLOCK_SIZE=8
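
For example, the compressed InnoDB copy could be created like this (a sketch; the _8k suffix matches the table names in the size comparison above):

CREATE TABLE insert_test_8k LIKE insert_test;

ALTER TABLE insert_test_8k ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;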


Here is the my.cnf for Percona XtraDB (no special tuning was done for TokuDB):
[mysqld]
#datadir=/data/datadirs/tokudb_55
datadir=/data/datadirs/percona_55
socket=/var/lib/mysql/mysql.sock
user=mysql
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0

innodb_buffer_pool_size=16G
innodb_log_file_size=4G
innodb_flush_neighbor_pages=cont
innodb_fast_checksum
innodb_file_per_table
innodb_file_format=barracuda
innodb_buffer_pool_instances=6
innodb_write_io_threads=12
innodb_read_io_threads=12
innodb_flush_method=O_DIRECT
innodb_io_capacity=10000

read_buffer_size=2M
key_buffer_size=32M

[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid


[edit]
Test Hardware:
Intel Core i7-970 3.2GHz, 6 cores w/HT (12 threads)
24GB RAM
LSI 9211-8i IT HBA
DB on software RAID 0 - Two Intel 520 SSD, one OCZ vertex.  300GB total.

Some LSI 9211-8i issues on Windows and Linux
swanhart
tl;dr:
Make sure you flash an LSI 9211 to IT firmware rev #14 to get it to work with Linux and SSD TRIM.  You may have to downgrade from newer firmware to older firmware to get the card to work.


Finding a SATA III controller with more than one PCI-e lane


After a recent hardware issue I decided to upgrade my computer to use new Intel 520 120GB SSD drives in RAID for improved performance.  The motherboard I use (an ASUS Rampage III Extreme) has a Marvell SATA III controller with two ports, but I discovered that it is connected via only a single PCI-e lane (each lane can do at most 400MB/sec*).  This means it can't effectively support even a single Intel 520, because one device can saturate the SATA III bus (an Intel 520 is rated at up to 550MB/sec sequential write).

So I went on a quest for a new SATA 3 controller.   To Frys! I exclaimed.  But unfortunately, all the PCI-e 2.x SATA III controllers used a single lane!  (These cards also feature RAID which is a double wtf).  

Well, having been thwarted by my Frys run (I could have upgraded the motherboard, but I want to wait for the next generation of processors with DDR4 ram) I went to Central Computers in Mountain View.  There I found LSI PCI-e HBAs. LSI HBAs use 8 PCI-e lanes.  I decided to get an LSI 9211-8i for a little less than they sell on NewEgg.  This is a RAID controller, and I specifically told the technician that I wanted a RAID controller that supported TRIM and he said LSI supports trim**.  So I laid down my money and headed home.

Strange Windows 7 Professional 64 bit performance


I booted into Windows 7 and performance on AS-SSD (a simple benchmark utility targeted at SSD performance) for the drive was horrible.  So horrible the test could not complete. During the test the performance was reported to be pegged at 98 IOPS.  To compare, the same test, same drive, same OS, but a different card: 13,000 IOPS.  13K IOPS is low for the drive, but it was an old SATA 2 card.  Regardless, it shows the drive can do way better than the 98 IOPS that the LSI card reported while attempting to complete the test.  I messed around with this for quite a while, pulling out hair.  I eventually tried upgrading the card to R16 firmware.  Same issue.  I figured that maybe there was some sort of Windows weirdness, so I decided to test Linux. I have contacted LSI about the issue with AS-SSD to see if there is possibly a driver issue to blame.

Linux and the case of the mysterious hanging mkfs.ext4


I installed CentOS 6.4 minimal on another 32GB OCZ SSD I had lying around.  CentOS found the LSI card just fine.  I could run a `dd` on the Intel 520 drives and they could do 500MB/sec+ sequential write.  Great.  Next I tried to format a filesystem on the device using `mkfs.ext4`.   ext4 now supports TRIM and will discard all of the blocks on a target when it creates a new filesystem. However, when I tried to create a filesystem with `mkfs.ext4`, it would simply hang at 0/X bytes discarded. Eventually the kernel started printing messages like "attempting task abort!" and "task abort: SUCCESS scmd" into the log.  

There was no way to kill `mkfs.ext4`.

I did some searching and found an old post to the OCZ forum about problems with an OCZ Vertex 3, which is SandForce based, the same as the Intel 520.  When I was doing my tests, my Vertex 3 was on another SATA bus, isolated from the LSI, so I think this may be SandForce related.

If you didn't read the post, that user upgraded to firmware R14 to fix the problem. Unfortunately I had already upgraded to R16.  

Fixing the mkfs.ext4 issue


I decided to try to downflash to R14.  But of course, the flash utility won't let you downgrade by default.  However, I found a workaround by using a VERY DANGEROUS mode on the controller: RESET MODE. In order to use this function you must use the DOS flash method. I created a USB FreeDOS boot disk. The flash method may work from UEFI but UEFI is a PITA. It doesn't work from the Linux flash utility or the Windows flash utility.

To place the card in RESET MODE:
DO NOT REBOOT WITH THE CARD IN THIS STATE OR YOU WILL HAVE TO RMA THE CARD!!!!!
C:\> sas2flsh.exe -o -e 6

Now you must flash the new firmware AND BIOS:
C:\> sas2flsh.exe -o -f FIRMWARE.bin -b BIOS.rom

It is safe to reboot after flashing.


With the LSI card flashed down to R14 I was able to create a filesystem on the device, but TRIM was not working.  /sys/block/sda/queue/discard_max_bytes was 0.

It turns out only LSI HBA initiator devices support TRIM.  So you can't use RAID and TRIM on this card, or even TRIM on non-RAID (JBOD) devices.  The sales guy sold me another lemon.  Oh well, software RAID 0 isn't a large overhead.

Getting rid of raid (flash to IT mode)


Oh bugger, sas2flsh won't let you switch from IR (RAID) to IT (INITIATOR) firmware either without going into RESET mode.  Follow the same instructions above, but RESET the card then flash the IT firmware instead of IR.  The BIOS is the same in either case.

Final setup


I'm doing software RAID1 over a portion of the two Intel, swap on a portion of the OCZ, and software RAID0 over the remainder of all three.  This gives me a safe area for my OS and important files, a large high performance database test area, and the swap on the OCZ will be regarded as unused by TRIM.  This will improve garbage collection on the OCZ which was in use for some time before I bought the 520s.


* You will see it quoted as 5GT/s and/or 5Gbit/sec but that is before 8b/10b encoding is applied for the data transfer over the bus
http://en.wikipedia.org/wiki/8b/10b_encoding

** Turns out they only support trim for HBA Initiators, not HBA RAID. You can flash to IT firmware though. See the post.

*** This post isn't MySQL specific.

Talks I would recommend at the Percona Live MySQL Conference and Expo
swanhart
Time for some shameless self-promotion and hopefully some helpful information too.

What follows are my lists of talks at the MySQL conference that I would suggest to others, based on experience level and/or interests.  I've focused on my interest areas, as others have posted lists with their interest areas already, but I also provide a list for those who may be new to the event and want suggestions for what I think will be great all-around talks with a wide focus.

If you are new to MySQL, or new to PLMCE:
Day 1:

Day 2:
Day 3:
If you are experienced with MySQL and want to highlight interesting and new stuff:
Day 1:
Day 2:

Day 3:

If you are into sharding/scaling out:

If you are into high performance OLAP/analytics processing and/or "big data":

Transparent encryption does not make your database secure
swanhart
Transparently encrypted storage of *any* kind (storage engine based data encryption, TrueCrypt volume encryption, BitLocker, etc.) is *just as insecure* against most types of attack as non-encrypted data.  SQL injection or privilege escalation vulnerabilities, operating system vulnerabilities and cross-site scripting attacks could give attackers access to the database data.  It doesn't matter if you encrypt the database's physical storage in the database itself (in the storage engine layer) or on disk (at the filesystem level), since either way the data is presented unencrypted through the SQL interface.

Transparent encryption is great for protecting your laptop data if the laptop itself is stolen.  It is very unlikely someone will attack your server by stealing it.

It doesn't protect you from a malicious SQL injection which drops all your tables or reads all your data.

If you are worried about someone physically stealing your data, physical encryption can help with that, but it is very likely that you only need to encrypt off site backups or other data you ship to third parties.

Also, there is no functional difference between an encrypted cloud block device (e.g., a TrueCrypt-encrypted EBS volume) and transparently encrypting data and storing it directly as S3 data.

So, don't trust transparent encryption to "secure" your MySQL data.  You have to have a full set of security practices in place to ensure your entire stack is as safe as possible from attack.  Also, always back up your data.  Just because your data is stored in S3, it doesn't mean you won't accidentally delete a wrong row, drop a wrong table or get attacked by a vulnerability.

Flexviews 1.8.0 beta 1 released
swanhart
You can get Flexviews at the Flexviews project page.

This release includes the following new features:

Table change logs now include a new column: fv$gsn. This column carries the "global sequence number" which uniquely identifies the order in which to make changes in the database. This was required to fix some problems identified in updating views that did not contain aggregate functions.

Updated refresh algorithm takes GSN into account and applies changes in GSN order.

All reported bugs have been fixed.

FlexCDC now supports all MySQL data types properly.

Using the MySQL event scheduler to rotate the slow query log
swanhart
If you are like me, then you like to record all of your query traffic into the slow query log with long_query_time=0. Since I'm usually using Percona Server, I usually set log_slow_verbosity='microtime,query_plan,innodb,profiling' to get as much data as possible into the slow query log for analysis with pt-query-digest.

Generating a lot of data in the slow query log makes it more difficult to work with as it grows larger. It is important to rotate the log regularly to ensure it does not grow too large to work with. This can be accomplished with a cron job, but it can be done in MySQL directly using the event scheduler.

SET GLOBAL event_scheduler=ON;
DROP EVENT if exists rotate_slow_log_event;
CREATE DEFINER=root@localhost EVENT rotate_slow_log_event
ON SCHEDULE
EVERY 1 HOUR
ON COMPLETION PRESERVE
ENABLE
COMMENT 'Rotate slow query log hourly'
DO
set global slow_query_log_file=CONCAT('slow-', unix_timestamp(), '.log');
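
To confirm the event is registered and see when it last fired, you can query INFORMATION_SCHEMA (standard MySQL, shown here as a quick sanity check), along with the current log file name:

SELECT event_name, status, last_executed
  FROM information_schema.events
 WHERE event_name = 'rotate_slow_log_event';

SELECT @@global.slow_query_log_file;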
