My SQL Dump

MySQL musings by a self-professed MySQL Geek



Have you tried other TokuDB compression types?

Justin, good stuff. One small detail: I think you have the order reversed in "Building index on copy of the 15M row table (best to worst)", as TokuDB was faster. Also, you should try the lzma and zlib compression options of TokuDB; I assume this analysis was just for quicklz.

Order fixed.

Yes, I tested default settings with no tuning, so if it defaults to quicklz then that is what it used. What option do I set to get the other compression types? I'll read the manual soon, I promise :)

TokuDB compression types

Justin,

It's simple enough to use a particular compression algorithm for a table and its secondary indexes. Consider the following create table:

create table foo
(a bigint not null auto_increment primary key,
b bigint not null,
c varchar(20) not null)
engine=tokudb row_format=tokudb_quicklz;

Valid row formats are:
tokudb_quicklz
tokudb_lzma
tokudb_zlib
tokudb_uncompressed

If you don't include row_format=, you'll get the default, which comes from the server/session variable tokudb_row_format; that variable itself defaults to quicklz.
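
For example, a sketch of both routes (table and column names here are made up for illustration; the variable name tokudb_row_format is from the explanation above):

```sql
-- Session default: a table created without an explicit row_format
-- inherits whatever tokudb_row_format is set to.
SET SESSION tokudb_row_format = tokudb_lzma;
create table foo_lzma
(a bigint not null auto_increment primary key,
b bigint not null)
engine=tokudb;

-- Per-table override, as in the create table above:
create table foo_zlib
(a bigint not null auto_increment primary key,
b bigint not null)
engine=tokudb row_format=tokudb_zlib;
```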

-Tim

how did you measure compressed tokudb size?

Did you look at the actual data size (in the directory)? As I understand it, TokuDB reports the uncompressed size in show table status, which I'll admit is confusing.
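
For reference, a sketch of where those (uncompressed) numbers would be coming from; the schema name is a placeholder:

```sql
-- For TokuDB, these lengths reportedly reflect uncompressed data:
SELECT table_name,
       data_length  / 1024 / 1024 AS data_mb,
       index_length / 1024 / 1024 AS index_mb
  FROM information_schema.TABLES
 WHERE table_schema = 'test';  -- placeholder schema name

-- SHOW TABLE STATUS LIKE 'insert_test'; surfaces the same Data_length.
```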
-Bradley

Re: how did you measure compressed tokudb size?

(Anonymous)
Very good question, a common use case for ARCHIVE is not speed of insert but rather on disk footprint/savings. In these results, innodb and tokudb are clear losers on that front. It remains to be seen if this caveat is the culprit.. I was surprised to see that tokudb was only marginally smaller than uncompressed innodb.

Re: how did you measure compressed tokudb size?

I'll test with various compression options and report proper sizes now that I know the size reported in I_S is wrong (see edit above and comments with sizes).

It looks like TokuDB was 1.8GB after quicklz compression, with no indexes.

The TokuDB test table is 1800MB (15M row table, 3G orig size)
The TokuDB test2 table is 2171MB (keys built with ALTER)
The tokudb_idx database is 2202MB (keys built with INSERT)
The tokudb_phrase database is 681MB (5M row, 3G orig size)
The tokudb_phrase_idx database is 806MB (5M row, keys built with INSERT)

Re: how did you measure compressed tokudb size?

[root@workstation tokudb_55]# ls -lh
total 7.5G
-rw-rw----. 1 mysql mysql 18M May 14 01:59 ibdata1
-rw-rw----. 1 mysql mysql 5.0M May 14 02:56 ib_logfile0
-rw-rw----. 1 mysql mysql 5.0M May 13 23:28 ib_logfile1
-rwx------. 1 mysql mysql 806K May 14 08:57 log000000000036.tokulog24
drwx------. 2 mysql root 4.0K May 13 23:28 mysql
drwx------. 2 mysql mysql 4.0K May 13 23:28 performance_schema
-rw-rw----. 1 mysql mysql 24K May 14 02:56 tc.log
drwx------. 2 mysql root 4.0K May 13 23:28 test
drwx------. 2 mysql mysql 4.0K May 14 00:55 tokudb
-rwxrwx--x. 1 mysql mysql 32K May 14 01:43 tokudb.directory
-rwxrwx--x. 1 mysql mysql 16K May 13 23:30 tokudb.environment
drwx------. 2 mysql mysql 4.0K May 14 01:45 tokudb_idx
-rwxrwx--x. 1 mysql mysql 73M May 14 01:44 _tokudb_idx_sortbuild_test_key_LO_CustKey_75_9_18_B_0.tokudb
-rwxrwx--x. 1 mysql mysql 66M May 14 01:44 _tokudb_idx_sortbuild_test_key_LO_OrderDateKey_75_9_18_B_1.tokudb
-rwxrwx--x. 1 mysql mysql 103M May 14 01:45 _tokudb_idx_sortbuild_test_key_LO_PartKey_75_9_18_B_3.tokudb
-rwxrwx--x. 1 mysql mysql 86M May 14 01:45 _tokudb_idx_sortbuild_test_key_LO_SuppKey_75_9_18_B_2.tokudb
-rwxrwx--x. 1 mysql mysql 1.8G May 14 01:42 _tokudb_idx_sortbuild_test_main_71_2_18.tokudb
-rwxrwx--x. 1 mysql mysql 64K May 14 01:45 _tokudb_idx_sortbuild_test_status_71_1_18.tokudb
-rwxrwx--x. 1 mysql mysql 85M May 13 23:41 _tokudb_insert_test2_key_LO_CustKey_1e_1_18_B_0.tokudb
-rwxrwx--x. 1 mysql mysql 81M May 13 23:41 _tokudb_insert_test2_key_LO_OrderDateKey_1e_1_18_B_1.tokudb
-rwxrwx--x. 1 mysql mysql 108M May 13 23:41 _tokudb_insert_test2_key_LO_PartKey_1e_1_18_B_3.tokudb
-rwxrwx--x. 1 mysql mysql 97M May 13 23:41 _tokudb_insert_test2_key_LO_SuppKey_1e_1_18_B_2.tokudb
-rwxrwx--x. 1 mysql mysql 1.8G May 13 23:41 _tokudb_insert_test2_main_1e_1_18_B_4.tokudb
-rwxrwx--x. 1 mysql mysql 64K May 13 23:38 _tokudb_insert_test2_status_11_1_18.tokudb
-rwxrwx--x. 1 mysql mysql 1.8G May 13 23:39 _tokudb_insert_test_main_e_1_18_B_0.tokudb
-rwxrwx--x. 1 mysql mysql 16K May 13 23:32 _tokudb_insert_test_status_9_1_18.tokudb
-rw-------. 1 mysql mysql 0 May 13 23:30 __tokudb_lock_dont_delete_me_data
-rw-------. 1 mysql mysql 0 May 13 23:30 __tokudb_lock_dont_delete_me_environment
-rw-------. 1 mysql mysql 0 May 13 23:30 __tokudb_lock_dont_delete_me_logs
-rw-------. 1 mysql mysql 0 May 13 23:30 __tokudb_lock_dont_delete_me_temp
-rwxrwx--x. 1 mysql mysql 32K May 14 01:39 tokudb_meta_3_0_18.tokudb
drwx------. 2 mysql mysql 4.0K May 14 00:55 tokudb_phrase
drwx------. 2 mysql mysql 4.0K May 14 00:58 tokudb_phrase_idx
-rwxrwx--x. 1 mysql mysql 29M May 14 00:59 _tokudb_phrase_idx_insert_test_key_LO_CustKey_5d_1_18_B_0.tokudb
-rwxrwx--x. 1 mysql mysql 27M May 14 00:59 _tokudb_phrase_idx_insert_test_key_LO_OrderDateKey_5d_1_18_B_1.tokudb
-rwxrwx--x. 1 mysql mysql 36M May 14 00:59 _tokudb_phrase_idx_insert_test_key_LO_PartKey_5d_1_18_B_3.tokudb
-rwxrwx--x. 1 mysql mysql 33M May 14 00:59 _tokudb_phrase_idx_insert_test_key_LO_SuppKey_5d_1_18_B_2.tokudb
-rwxrwx--x. 1 mysql mysql 681M May 14 00:59 _tokudb_phrase_idx_insert_test_main_5d_1_18_B_4.tokudb
-rwxrwx--x. 1 mysql mysql 64K May 14 00:58 _tokudb_phrase_idx_insert_test_status_52_1_18.tokudb
-rwxrwx--x. 1 mysql mysql 681M May 14 00:56 _tokudb_phrase_insert_test_main_4f_1_18_B_0.tokudb
-rwxrwx--x. 1 mysql mysql 16K May 14 00:55 _tokudb_phrase_insert_test_status_4a_1_18.tokudb
-rw-------. 1 mysql mysql 0 May 13 23:30 __tokudb_recoverylock_dont_delete_me
-rwxrwx--x. 1 mysql mysql 32M May 14 01:45 tokudb.rollback
-rw-rw----. 1 mysql mysql 6 May 14 02:56 workstation.pid

Re: how did you measure compressed tokudb size?

This, of course, will likely lead to an echo of Mark Callaghan's complaint that TokuDB keeps all its files in the top-level data directory.

We'll fix that eventually.
-Bradley

Re: how did you measure compressed tokudb size?

I don't complain for complaining's sake.

Performance may suffer when there are many files in a single directory, such as when you have many tables on a single server.

Re: how did you measure compressed tokudb size?

I haven't actually seen this kind of performance problem, even with tens of thousands of tables. Generally, ext4 and XFS both seem to achieve good performance with large numbers of files in a single directory.

Re: how did you measure compressed tokudb size?

a) There are still people who use ext3
b) ext4 is still slower than XFS at dropping files and slower when there are many files in the directory
c) XFS still takes longer to drop files in a directory with many files compared to one with fewer
d) This isn't an OLTP test, but microstalls are very bad in an OLTP environment because they lead to pileups. Dropping a table has to drop one file per index in your case, which may lead to stalls if you hold some global lock while you drop a table. Though I don't know if you do.

Re: how did you measure compressed tokudb size?

I also want to be able to do `du -sh *` to figure out how much space each DB is using on disk. I don't want to write a script to parse the files in the datadir.
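
In the meantime, here is a sketch of roughly that script, assuming the flat datadir layout shown in the listing above. It matches file names against the actual database subdirectory names rather than splitting on "_", since database names can themselves contain underscores (tokudb_idx, tokudb_phrase_idx):

```shell
#!/bin/sh
# Sum the sizes of top-level .tokudb files per database.  File names
# in the listing look like _<db>_<table-or-index>_<ids>.tokudb, so we
# match each file against the known database directory names.
tokudb_du() {
    datadir=$1
    for dir in "$datadir"/*/; do
        db=$(basename "$dir")
        total=0
        for f in "$datadir/_${db}_"*.tokudb; do
            [ -f "$f" ] || continue
            kb=$(du -k "$f" | awk '{print $1}')
            total=$((total + kb))
        done
        [ "$total" -gt 0 ] && echo "$db ${total}KB"
    done
    return 0
}
```

Caveat: this double-counts when one database name is a prefix of another (tokudb vs tokudb_idx above); a real tool would presumably have to consult the tokudb.directory mapping instead, which is exactly why per-database subdirectories would be nicer.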

Re: how did you measure compressed tokudb size?

It looks like:
 The TokuDB test table is 1800MB.
 The TokuDB test2 table is 2171MB
 The tokudb_idx database is 2202MB
 The tokudb_phrase database is 681MB
 The tokudb_phrase_idx database is 806MB.

Re: how did you measure compressed tokudb size?

Hmm. I wonder if there is a parameter that will make it report correct sizes?

Re: how did you measure compressed tokudb size?

I don't think so.

read performance and compression?

(Anonymous)
Since the ARCHIVE engine is used for a "write once, read many times" pattern, IMO read performance and compression are the most important factors.

When testing for TokuDB aggressive compression

Justin,

If adding tests for TokuDB's aggressive compression, please note that merely doing

ALTER TABLE insert_test engine=tokudb row_format=tokudb_quicklz;

may NOT actually reduce the footprint on disk; at least that is my understanding. Either convert the table to InnoDB and then back to TokuDB, or create it from scratch.
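
A sketch of that round-trip (the table name is taken from the ALTER above; picking lzma as the "aggressive" format is my assumption):

```sql
-- Rebuild through InnoDB so the new row_format takes effect on disk:
ALTER TABLE insert_test engine=innodb;
ALTER TABLE insert_test engine=tokudb row_format=tokudb_lzma;

-- Or create fresh and reload:
-- CREATE TABLE insert_test_lzma ( ...same columns... )
--   engine=tokudb row_format=tokudb_lzma;
-- INSERT INTO insert_test_lzma SELECT * FROM insert_test;
```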


It would have been better if the versions of each engine (or their parent packages) had been reported.

Recently I saw a thread on TokuDB's list about its fsync performance and how one of the slaves was falling behind. They seem to be addressing it with certain settings (to adjust the interval), though.

Another thing: InnoDB's default zlib compression (which cannot be changed) is not on par with quicklz or any of the newer algorithms, nor as easy on the CPU as the latter.

Also, it would be interesting to see how MySQL 5.6 fares in these tests. I mention MySQL 5.6 especially because there have been many improvements to InnoDB compression in it (with some of the Facebook patches merged into it).

If possible, you can also build from https://github.com/facebook/mysql-5.6 to instrument all their new compression features.

Regarding compression efficiency, the table here https://code.google.com/p/lz4/ is quite illuminating:

Name           Ratio  Compression  Decompression
LZ4 (r59)      2.084  330 MB/s     915 MB/s
zlib 1.2.5 -1  2.728  39 MB/s      195 MB/s

so zlib is nearly 10 times slower at compression.


