My SQL Dump

MySQL musings by a self professed MySQL Geek

Previous Entry Add to Memories Share Next Entry
Materialized Magic - Incrementally refreshing Materialized Views w/ MySQL 5
swanhart
One of my favorite features of the Oracle database is support for Materialized Views or 'snapshots' as they used to be called. I've often lamented that MySQL lacks this feature, but everybody I talked to seemed to feel that the feature was just "too big", "too difficult" or "frankly impossible" to implement. Well, frankly, nothing motivates me more than telling me that something is impossible.

*edit* - these restrictions do not apply to my system, this is just what I found "in the wild" when I started

So I started doing some research and came across some rudimentary stored procedure implementations for SELECT-PROJECT (single base table, no joins) views. Their first major drawback is that they require the user to hand-craft a stored procedure, which is not only tedious, but it makes maintenance a nightmare. Second they don't support joins, so their usefulness is very limited. Nonetheless, it gave me a place to start.

I started building a framework for defining views via a stored procedure API. I implemented full refresh, as well as a version of incremental refresh that supported SELECT-PROJECT views, provided that the view contained no duplicates. I also included a special refresh method that could rebuild a portion of the table, provided a materialized view log on one or more tables, and given a hand crafted refresh statement. I hate hand crafted, but if it means refreshing in 10 minutes versus 2 hours, I'll compromise. While functional, that first version is far from ideal. As with most projects, something is better than nothing so I produced a stable version I called "Phase Zero". It passed Q/A, and I immediately started working on a better version.

AdBrite has been using "Phase Zero" in production since the start of February with much success. It has helped us improve the speed of some of our lowest performing web pages by as much as 2000 percent.

What follows is a detailed description of the prototype implementation of "Phase Two". I threw away "Phase One".

Please note, it is designed to work on a replication slave, not your master, unless you have a very low volume system, in which case, you probably don't need materialized views anyway. More on all that below.

It supports:


  • SELECT-PROJECT-JOIN views. This is especially well suited for star schemas.

  • Views with duplicates.

  • Views with aggregate functions (MIN,MAX,AVG,SUM,COUNT,COUNT(*))

  • SP views which may be self-maintainable.

  • Point-in-time refresh

  • Table snapshots.

  • Asynchronous change propagation

  • and more...




Some of this work is inspired by the ACM SIGMOD paper "How to roll a join: asynchronous incremental view maintenance", which you can find at http://portal.acm.org/citation.cfm?id=335393. You must be an ACM member to read the PDF version of the paper. This implementation differs from the paper and includes support for aggregates, but the authors' recursive delta computation algorithm was immensely useful as the basis for the algorithms used here.

This project will be released to the community, but the exact way that is going to be done hasn't been determined yet. This project is still in development and isn't ready for outsiders, but it is far enough along that I wanted to share what I was working on, and to solicit feedback and suggestions on where it should go.

DEFINITIONS

VIEW:
A stored SELECT statement evaluated on demand by the database

MATERIALIZED VIEW:
Like a VIEW, but the results are stored in a table on disk and (usually) refreshed periodically.

BASE TABLE:
A table that is referenced in a [materialized] VIEW.

SP|SELECT/PROJECT VIEW:
A VIEW that accesses only one base table and includes SELECT FROM [WHERE] clauses.

SPJ|SELECT/PROJECT/JOIN VIEW:
A VIEW that JOINS more than one table. SELECT FROM JOIN [WHERE].

STORED PROCEDURE:
A program module stored in the database.

NATIVE FUNCTION:
A program module compiled into the MySQL server (requires a patch to MySQL).

FULL REFRESH:
The materialized view is completely refreshed with CREATE TABLE AS.

INCREMENTAL REFRESH:
The materialized view is UPDATED from logged changes to the base tables.

TRIGGER: A program module stored in the database that executes whenever a row in a table is modified

MATERIALIZED VIEW LOG:
A trigger maintained table that records changes to a base table. Each row in a materialized view log represents an INSERT or DELETE to the base table. UPDATES are written as a DELETE followed by an INSERT. INSERT operations write the NEW values to the log, DELETE operations write the OLD values to the log. Writes to the materialized view log are serialized, and only one transaction may update any base tables at one time. Think of this like a "group table lock". This is why this process is suited to a slave. Only one write transaction is ever in progress, so this serialization adds very little overhead.

MATERIALIZED VIEW DELTA TABLE:
A table maintained by a stored routine that records changes to a materialized view calculated from materialized view log tables.

PROPAGATION QUERY:
A query that computes changes based on the materialized view logs. A propagation query inserts into the materialized view delta table changes to the materialized view based on the base table changes for a given time period. It reads these changes from the materialized view logs and from the base tables. Propagation queries in turn generate compensation queries that essentially 'unapply' changes to the materialized view delta table to compensate for the time differentials involved. This is what makes asynchronous propagation of changes possible.

REFRESH QUERY:
A query that reads from a materialized view delta table and applies the changes to the materialized view. Currently propagation and refresh are tied serially in mview.refresh(), but this will change in the near future, it is only that way for convenience at the moment.

SCHEMA

mysql> desc mview.mview;
+----------------------+--------------------------------+------+-----+---------+
| Field                | Type                           | Null | Key | Default |       
+----------------------+--------------------------------+------+-----+---------+
| mview_id             | int(11)                        | NO   | PRI | NULL    |  
| mview_name           | varchar(50)                    | YES  | MUL | NULL    |               
| mview_schema         | varchar(50)                    | YES  |     | NULL    |                 
| mview_enabled        | tinyint(1)                     | YES  |     | NULL    |                
| mview_last_refresh   | datetime                       | YES  |     | NULL    |                
| mview_refresh_period | int(11)                        | YES  |     | 86400   |                
| mview_refresh_type   | enum('INCREMENTAL','COMPLETE') | YES  |     | NULL    |                
| mview_engine         | enum('MyISAM','InnoDB')        | YES  |     | InnoDB  |                
| mview_definition     | text                           | YES  |     | NULL    |                
| incremental_hwm      | bigint(20)                     | YES  |     | NULL    |                
| refreshed_to_uow_id  | bigint(20)                     | YES  |     | NULL    |                
+----------------------+--------------------------------+------+-----+---------+
11 rows in set (0.01 sec)

mysql> desc mview.mview_table;
+----------------------+--------------+------+-----+---------+----------------+
| Field                | Type         | Null | Key | Default | Extra          |
+----------------------+--------------+------+-----+---------+----------------+
| mview_table_id       | int(11)      | NO   | PRI | NULL    | auto_increment | 
| mview_id             | int(11)      | NO   |     | NULL    |                | 
| mview_table_name     | varchar(100) | YES  |     | NULL    |                | 
| mview_table_schema   | varchar(100) | YES  |     | NULL    |                | 
| mview_table_alias    | varchar(100) | YES  |     | NULL    |                | 
| mview_join_condition | text         | YES  |     | NULL    |                | 
| mview_join_order     | int(11)      | YES  |     | 999     |                | 
+----------------------+--------------+------+-----+---------+----------------+
7 rows in set (0.02 sec)

mysql> desc mview.mview_expression; 
+---------------------+--------------------------------------------------------------------------------+------+-----+---------+----------------+
| Field               | Type                                                                           | Null | Key | Default | Extra          |
+---------------------+--------------------------------------------------------------------------------+------+-----+---------+----------------+
| mview_expression_id | int(11)                                                                        | NO   | PRI | NULL    | auto_increment | 
| mview_id            | int(11)                                                                        | YES  | MUL | NULL    |                | 
| mview_expr_type     | enum('GROUP','SUM','AVG','COUNT','MIN','MAX','WHERE','PRIMARY','KEY','COLUMN') | YES  |     | NULL    |                | 
| mview_expression    | text                                                                           | YES  |     | NULL    |                | 
| mview_alias         | varchar(100)                                                                   | YES  |     | NULL    |                | 
| mview_expr_order    | int(11)                                                                        | YES  |     | 999     |                | 
+---------------------+--------------------------------------------------------------------------------+------+-----+---------+----------------+
6 rows in set (0.01 sec)

This is the table that enforces serialization.
mysql> desc mview.mview_uow_sequence;
+--------+------------+------+-----+---------+-------+
| Field  | Type       | Null | Key | Default | Extra |
+--------+------------+------+-----+---------+-------+
| uow_id | bigint(20) | YES  |     | NULL    |       | 
+--------+------------+------+-----+---------+-------+
1 row in set (0.01 sec)

mysql> desc mview.mview_uow;         
+-------------+------------+------+-----+---------+-------+
| Field       | Type       | Null | Key | Default | Extra |
+-------------+------------+------+-----+---------+-------+
| uow_id      | bigint(20) | YES  |     | NULL    |       | 
| commit_time | datetime   | YES  | MUL | NULL    |       | 
+-------------+------------+------+-----+---------+-------+
2 rows in set (0.01 sec)

</center>
API FUNCTIONS

+------------------------+ +------------------------+
|    User Functions      | |   Internal Functions   |
+------------------------+ +------------------------+
| add_expr               | | ensure_validity        |
| add_table              | | execute_refresh        |
| set_definition         | | execute_refresh_step   |
| create                 | | get_delta_aliases      |
| create_mvlog           | | get_delta_from         |
| disable                | | get_delta_groupby      |
| enable                 | | get_delta_least_uowid  |
| remove_expr            | | get_delta_select       |
| get_id                 | | get_delta_where        |
| refresh                | | get_insert             |
| remove_table           | | get_trigger_body       |
| rename                 | | has_aggregates         |
 | set_definition         | | mview_get_from_clause  | 
 |                        | | mview_get_keys         | 
|                        | | mview_get_select_list  |
|                        | | mview_refresh_complete |
|                        | | mview_signal           |
|                        | | uow_end                |
|                        | | uow_execute            |
|                        | | uow_from_dtime         |
|                        | | uow_start              |
|                        | | uow_state_change       |
|                        | | apply_delta            |
+------------------------+ +------------------------+


How it works (an overview)

A mysql replication slave reads the binlog from the master, and applies it in serialized commit order from the master. DML operations on base tables are logged into materialized view logs w/ triggers. These triggers first call mview.uow_state_change(), which determines if the current connection has committed and attempts to obtain a write lock on the mview.mview_uow_sequence table using mview.uow_start(). This write lock enforces the serialization of a "unit of work" involving the base tables. This ensures serialization even if there are multiple transactions in the database. Because the SQL apply thread operates serially anyway, this additional serialization protection adds little overhead. mview.uow_start() will record the current time as the effective commit time of the previous unit of work. Since serialization is enforced with the locking mechanism, no changes to the table could have happened in the intervening period, thus the database state of the tables at the end of the last unit of work is exactly the same as the current database state with respect to view base tables.


To create a materialized view, first call mview.create_mvlog() for each table you will include in a materialized view. The function will spit out some SQL for you to copy and paste to create the materialized view log table and the associated CREATE TRIGGER statements for the base table. The log will have the same name and schema as your base table, but with _mvlog postfixed. I'm going to make mvlog locations configurable per table eventually.

With logs in place some materialized views can be created. I will create a demonstration view with an equijoin of two simple tables.


mysql> CREATE TABLE test.a (c1 INT auto_increment primary key) engine=InnoDB;
Query OK, 0 rows affected (0.17 sec)

mysql> CREATE TABLE test.b (c2 INT auto_increment primary key, c1 INT references a(c1) ) engine=InnoDB;
Query OK, 0 rows affected (0.47 sec)

mysql> insert into a values (1);
Query OK, 1 row affected (0.00 sec)

mysql> insert into test.b values (1, 1),(2,2);
Query OK, 2 rows affected (0.00 sec)
Records: 2  Duplicates: 0  Warnings: 0

The materialized view that we define will be the equivalent of the following query:
mysql> SELECT a.c1, b.c2 FROM test.a JOIN test.b USING (c1);
+----+----+
| c1 | c2 |
+----+----+
|  1 |  1 | 
+----+----+
1 row in set (0.00 sec)


mysql> call mview.create_mvlog('a','test');
DELIMITER ;;
DROP TABLE IF EXISTS test.a_mvlog;; 
CREATE TABLE test.a_mvlog( dml_type INT DEFAULT 0, uow_id BIGINT, 
                           c1 int(11), KEY(uow_id) 
) ENGINE=INNODB;
[output cut]
Query OK, 0 rows affected (0.45 sec)

mysql> call mview.create_mvlog('b', 'test');
[output cut]
Query OK, 0 rows affected (0.44 sec)

mysql> call mview.create('mv', 'test', 'INCREMENTAL', 1);
Query OK, 1 row affected (0.43 sec)

mysql> set @id = LAST_INSERT_ID();
Query OK, 0 rows affected (0.00 sec)

*edit* add some comments and add USING(c1) which got lost at some point
-- add the first table to the materialized view w/ alias a_alias and no join clause
mysql> call mview.add_table(@id, 'a', 'test', 'a_alias', NULL);
Query OK, 1 row affected (0.34 sec)

-- add the second table, call it b_alias and JOIN USING(c1)
mysql> call mview.add_table(@id, 'b', 'test', 'b_alias', 'USING(c1)');
Query OK, 1 row affected (0.00 sec)

-- add the columns we are going to 'select'
mysql> call mview.add_expr(@id, 'COLUMN', 'a_alias.c1', 'a_c1');
Query OK, 1 row affected, 1 warning (0.37 sec)

mysql> call mview.add_expr(@id, 'COLUMN', 'b_alias.c2', 'b_c1');
Query OK, 1 row affected, 1 warning (0.04 sec)

-- all views require a key for incremental refresh.  if there are no duplicates you
-- may use PRIMARY.  Aggregate tables get a PRIMARY added automatically if none is provided.
mysql> call mview.add_expr(@id, 'KEY', 'a_c1, b_c1', 'key_name');
Query OK, 1 row affected (0.36 sec)

mysql> call mview.enable(@id);
Query OK, 0 rows affected, 2 warnings (0.53 sec)

Ensure the output is what we expect:

mysql> select * from test.mv;
+------+------+
| a_c1 | b_c1 |
+------+------+
|    1 |    1 | 
+------+------+
1 row in set (0.00 sec)

Now lets make some changes to the tables:

mysql> insert into a values (2),(3);
Query OK, 2 rows affected (0.00 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> insert into b values(3,3); 
Query OK, 1 row affected (0.06 sec)

mysql> delete from b where c1=1;
Query OK, 1 row affected (0.01 sec)

What should the mv contain now:
mysql> SELECT a.c1, b.c2   FROM test.a JOIN test.b USING (c1);
+----+----+
| c1 | c2 |
+----+----+
|  2 |  2 | 
|  3 |  3 | 
+----+----+
2 rows in set (0.00 sec)

mv is now out of date (this is expected):

mysql> select * from test.mv;
+------+------+
| a_c1 | b_c1 |
+------+------+
|    1 |    1 | 
+------+------+
1 row in set (0.00 sec)

lets update it:

mysql> call mview.refresh(mview.get_id('mv','test'));
Query OK, 0 rows affected, 3 warnings (1.24 sec)

mysql> select * from test.mv;
+------+------+
| a_c1 | b_c1 |
+------+------+
|    2 |    2 | 
|    3 |    3 | 
+------+------+
2 rows in set (0.00 sec)


Now, I know that looks rather unimpressive, because it is a pretty simple example, but let me explain what happened under the hood:
First, here are the logged changes.

mysql> select * from a_mvlog;
+----------+--------+------+
| dml_type | uow_id | c1   |
+----------+--------+------+
|        1 |   6475 |    2 | 
|        1 |   6475 |    3 | 
+----------+--------+------+
2 rows in set (0.00 sec)

mysql> select * from b_mvlog;
+----------+--------+------+------+
| dml_type | uow_id | c2   | c1   |
+----------+--------+------+------+
|        1 |   6475 |    3 |    3 | 
|       -1 |   6475 |    1 |    1 | 
+----------+--------+------+------+
2 rows in set (0.00 sec)

With that in mind, here is a description of what happens to refresh the view. After that, I'll paste the actual SQL that was executed by the stored procedure so you can analyze how it works more closely.

When mview.refresh() is called, the materialize view delta calculation algorithm will recursively read from a combination of base tables and logs to create view delta records which are inserted into the materialize view's delta table. If the materialized view involves aggregates, then the delta table contains delta values that are applied to the table. If the materialized view does not include grouping, each row in the delta table is a row insert or delete. As special care is taken to support removing only the necessary number of rows in the presence of duplicates from the materialized view, this makes table snapshotting possible.

This portion of the implementation was a challenge because MySQL stored procedures do not support arrays. I had to use a temporary table which makes calculating the SQL to calculate the deltas more expensive than actually running that SQL, especially in a simple situation like this example!

On MySQL 5.1, the commit/rollback detection(mview.uow_state_change()) can make use of the INFORMATION_SCHEMA to detect when com_commit or com_rollback increments. This however, is far too slow to use in a trigger, as each access to the SESSION_STATUS table takes almost .5 seconds, sometimes more. On MySQL 5.0 (or if you want to do something other than limited testing of the functionality on 5.1) two custom native functions are required: get_commits() and get_rollbacks(). They return the com_commit and com_rollback session status counters, respectively. As I understand it, a UDF doesn't get access to a THD pointer, so it has to be a native function. There are lots of reasons to compile your own MySQL server, including a slew of useful patches from Google, so I encourage you to do so regardless.

The delta apply process (also called refresh process) reads from the delta table and applies the changes to the materialized view. In a table absent aggregate functions, the INSERT rows are inserted in one batch, followed by a DELETE for each deleted record. This was a challenge because DELETE w/ a JOIN doesn't support LIMIT.

In the case of aggregates a DELETE is executed that removes rows where MVIEW.COUNT(*) + DELTA.COUNT(*) = 0 from both the delta and materialized view tables. Remember, that the delta table records delta values, so deletes result in negative COUNT(*) values. Following the delete an INSERT ... ON DUPLICATE KEY UPDATE is executed on the materialized view with the remaining rows left to apply in the delta table.

A COUNT(*) expression is required for tables w/ aggregation. COUNT(*) AS "CNT" is added automatically to the view definition if the user neglected to add a COUNT(*) manually. A user defined COUNT(*) column does not have to use the CNT alias. This automatic addition happens in mview.ensure_validity(), which is called by mview.enable() to ensure that no syntax error causing expressions were entered by the user and to do any final checks like this one.

Below is the "under the hood SQL" that makes it work. If @DEBUG=1 the refresh process logs to a temporary table called mview.refresh_log. It currently also uses Xaprb's great now_usec UDF to get the timestamp w/ microseconds. As you can see, the MySQL stored procedure language isn't all that fast. This is going to need quite a bit of optimization, perhaps moving some functionality into native functions.

mysql> select * from mview.refresh_log order by tstamp, usec\G
*************************** 1. row ***************************
 tstamp: 2008-02-26 02:16:27
   usec: 934949
message: -- START PROPAGATE
CALL mview.execute_refresh(13,6474,6475,1);
*************************** 2. row ***************************
 tstamp: 2008-02-26 02:16:27
   usec: 960367
message: NULL
*************************** 3. row ***************************
 tstamp: 2008-02-26 02:16:28
   usec: 47674
message: 
INSERT INTO test.mv_delta 
SELECT (a_alias.dml_type * 1) as dml_type,
       a_alias.uow_id as uow_id, 
       (a_alias.c1) as `a_c1`, 
       (b_alias.c2) as `b_c1`
  FROM test.a_mvlog as a_alias   
  JOIN test.b as b_alias USING(c1)
 WHERE a_alias.uow_id >6474 
   AND a_alias.uow_id <=6475
*************************** 4. row ***************************
 tstamp: 2008-02-26 02:16:28
   usec: 100077
message: CALL mview.execute_refresh(13, NULL, 6476,-1);
*************************** 5. row ***************************
 tstamp: 2008-02-26 02:16:28
   usec: 111012
message: NULL
*************************** 6. row ***************************
 tstamp: 2008-02-26 02:16:28
   usec: 203315
message: 
INSERT INTO test.mv_delta 
SELECT (b_alias.dml_type * -1) as dml_type,
       LEAST(a_alias.uow_id,b_alias.uow_id) as uow_id, 
       (a_alias.c1) as `a_c1`,
       (b_alias.c2) as `b_c1`
  FROM test.a_mvlog as a_alias   
  JOIN test.b_mvlog as b_alias USING(c1)
 WHERE a_alias.uow_id >6474 
   AND a_alias.uow_id <=6475 
   AND b_alias.uow_id >6475 
   AND b_alias.uow_id <=6476
*************************** 7. row ***************************
 tstamp: 2008-02-26 02:16:28
   usec: 238112
message: CALL mview.execute_refresh_step(13,1,1,16,6476);
*************************** 8. row ***************************
 tstamp: 2008-02-26 02:16:28
   usec: 300543
message: 
INSERT INTO test.mv_delta 
SELECT (b_alias.dml_type * 1) as dml_type,
       b_alias.uow_id as uow_id, 
       (a_alias.c1) as `a_c1`, 
       (b_alias.c2) as `b_c1`
  FROM test.a as a_alias   
  JOIN test.b_mvlog as b_alias USING(c1)
 WHERE b_alias.uow_id >6474 
   AND b_alias.uow_id <=6475
*************************** 9. row ***************************
 tstamp: 2008-02-26 02:16:28
   usec: 346809
message: CALL mview.execute_refresh(13, NULL, 6478,-1);
*************************** 10. row ***************************
 tstamp: 2008-02-26 02:16:28
   usec: 356822
message: NULL
*************************** 11. row ***************************
 tstamp: 2008-02-26 02:16:28
   usec: 450155
message: 
INSERT INTO test.mv_delta 
SELECT (a_alias.dml_type * -1) as dml_type,
       LEAST(a_alias.uow_id,b_alias.uow_id) as uow_id, 
       (a_alias.c1) as `a_c1`, 
       (b_alias.c2) as `b_c1`
  FROM test.a_mvlog as a_alias   
  JOIN test.b_mvlog as b_alias USING(c1)
 WHERE a_alias.uow_id >6474 
   AND a_alias.uow_id <=6478 
   AND b_alias.uow_id >6474 
   AND b_alias.uow_id <=6475
*************************** 12. row ***************************
 tstamp: 2008-02-26 02:16:28
   usec: 484343
message: -- END PROPAGATE
*************************** 13. row ***************************
 tstamp: 2008-02-26 02:16:28
   usec: 485530
message: -- START APPLY
CALL mview.apply_delta(13,6475);
*************************** 14. row ***************************
 tstamp: 2008-02-26 02:16:28
   usec: 489657
message:  -- first the inserts
*************************** 15. row ***************************
 tstamp: 2008-02-26 02:16:28
   usec: 511148
message: 
INSERT INTO test.mv 
SELECT a_c1, b_c1   
  FROM test.mv_delta 
 WHERE dml_type = 1 
   AND uow_id > 6474   
   AND uow_id <= 6475
*************************** 16. row ***************************
 tstamp: 2008-02-26 02:16:28
   usec: 515309
message: DROP TEMPORARY TABLE IF EXISTS deletes;
*************************** 17. row ***************************
 tstamp: 2008-02-26 02:16:28
   usec: 517121
message: 
CREATE TEMPORARY TABLE deletes ( dkey int auto_increment primary key ) 
AS 
SELECT * 
  FROM test.mv_delta 
 WHERE dml_type = -1 AND uow_id > 6474 AND uow_id <= 6475
*************************** 18. row ***************************
 tstamp: 2008-02-26 02:16:28
   usec: 542495
message: -- Entering delete loop: 2 rows to delete
*************************** 19. row ***************************
 tstamp: 2008-02-26 02:16:28
   usec: 577724
message: 
 DELETE test.mv.* 
   FROM test.mv 
NATURAL JOIN deletes
  WHERE deletes.dkey = 1 
    AND CONCAT(deletes.a_c1, deletes.b_c1,1) =  
        CONCAT(test.mv.a_c1, test.mv.b_c1,@oneRow := @oneRow + 1 )

*************************** 20. row ***************************
 tstamp: 2008-02-26 02:16:28
   usec: 617034
message: 
 DELETE test.mv.* 
   FROM test.mv 
NATURAL JOIN deletes
  WHERE deletes.dkey = 2 
    AND CONCAT(deletes.a_c1, deletes.b_c1,1) =  
        CONCAT(test.mv.a_c1, test.mv.b_c1,@oneRow := @oneRow + 1 )

*************************** 21. row ***************************
 tstamp: 2008-02-26 02:16:28
   usec: 626631
message: 
UPDATE mview.mview 
   SET refreshed_to_uow_id = v_until_uow_id
 WHERE mview_id =13;
21 rows in set (0.00 sec)



So that is it for now. Let me know what you think, what suggestions you have, whatever. And as always, enjoy using MySQL!


ummm....yeah, what you said

monkeyarcher

2008-02-26 02:49 pm (UTC)

dude...I have been so into my own crap lately that I have let some very important things slip....I am very happy to see you posting again. Any chance on some sort of update on what is going on with you?

e.t.a.you have an icon of you! How cute!

Edited at 2008-02-26 02:50 pm (UTC)

Re: ummm....yeah, what you said

swanhart

2008-02-26 07:36 pm (UTC)

Good to hear from you too. Not much has been going on actually.

I started working on this during my Christmas holiday, and it is pretty much all I've done during waking hours since.

I like your user pic too. Great shot.

Nice idea

(Anonymous)

2008-02-26 09:52 pm (UTC)

Given the amount of websites, especially forums (running on php) that could benefit from having materialized views this is a big step.
If you were to replicate the mview db back to the master, you get to have your cake and eat it too.

You could potentially run the propagate phase on a MySQL slave and instead of maintaining the delta table on the slave, instead maintain it via a Federated table that points to the master. You could then run the low-impact apply phase on the master. Since the delta table would be maintained by regular DML, that would replicate down to your slaves and they would all apply the changes properly to their copies of the view through regular MySQL replication.

Also, because incremental refresh maintains the views with regular DML statements, a materialized view log can be added to a materialized view itself.

This means that in a horizontally partitioned cluster, aggregated materialized views can be maintained on each shard. Another set of machines can then collect these changes and UNION them to maintain a centralized materialized view. This can help provide a solution to the "aggregate" problem in a horizontally partitioned cluster, that is: How to resolve queries involving aggregation over more than one shard?

In fact, this is a special case of multi-master replication. Given a well-defined conflict resolution protocol, table snapshotting can be used to implement almost any arbitrary table change based replication system including general multi-master replication.

You could even go crazy and implement block level replication for changes made by triggers and stored procedures (on MySQL 5.0, 5.1 already has this feature) with some elbow grease.

Edited at 2008-02-27 01:03 am (UTC)

Location of paper

(Anonymous)

2008-02-26 09:57 pm (UTC)

The paper can be found, with a bunch of other interesting stuff at Kevin Beyer's webpage.
http://pages.cs.wisc.edu/~beyer/

Re: Location of paper

swanhart

2008-02-26 10:51 pm (UTC)

Thanks for the pointer.

Brilliant.

(Anonymous)

2008-04-10 02:16 am (UTC)

Brilliant. Looks like you have done an unreal job. How is the progress going?
Thanks
Tez

No progress. Still waiting on AdBrite to release under LGPL so I can pick it up and develop it further.

You are viewing swanhart