Both of these tools rely on a similar set of substitution rules. One of my favorite parts of various math classes was learning how the different mathematical expressions could be substituted for one another, allowing you to solve what otherwise appeared to be impossible problems. Using simple substitutions, SQL can be made to solve difficult problems too, such as running queries in parallel or on multiple shards (or both).

I've tried to collect a some of the rules that I use in my tools. Most of these rules aren't very useful for your regular SQL applications, they are meant to be applied when you build a parallel query tool like Shard-Query.

**Rule #1 - Algebraic rules apply to SQL expressions**

This rule is important for query optimization. The MySQL optimizer isn't very smart about optimizing expressions so that they can use indexes.

For example:

SELECT * FROM t WHERE some_col + interval 10 minute >= now()

SELECT * FROM t WHERE some_col >= now() - interval 10 minute;

Okay, that was a general rule of thumb, but pretty much everything else is related to parallel querying.

**Rule #2 - An IN LIST is equivalent to a UNION ALL (as long as the are no duplicate items in the IN list)**

For example:

SELECT * FROM t WHERE some_col IN (1,2,3)

Can be changed to:

SELECT * FROM t WHERE some_col = 1

UNION ALL

SELECT * FROM t WHERE some_col = 2

UNION ALL

SELECT * FROM t WHERE some_col = 3

UNION ALL

SELECT * FROM t WHERE some_col = 2

UNION ALL

SELECT * FROM t WHERE some_col = 3

**Rule #3 - COUNT(*) = SUM(1)**

SELECT COUNT(*) FROM t

Can be changed to:

SELECT SUM(1) as `COUNT(*)` FROM t

Rule #3 is only useful in a very limited context, like Flexviews.

**Rule #4 - AVG(expr) = SUM(expr)/COUNT(expr)**

For example:

SELECT AVG(some_col) FROM t

Can be changed to:

SELECT SUM(some_col)/COUNT(some_col) as `AVG(some_col)` FROM t

Of course, you knew that AVG = SUM/COUNT, but it is important to remember that this decomposition can be made to split up the work.

**Rule #5 - In some cases, BETWEEN can be expressed as an IN LIST (for DATE and INTEGER type columns)**This conversion to IN makes range lookups on a hash partitioned column possible, as long as the column is INTEGER or DATE.

**For example:**

SELECT * FROM t where some_col between 1 and 3

Can be changed to:

SELECT * FROM t where some_col IN (1,2,3)

Note that once it is converted to an IN list, then we might be able to further convert the IN to UNION ALL (see rule 2 and #6)

**Rule #6 - It is possible to run each part of a UNION ALL in parallel if you use read-committed transaction isolation level and the query uses no aggregation**

The example given in rule #2 is perfect for this. Each portion of the UNION ALL is independent and can be run in parallel without any further modification.

**Rule #7 - It is possible to rewrite aggregate queries for parallel work for MIN/MAX/AVG/SUM/COUNT, but you need the additional following rewrite rules to "wrap" the UNION ALL:**

COUNT(expr) := SUM(`COUNT(expr)`)

AVG(expr) := SUM(`SUM(expr)`) / SUM(`COUNT(expr)`) # see below and rule #4

For example:

COUNT(expr) := SUM(`COUNT(expr)`)

AVG(expr) := SUM(`SUM(expr)`) / SUM(`COUNT(expr)`) # see below and rule #4

SELECT AVG(some_col) as expr FROM t WHERE col2 in (1,2,3)

SELECT SUM(the_union.expr_s) / SUM(the_union.expr_c) as expr

FROM (

SELECT SUM(some_col) as expr_s, COUNT(some_col) as expr_c FROM t WHERE col2 = 1

UNION ALL

SELECT SUM(some_col) as expr_s, COUNT(some_col) as expr_c FROM t WHERE col2 = 2

UNION ALL

SELECT SUM(some_col) as expr_s, COUNT(some_col) as expr_c FROM t WHERE col2 = 3

) as the_union

**Rule #8 - It is possible to rewrite aggregate queries for parallel work for STDDEV,VARIANCE,ETC, but you need the additional following rewrite rules to defer the aggregation until the final step.**

STDDEV(expr) := SELECT ... ,expr, ... GROUP BY expr

For example:

STDDEV(expr) := SELECT ... ,expr, ... GROUP BY expr

SELECT STDDEV(some_col) as expr FROM t WHERE col2 in (1,2,3)

Can be changed to:SELECT STDDEV(the_union.expr) as expr

FROM (

SELECT some_col as expr FROM t WHERE col2 = 1 GROUP BY some_col

UNION ALL

SELECT some_col as expr FROM t WHERE col2 = 2 GROUP BY some_col

UNION ALL

SELECT some_col as expr FROM t WHERE col2 = 3 GROUP BY some_col

) as the_union