Both of these tools rely on a similar set of substitution rules. One of my favorite parts of various math classes was learning how the different mathematical expressions could be substituted for one another, allowing you to solve what otherwise appeared to be impossible problems. Using simple substitutions, SQL can be made to solve difficult problems too, such as running queries in parallel or on multiple shards (or both).
I've tried to collect a some of the rules that I use in my tools. Most of these rules aren't very useful for your regular SQL applications, they are meant to be applied when you build a parallel query tool like Shard-Query.
Rule #1 - Algebraic rules apply to SQL expressions
This rule is important for query optimization. The MySQL optimizer isn't very smart about optimizing expressions so that they can use indexes.
Okay, that was a general rule of thumb, but pretty much everything else is related to parallel querying.
Rule #2 - An IN LIST is equivalent to a UNION ALL (as long as the are no duplicate items in the IN list)
SELECT * FROM t WHERE some_col = 2
SELECT * FROM t WHERE some_col = 3
Rule #3 - COUNT(*) = SUM(1)
Rule #3 is only useful in a very limited context, like Flexviews.
Rule #4 - AVG(expr) = SUM(expr)/COUNT(expr)
Of course, you knew that AVG = SUM/COUNT, but it is important to remember that this decomposition can be made to split up the work.
Rule #5 - In some cases, BETWEEN can be expressed as an IN LIST (for DATE and INTEGER type columns) This conversion to IN makes range lookups on a hash partitioned column possible, as long as the column is INTEGER or DATE.
Rule #6 - It is possible to run each part of a UNION ALL in parallel if you use read-committed transaction isolation level and the query uses no aggregation
The example given in rule #2 is perfect for this. Each portion of the UNION ALL is independent and can be run in parallel without any further modification.
Rule #7 - It is possible to rewrite aggregate queries for parallel work for MIN/MAX/AVG/SUM/COUNT, but you need the additional following rewrite rules to "wrap" the UNION ALL:
COUNT(expr) := SUM(`COUNT(expr)`)
AVG(expr) := SUM(`SUM(expr)`) / SUM(`COUNT(expr)`) # see below and rule #4
SELECT SUM(the_union.expr_s) / SUM(the_union.expr_c) as expr
SELECT SUM(some_col) as expr_s, COUNT(some_col) as expr_c FROM t WHERE col2 = 1
SELECT SUM(some_col) as expr_s, COUNT(some_col) as expr_c FROM t WHERE col2 = 2
SELECT SUM(some_col) as expr_s, COUNT(some_col) as expr_c FROM t WHERE col2 = 3
) as the_union
Rule #8 - It is possible to rewrite aggregate queries for parallel work for STDDEV,VARIANCE,ETC, but you need the additional following rewrite rules to defer the aggregation until the final step.
STDDEV(expr) := SELECT ... ,expr, ... GROUP BY expr
SELECT STDDEV(the_union.expr) as expr
SELECT some_col as expr FROM t WHERE col2 = 1 GROUP BY some_col
SELECT some_col as expr FROM t WHERE col2 = 2 GROUP BY some_col
SELECT some_col as expr FROM t WHERE col2 = 3 GROUP BY some_col
) as the_union