#PHP Heredoc syntax $regex = <<< END_OF_REGEX / [^ \"'`(),]*\([^)]*\) #match functions like concat(x,"y",`a`.`z`) or sum(`xyz`); |\([^)]*?\) #match grouped items |"[^"]*?" #match double quoted items |'[^']*?' #match single quoted items |`[A-Za-z0-9_ .'\"()+\\-&^%\$+?%\\/\\\\!`]+` #match backtick mysql names |[^ ,]+ #match keywords, operators and aliases |, /xi END_OF_REGEX;
EDIT: After some comments and further testing, I've modified the regex to look like the following:
$regex = <<<'END_OF_REGEX' / [A-Za-z_.]+\(.*?\)+ # Match FUNCTION(...) |\(.*?\)+ # Match grouped items |"(?:[^"]|\"|"")*"+ # Match double quotes |'[^'](?:|\'|'')*'+ # and single quotes |`(?:[^`]|``)*`+ # and backticks |[^ ,]+ |, /x END_OF_REGEX;
This will properly match escaped values within quoted statements. Functions with nested function calls or subselects may not end up being fully grouped together. If anybody has any idea how I might fix that, well, feel free to chime in :D
EDIT END
When used with the preg_match_all() function, an array is produced which represents the tokenized string. Functions calls, quoted strings and grouped expressions are returned together, and may need additional processing, depending on your needs.
For example:
-- select sum(a * b) sweety,'abc' as a1, -- "abc",concat(`def`.`abc`,'hello', x.y, z) as `blah`, -- null + 1 -- from `abc`.`def` as def1 -- join xyz.zzz z1 -- on (z1.a = def1.a) Array ( [0] => select [1] => sum(a * b) [2] => sweety [3] => , [4] => 'abc' [5] => as [6] => a1 [7] => , [8] => "abc" [9] => , [10] => concat(`def`.`abc`,'hello', x.y, z) [11] => as [12] => `blah` [13] => , [14] => null [15] => + [16] => 1 [17] => from [18] => `abc`.`def` [19] => as [20] => def1 [21] => join [22] => xyz.zzz [23] => z1 [24] => on [25] => (z1.a = def1.a) )
I'm using this to process SQL statements and automatically create the Flexviews calls for the query.
The following output is generated by my program for the above SQL statement (only the SELECT clause is currently represented):
call flexviews.add_expr(@mvid, 'SUM', 'a * b', 'sweety'); call flexviews.add_expr(@mvid, 'GROUP', 'abc', '1'); call flexviews.add_expr(@mvid, 'GROUP', "abc", '_abc_'); call flexviews.add_expr(@mvid, 'GROUP', 'concat(`def`.`abc`,\'hello\', x.y, z)', 'blah'); call flexviews.add_expr(@mvid, 'GROUP', 'null + 1', 'null_+_1');
Quoted strings
'([^']*(\\'|'')?)*'
http://dev.mysql.com/doc/refman/5.1/en/identifiers.html
Unless you have the SQL_MODE=ANSI_QUOTES, then you must use backticks to quote identifiers that contain special characters like space.
For example:
create table `x yz` (c1 int) engine=innodb;
|`[A-Za-z0-9_ .'\"()+\\-&^%\$+?%\\/\\\\!`]+`
I wrote it like this...
|`[^`]*?`
And it seemed to work just fine.
Do you see anything wrong with this?
I think this regex should work better and I'll edit the post in a moment:
|`(?:[^`]|``)*`+
This matches one non-backtick character optionally followed by either zero or more non-backtick characters, or exactly two backtick characters. This pattern will repeat one or more times. The ?: modifier turns off capturing for the repeating group.
#match backtick mysql names
.
Peter
Re: #match backtick mysql names
The regex probably doesn't work with multi-byte character sets anyway. I only have to process fairly simple queries like:
SELECT column as alias, AGG_FUNC(x,y)
FROM ..
JOIN .. ON ..
WHERE ..
GROUP BY ...
I need to fix the expression to support nested function calls and subselects, both which currently don't work but I don't need to worry about supporting them for the moment.
It seems to me that if you need to parse complex structures like subqueries that a more robust lexer/parser should be used, not that there seems to be a good choice for that in the PHP world.
|`[A-Za-z0-9_ .\'\"\(\)+\\-\&\^%\$+?\%\\/\\\\!`]+` # match backtick mysql names
can simply be replaced by
|`[^`]*?`
What do you think?
improvement