My SQL Dump

MySQL musings by a self professed MySQL Geek

  • 1

Quoted strings

the regex for matching single and double quoted strings doesn't handle escaped quotes inside the string, try something like:


What do you mean by mysql backtick names? Where is this in the documentation?

It is in the section on identifiers:

Unless you have the SQL_MODE=ANSI_QUOTES, then you must use backticks to quote identifiers that contain special characters like space.

For example:
create table `x yz` (c1 int) engine=innodb;

Instead of this,

|`[A-Za-z0-9_ .'\"()+\\-&^%\$+?%\\/\\\\!`]+`

I wrote it like this...


And it seemed to work just fine.
Do you see anything wrong with this?

You can have an escaped quote character in the middle of the identifier, that is: `this is`` a valid identifier`.

I think this regex should work better and I'll edit the post in a moment:

This matches one non-backtick character optionally followed by either zero or more non-backtick characters, or exactly two backtick characters. This pattern will repeat one or more times. The ?: modifier turns off capturing for the repeating group.

#match backtick mysql names

From MySQL 5.1 all Unicode characters (with exception of the NULL character/byte) can be used for identifiers. Even non-printable characters and characters outside the unicode 'basic multilingual plane' can. So I doubt that your expression for "#match backtick mysql names" is correct with recent servers! What say?

Re: #match backtick mysql names

Wait, so I could end up with different table and column names if I change the value of 'client_character_set'?

The regex probably doesn't work with multi-byte character sets anyway. I only have to process fairly simple queries like:

SELECT column as alias, AGG_FUNC(x,y)
JOIN .. ON ..

I need to fix the expression to support nested function calls and subselects, both which currently don't work but I don't need to worry about supporting them for the moment.

It seems to me that if you need to parse complex structures like subqueries that a more robust lexer/parser should be used, not that there seems to be a good choice for that in the PHP world.

I think this expression

|`[A-Za-z0-9_ .\'\"\(\)+\\-\&\^%\$+?\%\\/\\\\!`]+` # match backtick mysql names

can simply be replaced by


What do you think?

select \`abc\` from dual :)


A best regex might be :
  [A-Za-z_][A-Za-z_.0-9]*\(.*?\)+   # Match FUNCTION(...) (function like MD5)
  |\(.*?\)+             # Match grouped items
  |"(?:[^"]|\"|"")*"+   # Match double quotes
  |'[^'](?:|\'|'')*'+   #   and single quotes
  |`(?:[^`]|``)*`+      #   and backticks
  |-                    # minus operator
  |=                    # equal operator
  |[0-9]+               # number
  |[^ \n,]+            

  • 1

Log in