ss - oscar

My SQL Dump

MySQL musings by a self professed MySQL Geek


  • 1

Quoted strings

(Anonymous)
the regex for matching single and double quoted strings doesn't handle escaped quotes inside the string, try something like:

'([^']*(\\'|'')?)*'


What do you mean by mysql backtick names? Where is this in the documentation?

It is in the section on identifiers:
http://dev.mysql.com/doc/refman/5.1/en/identifiers.html

Unless you have the SQL_MODE=ANSI_QUOTES, then you must use backticks to quote identifiers that contain special characters like space.

For example:
create table `x yz` (c1 int) engine=innodb;

Instead of this,

|`[A-Za-z0-9_ .'\"()+\\-&^%\$+?%\\/\\\\!`]+`

I wrote it like this...

|`[^`]*?`

And it seemed to work just fine.
Do you see anything wrong with this?

You can have an escaped quote character in the middle of the identifier, that is: `this is`` a valid identifier`.

I think this regex should work better and I'll edit the post in a moment:
|`(?:[^`]|``)*`+


This matches one non-backtick character optionally followed by either zero or more non-backtick characters, or exactly two backtick characters. This pattern will repeat one or more times. The ?: modifier turns off capturing for the repeating group.

#match backtick mysql names

(Anonymous)
From MySQL 5.1 all Unicode characters (with exception of the NULL character/byte) can be used for identifiers. Even non-printable characters and characters outside the unicode 'basic multilingual plane' can. So I doubt that your expression for "#match backtick mysql names" is correct with recent servers! What say?
.
Peter

Re: #match backtick mysql names

Wait, so I could end up with different table and column names if I change the value of 'client_character_set'?

The regex probably doesn't work with multi-byte character sets anyway. I only have to process fairly simple queries like:

SELECT column as alias, AGG_FUNC(x,y)
FROM ..
JOIN .. ON ..
WHERE ..
GROUP BY ...

I need to fix the expression to support nested function calls and subselects, both which currently don't work but I don't need to worry about supporting them for the moment.

It seems to me that if you need to parse complex structures like subqueries that a more robust lexer/parser should be used, not that there seems to be a good choice for that in the PHP world.


I think this expression

|`[A-Za-z0-9_ .\'\"\(\)+\\-\&\^%\$+?\%\\/\\\\!`]+` # match backtick mysql names

can simply be replaced by

|`[^`]*?`

What do you think?

select \`abc\` from dual :)

improvement

(Anonymous)
A best regex might be :
  [A-Za-z_][A-Za-z_.0-9]*\(.*?\)+   # Match FUNCTION(...) (function like MD5)
  |\(.*?\)+             # Match grouped items
  |"(?:[^"]|\"|"")*"+   # Match double quotes
  |'[^'](?:|\'|'')*'+   #   and single quotes
  |`(?:[^`]|``)*`+      #   and backticks
  |-                    # minus operator
  |=                    # equal operator
  |[0-9]+               # number
  |[^ \n,]+            
  |,

  • 1
?

Log in