I would like to use a regular expression to extract "bind variable" parameters from a string that contains a SQL statement. In Oracle, the parameters are prefixed with a colon.

For example, like this:

SELECT * FROM employee WHERE name = :variable1 OR empno = :variable2

Can I use a regular expression to extract "variable1" and "variable2" from the string? That is, get all words that start with colon and end with space, comma, or the end of the string.

(I don't care if I get the same name multiple times if the same variable has been used several times in the SQL statement; I can sort that out later.)

+2  A: 

This might work:

:\w+

This just means "a colon, followed by one or more word-class characters".

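This assumes your regex engine supports the `\w` shorthand. It comes from Perl-style syntax and is not part of strict POSIX, although many POSIX tools accept it as an extension; the POSIX bracket-expression equivalent is `:[[:alnum:]_]+`.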

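Of course, a single match attempt finds only one such reference. Most engines have a global or "find all" mode that returns every match. If you know the statement contains exactly two references, a single pattern with two capture groups should also work: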

(:\w+).+(:\w+)
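
For a statement with any number of bind variables, here is a minimal sketch in Python, assuming Python's standard `re` module is acceptable. The helper name `bind_variables` is hypothetical, and the capture group in `:(\w+)` is a small variation on the pattern above so that the colon is dropped from the result.

```python
import re

# Hypothetical helper for illustration; not part of the original answer.
# The capture group returns the name without its leading colon.
BIND_VAR = re.compile(r":(\w+)")

def bind_variables(sql):
    """Return every bind-variable name in sql, in order, duplicates kept."""
    return BIND_VAR.findall(sql)

sql = "SELECT * FROM employee WHERE name = :variable1 OR empno = :variable2"
print(bind_variables(sql))  # ['variable1', 'variable2']
```

Note that this sketch knows nothing about SQL syntax, so it would also pick up a colon followed by word characters inside a string literal or a comment.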
unwind
+1  A: 

If your regex parser supports word boundaries,

:[a-zA-Z_0-9]+\b
Blindy
`:` is already a word boundary, so you can skip the first `\b`.
tangens
fair enough, I'll change it.
Blindy
A: 

Try the following:

sed -e 's/[ ,]/\n/g' yourFile.sql | grep '^:.*$' | sort | uniq

assuming your SQL is in a file called "yourFile.sql" and your sed treats `\n` in the replacement as a newline (GNU sed does).

This should give a list of variables with no duplicates.
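
The same split, filter, and de-duplicate idea can be sketched in Python for anyone without sed and grep at hand. The function name `unique_bind_tokens` is hypothetical, and the code deliberately mirrors the pipeline rather than improving on it.

```python
import re

def unique_bind_tokens(sql):
    """Mimic the sed | grep | sort | uniq pipeline on a string."""
    # Split on spaces, commas and newlines (the sed step plus line breaks).
    tokens = re.split(r"[ ,\n]+", sql)
    # Keep tokens that begin with a colon (the grep step),
    # then de-duplicate and sort (the sort | uniq step).
    return sorted({t for t in tokens if t.startswith(":")})

sql = "SELECT * FROM employee WHERE name = :variable1 OR empno = :variable2 OR boss = :variable1"
print(unique_bind_tokens(sql))  # [':variable1', ':variable2']
```

Like the pipeline, this keeps the leading colon and any punctuation attached to a token, so a variable written as `:x)` would come back with the closing parenthesis.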

dave
This fails for non-space separators.
Blindy
Updated to handle commas. The Q specifies space, comma or eol. So this should cover it now.
dave
+1  A: 

To handle simple cases like this yourself, have a look at a regex quickstart.

In the meantime, use:

:\w+
tangens