views:

357

answers:

6

Does anyone know of any code or tools that can strip literal values out of SQL statements?

The reason for asking is I want to correctly judge the SQL workload in our database and I'm worried I might miss out on bad statements whose resource usage get masked because they are displayed as separate statements. When, in reality, they are pretty much the same thing except for different IDs being passed in.

I'd prefer a database independent solution if any exists. I had thought there might be a nifty Perl module for this but I haven't found any.

Thanks for your help.

+2  A: 

If you use JDBC or something like thar your SQL shouldn't have any literals, just '?' marking where they should be.

John Nilsson
Thanks for your reply. Sadly we do sometimes get sql statementssubmitted with literal values rather than bind variables, and I have no way (yet) of getting development to fix the queries.
Mark S
Only partially true; it is a good goal, but it is seldom reached in practice.
Jonathan Leffler
A: 

So, what you're saying is you may have some badly performing queries that are similar because they are built dynamically, and you want them to be grouped together since they are coming from the same call so that the total time spent executing will rise to the top in your profiling?

I think you'll have to handle this with a regex or something. Maybe you can pass the statements through a SQL prettifier so they are consistent first, then sanitize them - literal strings should be the easiest because they are all in '' (/'.*'/STRING_LITERAL/), and then the numeric literals might be doable by removing /\d*/NUMERIC_LITERAL/ or similar.

Cade Roux
Yes, that's exactly what I'm looking for. You've put it much better than me I think!
Mark S
I agree a regex is needed which is why I was hopeful there would be a perl solution, but I can't see one so far.
Mark S
/-?\b\d+\b/ would be better for numeric (well, integer, anyway) literals
ysth
A: 

I think I would use sed to solve this. Something like this

$ cat sql.txt
SELECT * FROM USER WHERE USERID = 123 OR USERNAME LIKE 'Name1%'
SELECT * FROM USER WHERE USERID = 124 OR USERNAME LIKE 'Name2%'
SELECT * FROM USER WHERE USERID = 125 OR USERNAME LIKE 'Name3%'
SELECT * FROM USER WHERE USERID = 126 OR USERNAME LIKE 'Name4%'

$ sed -e "s/\([0-9]\+\)\|\('[^']*'\)/?/g" sql.txt
SELECT * FROM USER WHERE USERID = ? OR USERNAME LIKE ?
SELECT * FROM USER WHERE USERID = ? OR USERNAME LIKE ?
SELECT * FROM USER WHERE USERID = ? OR USERNAME LIKE ?
SELECT * FROM USER WHERE USERID = ? OR USERNAME LIKE ?
John Nilsson
p.s. If any one could comment on how to avoid all those escape back-slashed it would be greatly apreciated.
John Nilsson
Thanks John, it's a good starting point!
Mark S
+9  A: 

SQL::Statement, in particular the SQL::Statement::Structure module, will let you parse and manipulate SQL statements. The subset of SQL syntax it understands can be seen here.

In a related note, there's DBI::Profile to help with your performance analysis.

Schwern
Thanks Schwern. This looks to be exactly what I need.
Mark S
You're welcome (I didn't know it existed until you asked).
Schwern
A: 

I know you wanted a database-independent option, but if you're using Microsoft SQL Server, the free MS ReadTrace (for SQL 2005) and Read80Trace (for SQL 2000) do exactly this. They take a trace output file and strip out the literals. I haven't seen a db-independent solution either (and I work for Quest Software, an author of db-independent tools).

http://blogs.msdn.com/psssql/archive/2007/12/18/rml-utilities-for-microsoft-sql-server-released.aspx

Brent Ozar
A: 

The Semantic Designs source code search engine will do this trivially, using its SQL scanner.

The Search Engine lexes source code to produce lexemes, including exactly those literal strings you after. It indexes all the lexemes and uses that index to enable you to search/browse your source code base (even if its huge) in an interactive manner.

The Search Engine query: S (yes, just an S) will locate all literal "S"trings in your SQL text. All the query results are shown in a hit window, and you can click your way directly to view the code in question from any hit. There's a logging facility, that will record all the hits for later analysis.

http://www.semanticdesigns.com/Products/SearchEngine/index.html

Ira Baxter