ansaurus

Question

How can I extract field names from SQL with Perl?

Answer 1

A:

How about splitting each line into terms (replace every parenthesis, comma and space with a newline), then sorting:

perl -ne's/[(), ]/\n/g; print' < textfile | sort -u

You'll end up with a lot of content like:

fieldname1
fieldname1
formatstring
ltrim
rtrim
t_char

Ether 2010-02-01 00:47:36

Answer 2

+11 A:

Don't try to write a regex parser (though perl regexes can handle nested patterns like that), use SQL::Statement::Structure.

ysth 2010-02-01 02:05:26

+1 for SQL::Statement and friends. While it can't handle every extreme of contorted SQL, it is at least the right kind of tool for this job.

pilcrow 2010-02-01 02:18:53

Answer 3

+2 A:

Why not ask the target database itself how it would interpret the queries?

In perl, one can use the DBI to query the prepared representation of a SQL query. Sometimes this is database-specific: some drivers (under the perl DBD:: namespace) support their RDBMS' idea of describing statements in ways analogous to the RDBMS' native C or C++ API.

It can be done generically, however, as the DBI will put the names of result columns in the statement handle attribute NAME. The following, for example, has a good chance of working on any DBI-supported RDBMS:

use strict;
use warnings;
use DBI;

use constant DSN => 'dbi:YouHaveNotToldUs:dbname=we_do_not_know';

my $dbh = DBI->connect(DSN, ..., { RaiseError => 1 });

my $sth;
while (<>) {
  next unless /^SELECT/i;   # SELECTs only, assume whole query on one line
  chomp;
  my $sql = /\bWHERE\b/i ? "$_ AND 1=0" : "$_ WHERE 1=0"; # XXX ugly!
  eval {
    $sth = $dbh->prepare($sql);  # some drivers don't know column names
    $sth->execute();             # until after a successful execute()
  };
  print $@, next if $@;  # oops, problem with that one
  print join(', ', @{$sth->{NAME}}), "\n";
}

The XXX ugly! bit there tries to append an always-false condition on the SELECT, so that the SQL engine doesn't have to do any real work when you execute(). It's a terribly naive approach -- that /\bWHERE\b/i test is no more correctly identifying a SQL WHERE clause than simple regexes correctly parse out SELECT field names -- but it is likely to work.

pilcrow 2010-02-01 02:54:56

Why not? Because the world isn't ideal and neither is life. I only have access to the data output (sans headers) and the SQL query used to generate the output.

TheObserver 2010-02-01 03:43:43

Answer 4

+1 A:

In a somewhat related problem at the office I used:

my @SqlKeyWordList = qw/select from where .../; # (1)

my @Candidates =split(/\s/,$SqlSelectQuery);      # (2)

my %FieldHash;                                  # (3)
for my $Word (@Candidates)  { 
   next if grep($word,@SqlKeyWordList);
   $FieldHash($Word)++;
}

Comments:

SqlKeyWordList contains all the SQL keywords that are potentially in the SQL statement (we use MySQL, there are many SQL dialiects, choosing/building this list is work, look at my comments below!). If someone decided to use a keyword as a field name, you will need a regex after all (beter to refactor the code).
Split the SQL statement into a list of words, this is the trickiest part and WILL REQUIRE tweeking. For now it uses Perl notion of "space" (=not in word) to split.
Splitting the field list (select a,b,c) and the "from" portion of the SQL might be advisabel here, depends on your SQL statements.
%MyFieldHash will contain one entry per select field (and gunk, until you validated your SqlKeyWorkList and the regex in (2)

Beware

there is nothing in this code that could not be done in Python.
your life would be much easier if you can influence the creation of said SQL statements. (e.g. make sure each field is written to a comment)
there are so many things that can/will go wrong in this parsing approach, you really should sidestep the issue entirely, by changing the process (saves time in the long run).
this is the regex we use at the office

   my @Candidates=split(/[\s
                  \(
                  \)
                  \+
                  \,
                  \*
                 \/
                  \-
                  \n
                  \
                  \=
                  \r
                 ]+/,$SqlSelectQuery
               );

lexu 2010-02-01 05:39:34

clever, if fragile. `select 42 'let x=x'` breaks it two ways.

ysth 2010-02-01 09:33:52

@ysth: (s)he who writes that, deserves what (s)he gets (and then some)

lexu 2010-02-02 11:49:35

ansaurus

tags:

views:

answers:

How can I extract field names from SQL with Perl?

Beware

related questions