My program reads other programs' source code and collects information about the SQL queries they use. I have a problem extracting a substring.

...
$line = <FILE_IN>;
until( ($line =~m/$values_string/i && $line !~m/$rem_string/i) || eof )
{
   if($line =~m/ \S{2}DT\S{3}/i)
   {

   # here I wish to get (only) substring that match to pattern \S{2}DT\S{3} 
   # (7 letter table name) and display it.
      $line =~/\S{2}DT\S{3}/i;
      print $line."\n";
...

As a result, print outputs the whole line and not the substring I expect. I tried different approaches, but I seldom use Perl and am probably making a basic conceptual error. (The position of the table name in the line is not fixed. Another problem is multiple occurrences, e.g. [... SELECT * FROM AADTTAB, BBDTTAB, ...].) How can I obtain that substring?

+4  A: 

It would be better to match the pattern only where it follows FROM. I also assume table names consist solely of ASCII letters; in that case, it is best to say exactly that in the pattern ([A-Z] rather than \S). With those two remarks out of the way, note that a successful capturing regex match in list context returns the captured substring(s).

#!/usr/bin/perl

use strict;
use warnings;

my $s = 'select * from aadttab, bbdttab';
if ( my ($table) = $s =~ /FROM ([A-Z]{2}DT[A-Z]{3})/i ) {
    print $table, "\n";
}
__END__

Output:

C:\Temp> s
aadttab

If your perl is version 5.10 or later, you can use a named capturing group, which might make the whole thing easier to read:

if ( $s =~ /FROM (?<table>[A-Z]{2}DT[A-Z]{3})/i ) {
    print $+{table}, "\n";
}

See perldoc perlre.
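
The question also mentions several table names on one line. A minimal sketch of one way to collect them all, assuming (as above) that table names are seven ASCII letters with DT in positions 3 and 4 and are delimited by word boundaries: with the /g flag, a match in list context returns the captures from every match. The helper name table_names is made up for this illustration.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Hypothetical helper: return every word in $sql that looks like a
# 7-letter table name with DT in positions 3 and 4.
sub table_names {
    my ($sql) = @_;
    # With /g, a match in list context returns the captures of all matches.
    return $sql =~ /\b([A-Z]{2}DT[A-Z]{3})\b/gi;
}

my $s = 'select * from aadttab, bbdttab';
print "$_\n" for table_names($s);
```

For the sample string this should print aadttab and bbdttab on separate lines. Note that this sketch drops the FROM anchor, so it would also pick up a matching word elsewhere in the line.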

Sinan Ünür
+12  A: 

Use grouping with parentheses and store the first group.

if( $line =~ /(\S{2}DT\S{3})/i )
{
  my $substring = $1;
}

The code above fixes the immediate problem of pulling out the first table name. However, the question also asked how to pull out all the table names. So:

# FROM\s+       match FROM followed by one or more whitespace characters
# (.+?)         match (non-greedy) and capture any characters until...
# (?:x|y)       match x OR y, the two alternatives below
# (?<=[^,\s])\s+(?=[^,\s])
#               whitespace that is not next to a comma (end of the table list)
# \s*;          zero or more whitespace characters followed by a semicolon
if( $line =~ /FROM\s+(.+?)(?:(?<=[^,\s])\s+(?=[^,\s])|\s*;)/i )
{
  # $1 will be table1, table2, table3
  my @tables = split(/\s*,\s*/, $1);
  # delim is a space/comma
  foreach(@tables)
  {
     # $_ = table name
     print $_ . "\n";
  }
}

Result:

If $line = "SELECT * FROM AADTTAB, BBDTTAB;"

Output:

AADTTAB
BBDTTAB

If $line = "SELECT * FROM AADTTAB;"

Output:

AADTTAB

Perl Version: v5.10.0 built for MSWin32-x86-multi-thread

Jesse
+3  A: 

Use a capturing group:

$line =~ /(\S{2}DT\S{3})/i;
my $substr = $1;
friedo
Always check if the match succeeded before using match variables.
Sinan Ünür
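
To illustrate the comment above: after a failed match, $1 still holds whatever the last successful match left in it, so reading it unconditionally can return a stale value. A minimal sketch of the guarded form, using the question's pattern; the helper name first_table is made up for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first table-like name in $line, or undef.
sub first_table {
    my ($line) = @_;
    if ( $line =~ /(\S{2}DT\S{3})/i ) {
        return $1;    # safe: $1 is only read after a successful match
    }
    return;           # no match: return undef instead of a stale $1
}

for my $line ( 'SELECT * FROM AADTTAB;', 'no table here' ) {
    my $table = first_table($line);
    print defined $table ? "$table\n" : "(no match)\n";
}
```

This should print AADTTAB for the first line and (no match) for the second.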
+6  A: 

Parens let you capture parts of the match into the special variables $1, $2, $3... So:

$line = ' abc andtabl 1234';
if($line =~m/ (\S{2}DT\S{3})/i)   {   
    # here I wish to get (only) substring that match to pattern \S{2}DT\S{3}    
    # (7 letter table name) and display it.      
    print $1."\n";
}
10rd_n3r0
A: 

$& contains the string matched by the last pattern match.

Example:

$str = "abcdefghijkl";
$str =~ m/cdefg/;
print $&;
# Output: "cdefg"

So you could do something like

if($line =~m/ \S{2}DT\S{3}/i) {
    print $&."\n";
}

WARNING:

If you use $& in your code it will slow down all pattern matches.

abhinavg
I think there may have been some changes that reduce the effect in perl 5.10
Brad Gilbert
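
One 5.10 addition that is relevant here is the /p flag together with the ${^MATCH} variable, which gives the same text as $& but only for matches that ask for it. A minimal sketch, assuming perl 5.10 or later and reusing the question's pattern; the helper name matched_text is made up for this illustration.

```perl
use strict;
use warnings;
use 5.010;    # /p and ${^MATCH} need perl 5.10 or later

# Hypothetical helper: return the text matched by the pattern, or undef.
sub matched_text {
    my ($line) = @_;
    if ( $line =~ /\S{2}DT\S{3}/ip ) {
        return ${^MATCH};    # like $&, but only available because of /p
    }
    return;
}

my $line = 'SELECT * FROM AADTTAB;';
my $m    = matched_text($line);
print "$m\n" if defined $m;
```

For the sample line this should print AADTTAB.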
+5  A: 

I prefer this:

my ( $table_name ) = $line =~ m/(\S{2}DT\S{3})/i;

This

  1. scans $line and captures the text corresponding to the pattern
  2. returns all the captures (here, just one) to the list on the left-hand side.

This pseudo-list context is how we catch the first item in a list. It works the same way as unpacking the parameters passed to a subroutine.

my ( $first, $second, @rest ) = @_;


my ( $first_capture, $second_capture, @others ) = $feldman =~ /$some_pattern/;

NOTE: That said, your regex assumes too much about the text to be useful in more than a handful of situations. It will not capture any table name that doesn't have DT in positions 3 and 4 out of 7. It's good enough 1) for quick-and-dirty work or 2) if you're okay with limited applicability.

Axeman
It's really list context, there's nothing pseudo about it! The tricky thing is using a list of one item. Capturing the results of an operation in a single item list can be very handy when you want to force list-context behavior from the operator or subroutine you are calling. `my $foo = @bar;` is very different from `my ($foo) = @bar;`, and the distinction can come in very handy.
daotoad
Oh, it does come in handy. I use it all the time. I guess "pseudo" is a bad way to put it. I know that a list of one is still a list, it just looks an awful lot like a scalar--and that's all I'm trying to get anyway.
Axeman
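
The scalar-versus-list distinction discussed in these comments can be seen in a short sketch. The array contents and variable names are made up for this illustration; the pattern is the one from the question.

```perl
use strict;
use warnings;

my @bar = ( 'AADTTAB', 'BBDTTAB', 'CCDTTAB' );

my $count   = @bar;    # scalar context: the number of elements
my ($first) = @bar;    # list context: the first element

print "$count\n";      # 3
print "$first\n";      # AADTTAB

# The same applies to a regex match.
my $line = 'SELECT * FROM AADTTAB;';

my $matched = $line =~ /(\S{2}DT\S{3})/i;    # scalar context: true or false
my ($table) = $line =~ /(\S{2}DT\S{3})/i;    # list context: the capture

print "$table\n" if $matched;                # AADTTAB
```

The parentheses around the variable on the left are what put the right-hand side in list context.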