I'm probably doing this all wrong. I have a text file full of data and I want to match and replace patterns of "item" and "catalog number" that are in the file. But the order of each element in the file is very important, so I want to match/replace starting from the top of the file and then work my way down.

The code snippet below actually runs, but when I execute it, it first replaces the third instance of the "SeaMonkeys" & "SMKY-1978" pattern and then replaces the second instance. What I'd like it to do is replace the first instance of the pattern and then the second.

So I'd like the output to say "Found Kurt's SMKY-1978 SeaMonkeys" and then "Found Shane's SMKY-1978 SeaMonkeys" and then leave Mick's SMKY-1978 SeaMonkeys alone since I only want to find and replace the first 2 instances of the pattern. Right now it says "Found Shane's SMKY-1978 SeaMonkeys" and "Found Mick's SMKY-1978 SeaMonkeys" because it is matching the last pattern each time the for loop is executed.

So am I missing a subtle, little-known regex character, or am I just going about this completely and utterly wrong?

Here is the working code:

# my regexp matches from the bottom to the top but I'd like it to replace from the top down
local $/=undef;
my $DataToParse = <DATA>;
my $item = "SeaMonkeys";
my $catNum = "SMKY-1978";
my $maxInstancesToReplace = 2;
parseData();
exit();

sub parseData {
    for (my $counter = 0; $counter < $maxInstancesToReplace; $counter++) {
     # Stick in a temporary text placeholder that I will replace later after more processing
     $DataToParse =~ s/(.+)\sELEMENT\s(.+?)\s\(Item := \"$item\".+?CatalogNumber := \"$catNum.+?END_ELEMENT(.+)/$1 ***** Found $2\'s $catNum $item. (counter: $counter) *****$3/s;
    } 
    print("Here's the result:\n$DataToParse\n");
}

__DATA__
    ELEMENT Kurt (Item := "BrightLite",
                  ItemID := 29,
                  CatalogNumber := "BTLT-9274",
                  Vendor := 100,
    END_ELEMENT

    ELEMENT Mick (Item := "PetRock",
                  ItemID := 36,
                  CatalogNumber := "PTRK-3475/A",
                  Vendor := 82,
    END_ELEMENT

    ELEMENT Kurt (Item := "SeaMonkeys",
                  ItemID := 12,
                  CatalogNumber := "SMKY-1978/E",
                  Vendor := 77,
    END_ELEMENT

    ELEMENT Joe (Item := "Pong",
                 ItemID := 24,
                 CatalogNumber := "PONG-1482",
                 Vendor := 5,
    END_ELEMENT

    ELEMENT Shane (Item := "SeaMonkeys",
                   ItemID := 1032,
                   CatalogNumber := "SMKY-1978/E",
                   Vendor := 77,
    END_ELEMENT

    ELEMENT Kurt (Item := "Battleship",
                  ItemID := 99,
                  CatalogNumber := "BTLS-5234",
                  Vendor := 529,
    END_ELEMENT

    ELEMENT Mick (Item := "SeaMonkeys",
                  ItemID := 8,
                  CatalogNumber := "SMKY-1978/F",
                  Vendor := 77,
    END_ELEMENT

    ELEMENT Frank (Item := "PetRock",
                   ItemID := 42,
                   CatalogNumber := "PTRK-3475/B",
                   Vendor := 82,
    END_ELEMENT

    ELEMENT Joe (Item := "SeaMonkeys",
                 ItemID := 8,
                 CatalogNumber := "SMKY-1979/A",
                 Vendor := 77,
    END_ELEMENT

And here is what it currently outputs:

Here's the result:
        ELEMENT Kurt (Item := "BrightLite",
                      ItemID := 29,
                      CatalogNumber := "BTLT-9274",
                      Vendor := 100,
        END_ELEMENT

        ELEMENT Mick (Item := "PetRock",
                      ItemID := 36,
                      CatalogNumber := "PTRK-3475/A",
                      Vendor := 82,
        END_ELEMENT

        ELEMENT Kurt (Item := "SeaMonkeys",
                      ItemID := 12,
                      CatalogNumber := "SMKY-1978/E",
                      Vendor := 77,
        END_ELEMENT

        ELEMENT Joe (Item := "Pong",
                     ItemID := 24,
                     CatalogNumber := "PONG-1482",
                     Vendor := 5,
        END_ELEMENT

 ***** Found Shane's SMKY-1978 SeaMonkeys. (counter: 1) *****

        ELEMENT Kurt (Item := "Battleship",
                      ItemID := 99,
                      CatalogNumber := "BTLS-5234",
                      Vendor := 529,
        END_ELEMENT

 ***** Found Mick's SMKY-1978 SeaMonkeys. (counter: 0) *****

        ELEMENT Frank (Item := "PetRock",
                       ItemID := 42,
                       CatalogNumber := "PTRK-3475/B",
                       Vendor := 82,
        END_ELEMENT

        ELEMENT Joe (Item := "SeaMonkeys",
                     ItemID := 8,
                     CatalogNumber := "SMKY-1979/A",
                     Vendor := 77,
        END_ELEMENT
A: 

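The .+ at the start of your regex is "greedy": it matches as many characters as it can and backtracks only as far as it has to, so the rest of the pattern ends up matching the last occurrence in the data rather than the first.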
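
To make the greedy behaviour concrete, here is a minimal sketch on a made-up string (the `x=1 x=2 x=3` input and the variable names are assumptions for illustration, not part of the question's data). It compares a pattern with a leading `.+` against the same pattern without it.

```perl
use strict;
use warnings;

# Toy input (made up for illustration): three "x=<digit>" tokens.
my $text = "x=1 x=2 x=3";

# Greedy prefix: .+ grabs as much as it can, then backtracks just far
# enough for the rest to match, so the LAST token is the one captured.
my ($last) = $text =~ /(?:.+)x=(\d)/s;

# No prefix: the engine tries start positions from left to right,
# so the FIRST token is the one captured.
my ($first) = $text =~ /x=(\d)/;

print "greedy prefix matched x=$last, bare pattern matched x=$first\n";
```

Dropping the leading capture, as suggested here, is what lets a plain `s///` find the topmost match first.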

Your regex would be more readable and faster if written as

my $re=qr/\sELEMENT\s(.+?)\s\(Item := "$item".+?CatalogNumber := "$catNum.+?END_ELEMENT/;

I think that you can simply repeat this match:

sub parseData {
    my $re=qr/\sELEMENT\s(.+?)\s\(Item := "$item".+?CatalogNumber := "$catNum.+?END_ELEMENT(.+)/;
    foreach my $counter (0..$maxInstancesToReplace) {
      # Stick in a temporary text placeholder that I will replace later after more processing
      $DataToParse =~ s/$re/ ***** Found $1\'s $catNum $item. (counter: $counter) *****$2/s;
    } 
    print("Here's the result:\n$DataToParse\n");
}

If repeating is not possible, you should use the /e regex modifier.
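
One way the /e suggestion might look in practice is sketched below. This is not the answerer's code: the `$seen` and `$max` names, the shortened data, the `\S+` name capture, and the `(?!END_ELEMENT)` guard are all assumptions added for illustration. The idea is a single top-down pass with /g, where the replacement is evaluated as code and stops substituting once the limit is reached.

```perl
use strict;
use warnings;

my $item   = "SeaMonkeys";
my $catNum = "SMKY-1978";
my $max    = 2;    # replace at most this many matches, top down

# Shortened copy of the question's data, kept in its original order.
my $data = <<'END_DATA';
    ELEMENT Kurt (Item := "SeaMonkeys",
                  ItemID := 12,
                  CatalogNumber := "SMKY-1978/E",
                  Vendor := 77,
    END_ELEMENT

    ELEMENT Joe (Item := "Pong",
                 ItemID := 24,
                 CatalogNumber := "PONG-1482",
                 Vendor := 5,
    END_ELEMENT

    ELEMENT Shane (Item := "SeaMonkeys",
                   ItemID := 1032,
                   CatalogNumber := "SMKY-1978/E",
                   Vendor := 77,
    END_ELEMENT

    ELEMENT Mick (Item := "SeaMonkeys",
                  ItemID := 8,
                  CatalogNumber := "SMKY-1978/F",
                  Vendor := 77,
    END_ELEMENT
END_DATA

# (?:(?!END_ELEMENT).)*? is a guard so one match cannot run past the end
# of its own element; \Q...\E quotes regex metacharacters in the values.
# /s is set on the qr// itself so that "." can cross newlines.
my $re = qr/\bELEMENT\s+(\S+)\s+\(Item := "\Q$item\E"(?:(?!END_ELEMENT).)*?CatalogNumber := "\Q$catNum\E(?:(?!END_ELEMENT).)*?END_ELEMENT/s;

my $seen = 0;
# /g visits matches from the top down; /e evaluates the replacement as code,
# so matches beyond $max are put back unchanged via $&.
$data =~ s{$re}{
    $seen++ < $max
        ? sprintf("***** Found %s's %s %s. (counter: %d) *****", $1, $catNum, $item, $seen)
        : $&
}ge;

print $data;
```

With this sample, Kurt's and Shane's entries are replaced and Mick's is left alone. The counter here is 1-based, unlike the loop counter in the question.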

Alexandr Ciornii
I tried the code in this answer and it did not appear to match anything in the dataset. I tried escaping your double quotes and a couple other things and no luck. Please test your answers before posting.
Kurt W. Leucht
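
A likely explanation for the failed match reported in the comment above is that the qr// object in the answer is compiled without /s, and the /s on the enclosing s/// does not apply inside an interpolated qr// object. The sketch below shows that behaviour on a made-up two-line string (the string and variable names are assumptions for illustration).

```perl
use strict;
use warnings;

# Made-up two-line input, loosely in the shape of the question's data.
my $text = "Item := 1,\nCatalogNumber := 2";

my $plain  = qr/Item.+?CatalogNumber/;     # no /s: "." stops at "\n"
my $dotall = qr/Item.+?CatalogNumber/s;    # /s: "." also matches "\n"

# The /s on the enclosing match does not reach inside the qr// object.
my $plain_hit  = ($text =~ /$plain/s) ? 1 : 0;
my $dotall_hit = ($text =~ /$dotall/) ? 1 : 0;

print "without /s on qr: $plain_hit, with /s on qr: $dotall_hit\n";
```

Separately, `0..$maxInstancesToReplace` runs the loop three times when the limit is 2, so `0..$maxInstancesToReplace-1` is probably what was intended.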
A: 

See the answers you got in your identical Perlmonks.org post. Asking questions in more than one place is ineffective and rude.

Corion
Asking a question in more than one place means there is a wider audience, why would that be ineffective?
postfuturist
I'm sorry I offended you by posting my question on two different sites, Corion. I disagree, though, that it is ineffective and I also disagree that it is rude. But that's just my opinion. Voting on this particular comment will tell what the SO community thinks.
Kurt W. Leucht
So does that mean people can't post questions that have already been asked ANYWHERE on the internet?! Ridiculous... Down vote.
Vyrotek
I think it's rude to those people who answer the question on the other site.
Corion
I agree with this answer. +1
ephemient
A: 

The best solution appears to be to grab each ELEMENT ... END_ELEMENT section from the data and run the regex on only one section at a time, rather than feeding the entire data set to a regular expression at once. Not exactly what I was trying to accomplish, but I rewrote my program to do this piecemeal processing and it works like a charm.
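
The rewritten program is not shown, so the following is only a guess at what section-by-section processing could look like. The `split` approach, the `$found` counter, the `@out` buffer, and the shortened data are assumptions for illustration.

```perl
use strict;
use warnings;

my ($item, $catNum, $max) = ("SeaMonkeys", "SMKY-1978", 2);

# Shortened sample in the question's format, kept in its original order.
my $data = <<'END_DATA';
    ELEMENT Kurt (Item := "SeaMonkeys",
                  CatalogNumber := "SMKY-1978/E",
    END_ELEMENT

    ELEMENT Joe (Item := "Pong",
                 CatalogNumber := "PONG-1482",
    END_ELEMENT

    ELEMENT Shane (Item := "SeaMonkeys",
                   CatalogNumber := "SMKY-1978/E",
    END_ELEMENT

    ELEMENT Mick (Item := "SeaMonkeys",
                  CatalogNumber := "SMKY-1978/F",
    END_ELEMENT
END_DATA

my $found = 0;
my @out;

# split with a capturing group returns the ELEMENT ... END_ELEMENT sections
# as well as the text between them, in file order.
for my $chunk (split /(\bELEMENT\b.*?\bEND_ELEMENT\b)/s, $data) {
    # Only real sections start with "ELEMENT <name>"; separators do not.
    my ($name) = $chunk =~ /\AELEMENT\s+(\S+)/;

    if (   defined $name
        && $found < $max
        && index($chunk, qq{Item := "$item"}) >= 0
        && index($chunk, qq{CatalogNumber := "$catNum}) >= 0)
    {
        $found++;
        push @out, "***** Found ${name}'s $catNum $item. (counter: $found) *****";
    }
    else {
        push @out, $chunk;    # keep everything else exactly as it was
    }
}

my $result = join '', @out;
print $result;
```

Because each section is checked on its own, no pattern can run from one element into the next, and the top-down order falls out of the loop itself.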

Kurt W. Leucht