I need some help with regular expressions. I am capturing specific rid values that sit between this start marker

","children":[

and this end marker

}]}]}

as shown in the example below.

My problem is that the block shown below repeats several times, and I want all the rids between ","children":[ and }]}]}, per block only.

I know I can capture an individual rid value with: rid":"([\w\d\-\."]+)
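For reference, here is a minimal Perl check of that pattern against one element copied from the example data. Because the character class `[\w\d\-\."]` also contains `"`, the capture runs on through the value's closing quote; a class such as `[^"]+` followed by a literal `"` stops at the quote instead. The variable names are illustrative only.

```perl
use strict;
use warnings;

# One element copied from the example data in the question
my $line = '{"type":"stub","context":"","rid":"b1c497ce.ee6a64fe.290c6e93.91c15f91-a1c-4c36.9939.4ab7b94a39ad","metaclass":"ASAPModel.BarrierCategory"},';

# Original pattern: the class includes the double quote, so it is captured too
my ($with_quote) = $line =~ /rid":"([\w\d\-\."]+)/;

# Alternative: capture everything up to the next double quote
my ($clean) = $line =~ /rid":"([^"]+)"/;

print "$with_quote\n";    # value followed by a trailing "
print "$clean\n";         # value only
```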

But I don't know how to capture every rid":"([\w\d\-\."]+) match that falls between ","children":[ and }]}]}.

Example:

    ","children":[{"type":"stub","context":"","rid":"b1c4922237ce.ee6a3644443fe.10711226e93.d0af7aadbd0-4be3-4353ddd.8b47.f2f4aaf2474f","metaclass":"ASAPModel.BarrierCategory"},
    {"type":"stub","context":"","rid":"b1c497ce.ee6a64fe.290c6e93.91c15f91-a1c-4c36.9939.4ab7b94a39ad","metaclass":"ASAPModel.BarrierCategory"},
    {"type":"stub","context":"","rid":"b1c497ce.ee6a64fe.27c3ee93.22e90c22-7406-463a.8bff.f6ea88f6ffcc","metaclass":"ASAPModel.BarrierCategory"},
    {"type":"stub","context":"","rid":"b1c497ce.ee6a64fe.6a182e93.5c0e7d5c-ff65-451d.afc0.cfc7fbcfc02d","metaclass":"ASAPModel.BarrierCategory"},
    {"type":"stub","context":"","rid":"b1c497ce.ee6a64fe.6970ae93.8ea3978e-112b-4bbb.8405.d17071d105d2","metaclass":"ASAPModel.BarrierCategory"}]}]},

    ","children":[{"type":"stub","context":"","rid":"b1c4922237ce.ee6a3644443fe.10711226e93.d0af7aadbd0-4be3-4353ddd.8b47.f2f4aaf2474f","metaclass":"ASAPModel.BarrierCategory"},
    {"type":"stub","context":"","rid":"b1c497ce.ee6a64fe.290c6e93.91c15f91-a1c-4c36.9939.4ab7b94a39ad","metaclass":"ASAPModel.BarrierCategory"},
    {"type":"stub","context":"","rid":"b1c497ce.ee6a64fe.27c3ee93.22e90c22-7406-463a.8bff.f6ea88f6ffcc","metaclass":"ASAPModel.BarrierCategory"},
    {"type":"stub","context":"","rid":"b1c497ce.ee6a64fe.6a182e93.5c0e7d5c-ff65-451d.afc0.cfc7fbcfc02d","metaclass":"ASAPModel.BarrierCategory"},
    {"type":"stub","context":"","rid":"b1c497ce.ee6a64fe.6970ae93.8ea3978e-112b-4bbb.8405.d17071d105d2","metaclass":"ASAPModel.BarrierCategory"}]}]},

My problem is that I don't understand how to specify where the non-capturing group should begin and end, or how to say "match one or more of these capture groups", sort of like []+.
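In Perl (and in most Perl-style regex engines), putting a quantifier such as `+` on a capturing group does not collect every repetition: the group keeps only what its last iteration matched. A minimal sketch with a hypothetical, shortened block and made-up rid values:

```perl
use strict;
use warnings;

# Hypothetical shortened block; real rids are much longer
my $block = '","children":[{"rid":"aa"},{"rid":"bb"},{"rid":"cc"}]}]}';

# The capturing group is repeated with +, but it holds only the last iteration
my @caught = $block =~ /"children":\[(?:\{"rid":"([^"]+)"\},?)+\]/;

print scalar(@caught), " capture: $caught[0]\n";    # 1 capture: cc
```

That is why the answers below either loop over matches with the `/g` modifier or parse the data as JSON.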

A: 

Something like \",\"children\":(.*)(?=\\]\\}\\]\\})

Play around with it.

The forum is absorbing some of my backslashes; as a word of warning to anyone else, double them up.
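Written as a Perl regex literal (single backslashes, no escaped quotes), that pattern uses a greedy `(.*)`, which on a response containing several blocks runs to the *last* `]}]}`. A lazy `(.*?)` combined with the `/g` modifier returns one capture per block instead. This is a minimal sketch; the one-line `$resp` string and its short rid values are hypothetical, and `/s` is added so that `.` also matches newlines if the response happens to be pretty-printed.

```perl
use strict;
use warnings;

# Hypothetical one-line response holding two blocks (rids shortened)
my $resp = '{"name":"a","children":[{"rid":"aa"},{"rid":"bb"}]}]},{"name":"b","children":[{"rid":"cc"}]}]}';

# Greedy: one capture spanning from the first block to the last ]}]}
my ($greedy) = $resp =~ /","children":(.*)(?=\]\}\]\})/s;

# Lazy + /g: one capture per block
my @lazy = $resp =~ /","children":(.*?)(?=\]\}\]\})/gs;

print "greedy: $greedy\n";
print "lazy:   $_\n" for @lazy;
```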

In response to the edits:

Try breaking the data up into its bracketed groups first, then run one search per group in a for loop. You can collect all of the groups at once with a capturing group and a global match.

Nona Urbiz
Backslashes will come through fine if you mark the string as code using \`...\`
Sinan Ünür
+1  A: 

You need to break this up into two steps:

  1. Get the span of data inside the "children" brackets
  2. Get the rids

    # Make sure you get the first one
    my ( $child ) = $record =~ m/"children":\[([^\]]+)\]/g;
    # Get all in span - the g operator tells the regex to get all ( 'global' )
    my @rids     = $child =~ m/"rid":"([^"]+)"/g; # <-- g operator
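The snippet above takes only the first "children" block. If every block is wanted, one way (a sketch, not Axeman's code) is to run the first match in a `while` loop with `/g`, so each iteration picks up the next block, and to keep the rids grouped per block. `$record` here is a hypothetical, shortened two-block string.

```perl
use strict;
use warnings;

# Hypothetical two-block record; real rids are longer
my $record = '","children":[{"rid":"aa"},{"rid":"bb"}]}]},'
           . '","children":[{"rid":"cc"}]}]}';

my @blocks;    # one array ref of rids per "children" block

# In scalar context, m//g resumes where the previous match on $record ended
while ( $record =~ m/"children":\[([^\]]+)\]/g ) {
    my $child = $1;
    # In list context, m//g returns every "rid" capture in this block
    my @rids = $child =~ m/"rid":"([^"]+)"/g;
    push @blocks, [@rids];
}

for my $i ( 0 .. $#blocks ) {
    print "block $i: @{ $blocks[$i] }\n";
}
```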
    

But it looks like JSON to me, so you could parse data like this with JSON::Syck.

Axeman
It's a response from a Dojo tree, and I'm using JMeter, which uses Perl regular expressions. I need the rids to correlate further requests, i.e. a dynamic number of HTTP requests with dynamic rid values...
MaxiePaxie
@unknown: So you want to know how you can do this *entirely* with regex? There is definitely not an easy method for that.
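For what it is worth, a single global match can at least restrict the rids to those inside a block: a lookahead keeps a rid only if the end marker `]}]}` follows it without crossing another `[` or `]`. It cannot group the rids by block, though, which is where the two-step approach earns its keep. A minimal Perl sketch; the data is hypothetical, with "top" rids placed outside any children array to show that they are skipped.

```perl
use strict;
use warnings;

# Hypothetical data: the "top" rids sit outside any children array
my $resp = '{"rid":"top","children":[{"rid":"aa"},{"rid":"bb"}]}]},'
         . '{"rid":"top2","children":[{"rid":"cc"}]}]}';

# Keep a rid only if, without crossing another '[' or ']',
# it is followed by the end marker ]}]}
my @rids = $resp =~ /"rid":"([^"]+)"(?=[^\[\]]*\]\}\]\})/g;

print "@rids\n";    # aa bb cc
```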
Axeman
@Axeman I was always told that anything's possible in regex... world peace, space travel, and even complex parsing...
MaxiePaxie
@MaxiePaxie, you were told wrong. @Axeman, I am afraid he is trying to do this in JavaScript, not Perl.
Sinan Ünür
+5  A: 

This looks like JSON (though your example data is too incomplete to be valid).

If so, then the JSON module from CPAN might be the best way forward:

    use strict;
    use warnings;
    use feature 'say';    # 'say' is not available without this (or 'use v5.10')
    use JSON qw( from_json );

    # my example data
    my $data = q( [
        {"children":[ {"type":"stub","rid":"aa"}, {"type":"stub2","rid":"bb"} ] },
        {"children":[ {"type":"stub","rid":"cc"}, {"type":"stub2","rid":"dd"} ] } ]
    );

    my $json = from_json( $data );

    # $json is an array ref of records; each record's "children" is an array ref
    for my $rec ( @$json ) {
        for my $child ( @{ $rec->{children} } ) {
            say "rid: ", $child->{rid};
        }
    }

This prints:

    rid: aa
    rid: bb
    rid: cc
    rid: dd

/I3az/

draegtun
Agreed. There are places where a regexp will just be frustrating.
Trueblood