Context


Using Ruby, I am parsing strings that look like this:

A type with an ID...

[Image=4b5da003ee133e8368000002]
[Video=679hfpam9v56dh800khfdd32]

...with between 0 and n additional options separated with @...

[Image=4b5da003ee133e8368000002@size:small]
[Image=4b5da003ee133e8368000002@size:small@media:true]

In this example:

[Image=4b5da003ee133e8368000002@size:small@media:true]

I want to retrieve:

  1. [Image=4b5da003ee133e8368000002@size:small@media:true]
  2. Image
  3. 4b5da003ee133e8368000002
  4. size:small
  5. media:true

Problem


Right now using this regex:

(\[([a-zA-Z]+)=([a-zA-Z0-9]+)(@[a-zA-Z]+:[a-zA-Z]+)*\])

I get...

  1. [Image=4b5da003ee133e8368000002@size:small@media:true]
  2. Image
  3. 4b5da003ee133e8368000002
  4. @media:true

What am I doing wrong? How can I get what I want?

PS: All the results are copied from http://rubular.com/ which is handy for debugging regexes. Please use it if it can help you help me :)


Edit: if it's impossible to get all the options separately, how could I get this:

  1. [Image=4b5da003ee133e8368000002@size:small@media:true]
  2. Image
  3. 4b5da003ee133e8368000002
  4. @size:small@media:true
+3  A: 

Edit:

Ruby's Regex implementation seems not to support multiple captures on one group, as most other regex engines do. Therefore, you'll have to do it in two steps: first capture all the @key:value options in one string, then split that string.

To get all of them, this should work:

(\[([a-zA-Z]+)=([a-zA-Z0-9]+)((?:@[a-zA-Z]+:[a-zA-Z]+)*)\])
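A minimal sketch of the two steps, assuming the pattern above. The constant TAG and the helper parse_tag are made-up names for illustration, not part of any library. The first step matches the whole tag and captures the option tail as one string; the second pulls the individual options out of that tail with String#scan.

```ruby
# TAG and parse_tag are hypothetical names used only for this sketch.
TAG = /(\[([a-zA-Z]+)=([a-zA-Z0-9]+)((?:@[a-zA-Z]+:[a-zA-Z]+)*)\])/

def parse_tag(str)
  m = TAG.match(str)
  return nil unless m

  # Step 1: groups 1..4 are the whole tag, the type, the ID and the option tail.
  whole, type, id, tail = m.captures

  # Step 2: tail is "" or something like "@size:small@media:true";
  # scan returns every key:value pair it contains (an empty array for "").
  options = tail.scan(/[a-zA-Z]+:[a-zA-Z]+/)

  [whole, type, id, *options]
end

p parse_tag("[Image=4b5da003ee133e8368000002@size:small@media:true]")
# prints ["[Image=4b5da003ee133e8368000002@size:small@media:true]", "Image", "4b5da003ee133e8368000002", "size:small", "media:true"]
```

A tag without options simply yields the first three elements, and a string that does not match yields nil.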
Lucero
Can you elaborate on how to get both values for the 4th group?
Matthew Flaschen
I edited my question since I need to get all options
marcgg
@matthew: If it's not possible to get what I wanted in the first place then @lucero is right
marcgg
@Lucero: if by "multiple captures on one group" you mean something like .NET's GroupCapture construct with its ability to return all intermediate captures, that's actually very rare. AFAIK, only .NET and Perl 6/Parrot provide that capability.
Alan Moore
@Alan, okay, "most" may be wrong, I had the wrong impression because I used those which support it or didn't need the intermediate captures and therefore didn't notice... thanks for clarifying this.
Lucero
+1  A: 
(\[([a-zA-Z]+)=([a-zA-Z0-9]+)(?:@([a-zA-Z]+:[a-zA-Z]+))*\])

will give you media:true. Note that media:true is overwriting the previous size:small match. I don't think there's a way to get exactly what you want in a single match call.
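To see the overwriting for yourself, here is a small sketch using the pattern above. LAST_ONLY and last_option are hypothetical names chosen for the example.

```ruby
# LAST_ONLY and last_option are hypothetical names used only for this sketch.
LAST_ONLY = /(\[([a-zA-Z]+)=([a-zA-Z0-9]+)(?:@([a-zA-Z]+:[a-zA-Z]+))*\])/

def last_option(str)
  m = LAST_ONLY.match(str)
  # Group 4 sits inside a repeated group, so it only holds what the
  # final iteration matched; it is nil if the group never matched.
  m && m[4]
end

p last_option("[Image=4b5da003ee133e8368000002@size:small@media:true]")
# prints "media:true"
p last_option("[Image=4b5da003ee133e8368000002]")
# prints nil
```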

Matthew Flaschen
Thanks for the answer. I do need both options in a single call. I edited my question to reflect that
marcgg
+2  A: 

To get the "tail" of options, you could fetch it from $4 with

/(\[([a-zA-Z]+)=([a-zA-Z0-9]+)((@[a-zA-Z]+:[a-zA-Z]+)*)\])/

and then split on at-signs.

For example:

#! /usr/bin/ruby

str = "[Image=4b5da003ee133e8368000002@size:small@media:true]"
if /(\[([a-zA-Z]+)=([a-zA-Z0-9]+)((@[a-zA-Z]+:[a-zA-Z]+)*)\])/.match(str)
  print $1, "\n",
        $2, "\n",
        $3, "\n",
        $4, "\n";

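  # Split first, then drop the empty leading piece; $4[1..-1] would be nil
  # (and .split would raise) for a tag that has no options at all.
  $4.split(/@/).reject { |s| s.empty? }.each do |s|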
    print s, "\n";
  end
end

Output:

[Image=4b5da003ee133e8368000002@size:small@media:true]
Image
4b5da003ee133e8368000002
@size:small@media:true
size:small
media:true
Greg Bacon
Thanks for the answer, but this is not really what I want. This gets me 3:@size:small@media:true and 4:@media:true
marcgg
@marcgg See the program output in my answer.
Greg Bacon
ok makes sense. +1 to you ^^
marcgg
+1  A: 

It looks like the regex only keeps the last match for a repeated group. I think getting the full list of matches will require a different approach, such as splitting the string:

"a=b@c:d@e:f".split(/=|@/)

which creates a list:

["a", "b", "c:d", "e:f"]

which is close to what you want...
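Applied to one of the strings from the question, it might look like the sketch below. The helper name split_tag is made up for the example, and it assumes the input is already known to be a well-formed tag: unlike the regex approaches, nothing here validates the format.

```ruby
# split_tag is a hypothetical helper name used only for this sketch.
def split_tag(str)
  # str[1..-2] drops the first and last character (the surrounding brackets);
  # the remainder is then split on "=" or "@".
  type, id, *options = str[1..-2].split(/=|@/)
  [type, id, options]
end

p split_tag("[Image=4b5da003ee133e8368000002@size:small@media:true]")
# prints ["Image", "4b5da003ee133e8368000002", ["size:small", "media:true"]]
```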

Chris Hulan
This is what I was going to suggest, too. So much easier this way.
glenn mcdonald
+1  A: 

Although it can be tricky to do it purely within a regexp, it's not too hard to split it out as a two-step operation:

while (line = DATA.gets)
  line.chomp!

  if (m = line.match(/\[([a-zA-Z]+)=([a-zA-Z0-9]+)((?:@[a-zA-Z]+:[a-zA-Z]+)*)\]/))
    (type, hash, options) = m.to_a[1, 3]
    options = options.split(/@/).reject { |s| s.empty? }
    puts [ type, hash, options.join(',') ].join(' / ')
  end
end

__END__
[Image=4b5da003ee133e8368000002]
[Video=679hfpam9v56dh800khfdd32]
[Image=4b5da003ee133e8368000002@size:small]
[Image=4b5da003ee133e8368000002@size:small@media:true]
[Image=4b5da003ee133e8368000002@size:small@media:true@foo:bar]

This produces the output:

Image / 4b5da003ee133e8368000002 / 
Video / 679hfpam9v56dh800khfdd32 / 
Image / 4b5da003ee133e8368000002 / size:small
Image / 4b5da003ee133e8368000002 / size:small,media:true
Image / 4b5da003ee133e8368000002 / size:small,media:true,foo:bar
tadman