I apologize for posting such a specific issue, but I hope it will help others who run into the same problem. I have a string that is formatted like this:

[[,action1,,],[action2],[]]

I would like to translate this into valid YAML so that it can be parsed. The result should look like this:

[['','action1','',''],['action2'],['']]

I've tried a bunch of regular expressions to accomplish this, but I'm at a complete loss. I'm OK with running multiple expressions if needed. For example (Ruby):

puts s.gsub!(/,/,"','")  # => [[','action1','',']','[action2]','[]]
puts s.gsub!(/\[',/, "['',") # => [['','action1','',']','[action2]','[]]

That's getting there, but I have a feeling I'm starting to go down a rat-hole with this approach. Is there a better way to accomplish this?

Thanks for the help!

+4  A: 

This does the job for the empty fields (Ruby 1.9):

s.gsub(/(?<=[\[,])(?=[,\]])/, "''")

Or for Ruby 1.8, which doesn't support zero-width look-behind:

s.gsub(/([\[,])(?=[,\]])/, "\\1''")
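
As a quick check, here is a small sketch that runs both forms on the string from the question and compares the results. The variable names are only for illustration. The capturing form consumes the preceding '[' or ',' and puts it back with \\1, so it should produce the same string as the look-behind form.

```ruby
s = "[[,action1,,],[action2],[]]"

# Look-behind form: matches the empty position itself (needs Ruby 1.9+).
with_lookbehind = s.gsub(/(?<=[\[,])(?=[,\]])/, "''")

# Capturing form: matches the delimiter before the empty position
# and re-inserts it via \\1 (no look-behind required).
with_capture = s.gsub(/([\[,])(?=[,\]])/, "\\1''")

puts with_lookbehind                  # => [['',action1,'',''],[action2],['']]
puts with_lookbehind == with_capture  # => true
```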

Quoting non-empty fields can be done with one of these:

s.gsub(/(?<=[\[,])\b|\b(?=[,\]])/, "'")
s.gsub(/(\w+)/, "'\\1'")

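The expressions above use zero-width positive look-behind and look-ahead assertions (the '(?<=' and '(?=' constructs). They match a position between characters without consuming any text, which is why the replacement is inserted rather than substituted for something.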
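
Putting the two steps together might look like the sketch below. The helper name quote_fields is made up for this example, and it assumes Ruby 1.9 or later (for the look-behind) and fields made only of word characters (because of \w+). The result is fed to YAML.load to confirm that it parses.

```ruby
require 'yaml'

# Hypothetical helper: quote every field of a "[a,,[b]]"-style string.
def quote_fields(s)
  filled = s.gsub(/(?<=[\[,])(?=[,\]])/, "''")  # step 1: fill empty fields
  filled.gsub(/(\w+)/, "'\\1'")                  # step 2: quote non-empty fields
end

quoted = quote_fields("[[,action1,,],[action2],[]]")
puts quoted          # => [['','action1','',''],['action2'],['']]

parsed = YAML.load(quoted)
p parsed             # => [["", "action1", "", ""], ["action2"], [""]]
```

A field containing characters outside \w (a hyphen, say) would be split by the second step, so that pattern would need adjusting for such data.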

I've looked for some Ruby-specific documentation but could not find anything that explains these features in particular. Instead, let me refer you to perlre.

Inshallah
I'm unable to get this to work. I'm not sure if ruby supports the look behind or look ahead syntax you used. I get the following error: undefined (?...) sequence: /(?<=[\[,])(?=[,\]])/
Mike Farmer
Ah, I used Ruby 1.9. Earlier versions may not support these features.
Inshallah
No, lookbehinds aren't supported out-of-the-box before 1.9.
Alan Moore
Very nice. Thanks for elaborating. I am using 1.8 so the additional code for both versions is greatly appreciated. You are a wiz!
Mike Farmer
+1  A: 

Try this:

s.gsub(/([\[,])(?=[,\]])/, "\\1''").
  gsub(/([\[,])(?=[^'\[])|([^\]'])(?=[,\]])/, "\\+'")

EDIT: I'm not sure about the replacement syntax. That's supposed to be group #1 in the first gsub, and the highest-numbered participating group -- $+ -- in the second.
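
If the replacement-string syntax is in doubt, one way to sidestep it is to pass a block to gsub and pick whichever group took part in the match. The block form below is a substitution of mine for illustration, not the syntax used above, and the variable names are hypothetical.

```ruby
s = "[[,action1,,],[action2],[]]"

# Step 1: put '' into every empty field.
filled = s.gsub(/([\[,])(?=[,\]])/, "\\1''")

# Step 2: add the missing quote on either side of each non-empty field.
# $1 is set when the first alternative matched, $2 when the second did,
# so "$1 || $2" picks the group that participated in this match.
quoted = filled.gsub(/([\[,])(?=[^'\[])|([^\]'])(?=[,\]])/) { "#{$1 || $2}'" }

puts quoted   # => [['','action1','',''],['action2'],['']]
```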

Alan Moore
You have to use "\\1''", or '\1\'\'' to refer to capturing groups.
Inshallah
Thanks, I was just looking that up.
Alan Moore
+2  A: 

It would be easier to just parse it, then output valid YAML.

Since I don't know Ruby, here is an example in Perl.

Since you only want a subset of YAML that appears to be similar to JSON, I used the JSON module.

I've been wanting an excuse to use Regexp::Grammars, so I used it to parse the data.

I guarantee it will work, no matter how deeply the arrays are nested.

#! /usr/bin/env perl
use strict;
#use warnings;
use 5.010;
#use YAML;
use JSON;
use Regexp::Grammars;


my $str = '[[,action1,,],[action2],[],[,],[,[],]]';

my $parser = qr{
  <match=Array>

  <token: Text>
    [^,\[\]]*

  <token: Element>
  (?:
    <.Text>
  |
    <MATCH=Array>
  )

  <token: Array>
  \[
     (?:
       (?{ $MATCH = [qw'']; })
     |
       <[MATCH=Element]>   ** (,)
     )
  \]
}x;


if( $str =~ $parser ){
  say to_json $/{match};
}else{
  die $@ if $@;
}

Which outputs:

[["","action1","",""],["action2"],[],["",""],["",[],""]]

If you really want YAML, just uncomment "use YAML;" and replace to_json() with Dump():

---
-
  - ''
  - action1
  - ''
  - ''
-
  - action2
- []
-
  - ''
  - ''
-
  - ''
  - []
  - ''
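
For readers working in Ruby, the same parse-then-dump idea could be sketched roughly as follows. This is not a port of the Regexp::Grammars code. It is a small hand-written recursive-descent parser built on StringScanner from the standard library, and the method names parse_array and parse_nested are invented for the example.

```ruby
require 'strscan'
require 'json'
require 'yaml'

# Hypothetical helper: parse one "[...]" group starting at the scanner position.
def parse_array(scanner)
  scanner.scan(/\[/) or raise ArgumentError, "expected '[' at #{scanner.pos}"
  items = []
  unless scanner.check(/\]/)   # "[]" is an empty array, not ['']
    loop do
      # An element is either a nested array or a (possibly empty) run of text.
      items << (scanner.check(/\[/) ? parse_array(scanner) : scanner.scan(/[^,\[\]]*/))
      break unless scanner.scan(/,/)
    end
  end
  scanner.scan(/\]/) or raise ArgumentError, "expected ']' at #{scanner.pos}"
  items
end

# Hypothetical entry point: parse the whole string or raise on leftovers.
def parse_nested(str)
  scanner = StringScanner.new(str)
  result = parse_array(scanner)
  raise ArgumentError, "trailing input at #{scanner.pos}" unless scanner.eos?
  result
end

data = parse_nested('[[,action1,,],[action2],[],[,],[,[],]]')
puts data.to_json   # => [["","action1","",""],["action2"],[],["",""],["",[],""]]
puts data.to_yaml   # YAML rendering of the same structure
```

Like the Perl version, this treats [] as an empty array, whereas the regex-based answers turn it into [''].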
Brad Gilbert
+1 That's a very nice demonstration. The regexp way works for deep lists too though.
Inshallah
I always find it harder to figure out whether a regex replace will work than to just parse it.
Brad Gilbert