I am looking for a regex to split the following strings:

aaa[bbb,ccc[ddd,{eee:1,mmm:999}],nnn[0,3]]
aaa[bbb,ccc[ddd,{eee:1, mmm:[123,555]}],nnn[0,3]]
aaa[bbb, ccc[ddd, ddd],nnn[0,3]]
aaa[bbb,ddd[0,3]]

The split should happen at '[', ']' or ',', unless the ',' is inside '{}'. For example, splitting 'aaa[bbb,ccc[ddd,' into aaa, bbb, ccc, ddd is allowed, but {eee:1,mmm:999} must stay in one piece.

The expected result:

aaa, bbb, ccc, ddd, {eee:1,mmm:999}, nnn, 0, 3
aaa, bbb, ccc, ddd, {eee:1, mmm:[123,555]}, nnn, 0, 3
aaa, bbb, ccc, ddd, ddd, nnn, 0, 3
aaa, bbb, ddd, 0, 3

I have read many other questions, but I can't modify the regexes posted there to do what I want.

The target language for the expression is JavaScript.

+2  A: 

It is not possible to do this using regular expressions and handle unlimited nested braces; you need a stack-based parser.

SLaks
Why was this downvoted?
SLaks
Because it isn't true when "regex" refers to PCRE etc. See my answer.
Qtax
It supports `/[^][,{}]+|\{[^}]*}/` tho, nothing fancy here at all.
Qtax
@Qtax: Does that handle nested braces?
SLaks
You can nest anything you want except {}, which seems to be the case.
Qtax
@Qtax: I meant nested braces; I edited.
SLaks
+1  A: 

A non-regex way would be to just write a loop that checks the string character by character. When it encounters a {, increment a counter. When it encounters a }, decrement the counter. When it encounters a , and the counter is at zero, add the position of the , to a list. When you're done, you have the list of positions where you want to split the string.

I'm assuming that there aren't any closing braces } which occur before opening braces {; otherwise you might want to ignore the misplaced closing braces rather than decrementing your counter into the negatives.
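
As a rough illustration of the loop described above, here is a minimal sketch. It makes two assumptions that go beyond the answer: it treats '[' and ']' as separators as well as ',' (since the question splits on all three), and it collects the tokens directly instead of a list of positions. The name splitOutsideBraces is made up for this example.

```javascript
// Hypothetical helper, not part of the original answer: walks the string
// once and keeps a counter of how many "{" are currently open.
function splitOutsideBraces(input) {
    var tokens = [];
    var depth = 0;    // number of currently open "{"
    var current = ""; // token being built

    for (var i = 0; i < input.length; i++) {
        var ch = input.charAt(i);
        if (ch === "{") {
            depth++;
            current += ch;
        } else if (ch === "}") {
            // A stray "}" stays in the token but must not push the
            // counter below zero.
            if (depth > 0) depth--;
            current += ch;
        } else if (depth === 0 && (ch === "[" || ch === "]" || ch === ",")) {
            // Separator outside braces: close the current token, if any.
            if (current.trim() !== "") tokens.push(current.trim());
            current = "";
        } else {
            current += ch;
        }
    }
    if (current.trim() !== "") tokens.push(current.trim());
    return tokens;
}

console.log(splitOutsideBraces(
    "aaa[bbb,ccc[ddd,{eee:1, mmm:[123,555]}],nnn[0,3]]").join(", "));
// prints: aaa, bbb, ccc, ddd, {eee:1, mmm:[123,555]}, nnn, 0, 3
```

Because the counter is a number rather than a flag, nested braces such as {b:{c:1,d:2},e:3} are also kept in one piece.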

Tim Goodman
+1 I probably wouldn't go character-by-character (I'd scan for breaks with a regexp along the lines of `[{,[\]}]`), but yeah, I think I'd probably do something like this rather than trying to force a pure regexp solution on a problem that probably is best suited to other logic.
T.J. Crowder
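
The idea from the comment above, scanning for breaks with a regexp instead of visiting every character, could look roughly like this. It is only a sketch: scanSplit is a made-up name, the break pattern is the one quoted in the comment, and the depth counter is borrowed from the answer it comments on.

```javascript
// Hypothetical helper: jumps from one break character to the next with
// exec() and only cuts a token when no "{" is open.
function scanSplit(input) {
    var breakRe = /[{,[\]}]/g; // break characters quoted in the comment
    var tokens = [];
    var depth = 0;  // number of currently open "{"
    var start = 0;  // index where the token being collected begins
    var m;

    while ((m = breakRe.exec(input)) !== null) {
        var ch = m[0];
        if (ch === "{") {
            depth++;
        } else if (ch === "}") {
            if (depth > 0) depth--;
        } else if (depth === 0) {
            // "[", "]" or "," outside braces ends the current token.
            var piece = input.slice(start, m.index).trim();
            if (piece !== "") tokens.push(piece);
            start = m.index + 1;
        }
    }
    var last = input.slice(start).trim();
    if (last !== "") tokens.push(last);
    return tokens;
}

console.log(scanSplit("aaa[bbb,{x:1,y:[2,3]}],c").join(" | "));
// prints: aaa | bbb | {x:1,y:[2,3]} | c
```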
+1  A: 

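Perl/PCRE regex; it should work in JS too (as long as {} aren't nested), provided the ] inside the character class is escaped as \], because JS reads [^] as a complete class: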

$_ = 'aaa[bbb,ccc[ddd,{eee:1,mmm:999}],nnn[0,3]]
aaa[bbb,ccc[ddd,{eee:1, mmm:[123,555]}],nnn[0,3]]
aaa[bbb, ccc[ddd, ddd],nnn[0,3]]
aaa[bbb,ddd[0,3]]';

@r = /[^][,{}]+|\{[^}]*}/g;
print join ", ", @r;

Output:

aaa, bbb, ccc, ddd, {eee:1,mmm:999}, nnn, 0, 3,
aaa, bbb, ccc, ddd, {eee:1, mmm:[123,555]}, nnn, 0, 3,
aaa, bbb,  ccc, ddd,  ddd, nnn, 0, 3,
aaa, bbb, ddd, 0, 3

A rough translation into JavaScript:

var input =
    "aaa[bbb,ccc[ddd,{eee:1,mmm:999}],nnn[0,3]]\n" +
    "aaa[bbb,ccc[ddd,{eee:1, mmm:[123,555]}],nnn[0,3]]\n" +
    "aaa[bbb, ccc[ddd, ddd],nnn[0,3]]\n" +
    "aaa[bbb,ddd[0,3]]";

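// In JavaScript "[^]" is a complete class (any character), so the "]"
// has to be escaped to be part of the negated set.
var re = /[^\][,{}]+|\{[^}]*\}/g;

var result = [];
var match;
while ((match = re.exec(input)) !== null)
{
    result.push(match[0]);
}

// Using <<value>> rather than just a comma, for clarity around
// whether and how "{...}" was processed or not.
console.log("<<" + result.join(">><<") + ">>");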

It's not clear what the line breaks in the input or result data in the question are meant to be. In the above, they're line breaks in the input data and then not treated specially in the result. If they need to be treated specially, the OP can edit appropriately. And so this is the result of the above (again, using << and >> as separators rather than , for clarity around whether {...} gets processed):

<<aaa>><<bbb>><<ccc>><<ddd>><<{eee:1,mmm:999}>><<nnn>><<0>><<3>><<
aaa>><<bbb>><<ccc>><<ddd>><<{eee:1, mmm:[123,555]}>><<nnn>><<0>><<3>><<
aaa>><<bbb>><< ccc>><<ddd>><< ddd>><<nnn>><<0>><<3>><<
aaa>><<bbb>><<ddd>><<0>><<3>>
Qtax
This does not work in Javascript.
SLaks
@SLaks: It *seems* to come pretty close, at least within the bounds of the test data (and problem statement) provided and my (perhaps naive) translation of it (which I'll edit into it in a moment).
T.J. Crowder
For the avoidance of doubt: I'm not saying this is how I would do it (I don't think I'd use a regexp for this at all, except for scanning for breaks), just providing a rough translation since Qtax just gave Perl/PCRE.
T.J. Crowder
@T.J. Crowder: thanks for the translation.
Qtax
A: 

Separate the {stuff} while you split the rest:

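function customRx(s){
 // strip trailing delimiters so the last piece does not end in an empty string
 s= s.replace(/[\[\],\s]+$/g,'');
 var Rx=/,?(\{[^}]+\}),?/g, Rs=/[\[\],\s]+/, Rc=/^,|,$/g;
 var A= [], i= 0, M, z= 0;
 // split on delimiters, dropping the empty strings that split() yields
 // when a piece starts with a delimiter
 function parts(t){
  return t.split(Rs).filter(Boolean);
 }
 while((M= Rx.exec(s))!= null){
  i= M.index;
  if(i> z){
   A.push(parts(s.substring(z, i)));
  }
  z= Rx.lastIndex;
  // the {...} group itself, minus any comma on either side
  A.push(s.substring(i, z).replace(Rc,''));
 }
 if(s.length> z){
  A.push(parts(s.substring(z)));
 }
 return A;
}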

// test

var s1= 'aaa[bbb,ccc[ddd,{eee:1,mmm:999}],nnn[0,3]]'+
'aaa[bbb,ccc[ddd,{eee:1, mmm:[123,555]}],nnn[0,3]]'+
'aaa[bbb, ccc[ddd, ddd],nnn[0,3]]'+
'aaa[bbb,ddd[0,3]]';

alert(customRx(s1).join(', '));

Returned value (newlines added):

aaa,bbb,ccc,ddd, {eee:1,mmm:999},

nnn,0,3,aaa,bbb,ccc,ddd, {eee:1, mmm:[123,555]},

nnn,0,3,aaa,bbb,ccc,ddd,ddd,nnn,

0,3,aaa,bbb,ddd,0,3

kennebec
A: 

Assuming you're processing the text line by line, and that braces can't be nested, this split regex should work:

/ *[\[\],]+ *(?=[^{}]*(?:\{[^{}]*\}[^{}]*)*$)/

The first part -- *[\[\],]+ * -- matches one or more of [, ] or , and any surrounding spaces. The rest is a lookahead that asserts that, if there are any braces ahead of the matched characters, they come in balanced pairs. If the text is well formed, that ensures that a match won't occur inside a pair of braces.
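
To show how the split regex above might be used, here is a small sketch that applies it to the question's sample lines one at a time. The helper name splitLine is made up for this example, and the filter step is an addition: split() leaves an empty string at the end when a line finishes with a delimiter such as "]]".

```javascript
// Split regex copied from the answer; assumes one line at a time and
// no nested braces.
var splitRe = / *[\[\],]+ *(?=[^{}]*(?:\{[^{}]*\}[^{}]*)*$)/;

// Hypothetical helper: split one line and drop empty pieces.
function splitLine(line) {
    return line.split(splitRe).filter(function (part) {
        return part !== "";
    });
}

var lines = [
    "aaa[bbb,ccc[ddd,{eee:1,mmm:999}],nnn[0,3]]",
    "aaa[bbb,ccc[ddd,{eee:1, mmm:[123,555]}],nnn[0,3]]",
    "aaa[bbb, ccc[ddd, ddd],nnn[0,3]]",
    "aaa[bbb,ddd[0,3]]"
];

lines.forEach(function (line) {
    console.log(splitLine(line).join(", "));
});
// prints:
// aaa, bbb, ccc, ddd, {eee:1,mmm:999}, nnn, 0, 3
// aaa, bbb, ccc, ddd, {eee:1, mmm:[123,555]}, nnn, 0, 3
// aaa, bbb, ccc, ddd, ddd, nnn, 0, 3
// aaa, bbb, ddd, 0, 3
```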

Alan Moore
`/ *[][,]+ *(?![^{]*})/`, if the string isn't malformed
Qtax
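
The shorter pattern in the comment above is written in Perl/PCRE style. In JavaScript, [] is an empty class, so the ] has to be escaped before the pattern can be used there. A minimal sketch of that adaptation follows; the name splitShort is hypothetical, and like the comment it assumes well-formed input without nested braces.

```javascript
// JavaScript form of the negative-lookahead pattern: a delimiter is
// rejected when a "}" follows with no "{" in between, i.e. when the
// delimiter sits inside a pair of braces.
var shortRe = / *[\][,]+ *(?![^{]*\})/;

// Hypothetical helper: split one line and drop empty pieces.
function splitShort(line) {
    return line.split(shortRe).filter(Boolean);
}

console.log(splitShort(
    "aaa[bbb,ccc[ddd,{eee:1, mmm:[123,555]}],nnn[0,3]]").join(", "));
// prints: aaa, bbb, ccc, ddd, {eee:1, mmm:[123,555]}, nnn, 0, 3
```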
This Works! Thank you
Floyd