What I'm looking to do is split data from a string into an array.

Here's the general idea of the text format...

xxxxx denotes any mix of alpha-numeric-whitespace data.

xxxxx
 1 xxxxxxxxxx
 2 xxxxxxxxxx
xxxxxxxxx
xxxxxxxxx
xxxxxxxx
 3 xxxxxxxxxx
 4 xxxxxxxxxx
xxxxxxxxxx
 5 xxxxxxxxxx

(When numbers get into the double digits, the tens digit goes into the blank position in front of the number.)

Now what I want is an array of 5 elements (in this case), each of which stores the number and all the data that trails it (including the newlines). In the past this was not a big deal and I could use string.split("\n"), but now I need to delimit based on some sort of regex like /\n [0-9]{1,2}/, so I'm looking for a quick and easy way to do this (as split() doesn't support regex).

I want the array to be like

array[1] = " 1 xxxxxxxxxx"
array[2] = " 2 xxxxxxxxxxx\nxxxxxxxxxx\nxxxxxxxxxx"
array[3] = " 3 xxxxxxxxxx"
...etc
+1  A: 

You can use a lookahead and split on (?= [1-9] |[1-9][0-9] ), perhaps anchored at the beginning of a line, but this can be ambiguous if a line in the xxxx part itself starts with a number followed by a blank. This also doesn't ensure that the numbering is sequential.

Example

var text =
  "preface\n" +
  " 1 intro\n" +
  " 2 body\n" +
  "more body\n" +
  " 3 stuff\n" +
  "more stuff\n" +
  "even 4 stuff\n" +
  "10 conclusion\n" +
  "13 appendix\n";

print(text.split(/^(?= [1-9] |[1-9][0-9] )/m));

The output is (as seen on ideone.com):

preface
, 1 intro
, 2 body
more body
, 3 stuff
more stuff
even 4 stuff
,10 conclusion
,13 appendix
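
The caveat about sequential numbering can be checked after the split. The following is only a rough sketch: isSequential is a hypothetical helper name, and it assumes the padded format described in the question (an optional blank, one or two digits, then a blank) and that any unnumbered preface chunk should simply be skipped.

```javascript
// Hypothetical helper: true when the numbered chunks count up 1, 2, 3, ...
function isSequential(chunks) {
  var expected = 1;
  for (var i = 0; i < chunks.length; i++) {
    // Pick up the padded number at the start of a chunk, e.g. " 3 " or "10 ".
    var m = /^ ?(\d{1,2}) /.exec(chunks[i]);
    if (!m) continue;                       // unnumbered preface: skip it
    if (Number(m[1]) !== expected) return false;
    expected++;
  }
  return true;
}

var sample = "preface\n 1 intro\n 2 body\nmore body\n 3 stuff\n10 conclusion\n";
var chunks = sample.split(/^(?= [1-9] |[1-9][0-9] )/m);
var ok = isSequential(chunks);              // false here: 10 follows 3
```

If the check fails, the split may have picked up a continuation line that merely looks like a numbered one, so the result is worth inspecting by hand.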
polygenelubricants
My problem is that I can't use regex with string.split()... And yes, there is a vast set of numbers in there, but I can tell you that they are always padded as described above, in this pattern: a new line, a blank (or a number), a number, a blank, then the rest of the data.
@user: why can't you use regex? You are using javascript, yes?
polygenelubricants
Perhaps more readable output, with `replace` in addition to `split`: http://ideone.com/JwhoC ; if you can at least do regex `replace`, then you can do something like this, where you insert a literal string delimiter, and then `split` on that literal string.
polygenelubricants
+1  A: 

As @polygenelubricants said, you could use a regex with replace and make an interim delimiter, then split on that delimiter and remove it.

Here is a working example from the string you gave above and another I made to test the function. It works with both. Since you didn't provide any real data for an example, I can't test that, but hopefully this will at least get you going on the right track.

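function SplitCrazyString(str) {
    // Match a newline only when the next line starts with a padded one- or
    // two-digit number followed by a blank (" 1 ", "10 ").
    var regex = /\n(?=[ 1-9][0-9] )/g;

    // Swap each such newline for an interim delimiter.
    // This assumes '~' never occurs in the data itself.
    var tempStr = str.replace(regex, "~");

    // split() consumes the delimiter, so no cleanup pass is needed.
    return tempStr.split('~');
}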
var x = "xxxxx\n" +
    " 1 xxxxxxxxxx\n" +
    " 2 xxxxxxxxxx\n" +
    "xxxxxxxxx\n" +
    "xxxxxxxxx\n" +
    "xxxxxxxx\n" +
    " 3 xxxxxxxxxx\n" +
    " 4 xxxxxxxxxx\n" +
    "xxxxxxxxxx\n" +
    " 5 xxxxxxxxxx\n";
var testStr = "6daf sdf84 as96\n" +
    " 1 sfs 4a8dfa sf4asf\n" +
    " 2 s85 d418 df4 89 f8f\n" +
    "65a1 sdfa48 asdf61\n" +
    "w1c 987a w1ec\n" +
    "a6s85 d1a6f 81sf\n" +
    " 3 woi567 34ewn23 5cwe6\n" +
    " 4 s6k 8hf6 9gd\n" +
    "axxm4x1 dsf615g9 8asdf1jt gsdf8as\n" +
    " 5 n389h c8j923hdha 8h3x982qh\n";

var xAry = SplitCrazyString(x);
var testAry = SplitCrazyString(testStr);
Nate Pinchot
+1  A: 

split() does support regexes. Try this:

text.split(/\n(?=[1-9 ][0-9] )/)
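
As a quick illustration of what that call returns, here is a small sketch run against a shortened version of the sample from the question. The variable names text and parts are just for the example, and the data is placeholder text.

```javascript
var text =
  "xxxxx\n" +
  " 1 xxxxxxxxxx\n" +
  " 2 xxxxxxxxxx\n" +
  "xxxxxxxxx\n" +
  " 3 xxxxxxxxxx\n" +
  "10 xxxxxxxxxx";

// Split on a newline only when a padded number and a blank follow it.
// The lookahead keeps the number itself in the resulting element.
var parts = text.split(/\n(?=[1-9 ][0-9] )/);

// parts[0] is the unnumbered preface: "xxxxx"
// parts[2] keeps its continuation line: " 2 xxxxxxxxxx\nxxxxxxxxx"
// parts[4] is the double-digit entry: "10 xxxxxxxxxx"
```

Because the first element holds whatever precedes the first numbered line, the numbered entries start at index 1, which happens to line up with the array layout asked for in the question.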
Alan Moore
Hmm, seems the documentation I was reading didn't mention this and I figured it was accurate and never tried. Go me.