I want to split certain text using JavaScript. The text looks like:

9:30 pm
The user did action A.

10:30 pm
Welcome, user John Doe.

11:30 am
Messaged user John Doe

Now I want to split the string into events, i.e.:

9:30 pm
The user did action A.

would be one event. I'm using RegEx for this:

var split = journals.split(/\d*\d:/);

Thing is, the first two characters are getting lost. The split appears like this:

30 pm
    The user did action A.

How do I split so that each piece keeps its first two or three characters (i.e. 9: or 10:)?

Thanks!

+2  A: 

Wouldn't it be easier to split on the newline?

var split = journals.split(/\n\n/);

EDIT

Try normalizing the string into a format that you can use:

/*
 Non-normalized string
*/
var str = "9:30 pm\nThe user did action A.10:30 pm\nWelcome, user John Doe.\n\n\n11:30 am\nMessaged user John Doe\n12:30 pm\nThe user did something else.";

/*
 Normalizing into a specific format. TIMESTAMP\nDESCRIPTION\n\n.
 Then removing extraneous leading \n\n
*/

str = str.replace(/\n*([0-9]{1,2}:[0-9]{2} (a|p)m)\n*/g, "\n\n$1\n").replace(/^\n+/, "");

var events = str.split(/\n\n/);

/*
 The following should display an array of strings of the form:
 TIMESTAMP\nDESCRIPTION
*/
console.log(events); 

/*
 Loop through events and split on single newline to get timestamp and description
*/
for(var i = 0; i < events.length; i++) {
   var event = events[i];
   var eventData = event.split(/\n/);
   var time = eventData[0];
   var description = eventData[1];
   console.log(time, description);
}
Vivin Paliath
Yep. Sometimes there's an extra newline, sometimes there are no newlines, sometimes users remove newlines. That's not an option, unfortunately.
Rohan
Thanks for the edit. Thing is, the input string itself doesn't guarantee that there will even be \n characters in the correct places.
Rohan
@Rohan The regular expression inserts the `\n` at the appropriate places itself (it looks for a pattern resembling a timestamp that may be preceded and/or followed by zero or more `\n`s). But I think you should go with Andy's solution. It's much easier.
Vivin Paliath
+1 for putting the effort in for the normalization regex.
Andy E
+3  A: 

Use a lookahead:

var split = journals.split(/(?=\b\d+:)/);
Andy E
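To illustrate why the lookahead helps, here is a minimal sketch using the sample text from the question (the variable names `consumed` and `kept` are made up for this example). A plain pattern consumes whatever it matches, so the hour and colon disappear; a lookahead matches only the empty position in front of the timestamp, so nothing is removed.

```javascript
// Sample input taken from the question.
var journals = "9:30 pm\nThe user did action A.\n\n" +
               "10:30 pm\nWelcome, user John Doe.\n\n" +
               "11:30 am\nMessaged user John Doe";

// Consuming split: the matched "9:", "10:" and "11:" are dropped from the pieces.
var consumed = journals.split(/\d*\d:/);

// Lookahead split: the match is zero-width, so every character of the input survives.
var kept = journals.split(/(?=\b\d+:)/);

console.log(consumed); // pieces start with "30 pm", "30 pm", "30 am" (plus a leading empty string)
console.log(kept);     // pieces start with "9:30 pm", "10:30 pm", "11:30 am"
```

Each piece in `kept` still carries its trailing newlines, so calling `trim()` on the pieces may be worthwhile before using them.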
Did you test it? This was the first thing I tested, but to my surprise it split between `\d*` and `\d` as well (ending up with 5 elements).
BalusC
@BalusC: no, I was in a bit of a hurry (I'm in the middle of making dinner). I've split on "non-characters" before using lookaheads though.
Andy E
This actually works. Thanks so much! (A possible source of error is that my initial regex isn't perfect; I added the last `\d*` later on.)
Rohan
+1 much easier!
Vivin Paliath
@BalusC, @Rohan: fixed the regex using a word boundary `\b`
Andy E
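A rough sketch of what the word boundary changes. The pre-fix pattern is not shown in this thread, so `/(?=\d*\d:)/` below is an assumption based on BalusC's description; `sample` is a shortened, made-up stand-in for the question's text. Without `\b`, the lookahead also succeeds in the middle of a two-digit hour (in front of the `0` in `10:`), which produces extra one-character pieces.

```javascript
// Shortened stand-in for the question's input (hypothetical descriptions A, B, C).
var sample = "9:30 pm\nA\n\n10:30 pm\nB\n\n11:30 am\nC";

// Assumed pre-fix pattern: also matches between the two digits of "10:" and "11:".
var withoutBoundary = sample.split(/(?=\d*\d:)/);

// With \b the position must sit at the start of the digit run, so "10:" stays whole.
var withBoundary = sample.split(/(?=\b\d+:)/);

console.log(withoutBoundary.length); // 5
console.log(withBoundary.length);    // 3
```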
Drat, I *knew* it. Good find, Andy :)
BalusC
@Rohan: I should have addressed you first in my previous comment; I don't think it would have alerted both you and @BalusC. I fixed the regex for you using a word boundary.
Andy E
Thanks Andy, but it worked as is :-)
Rohan
@Andy E's head - if you could help out with this too please: http://stackoverflow.com/questions/3047594/javascript-split-with-regex Thanks so much!!!
Rohan