Hey there. I'm trying to split a string with a regex in JavaScript, but I'm totally stuck. Here's my input:

9:30 pm
The user did action A.

10:30 pm
Welcome, user John Doe.

***This is a comment

11:30 am
This is some more input.

I want the output array after the split() to be (I've removed the \n for readability):

["9:30 pm The user did action A.", "10:30 pm Welcome, user John Doe.", "***This is a comment", "11:30 am This is some more input." ];

My current regular expression is:

var split = text.split(/\s*(?=(\b\d+:\d+|\*\*\*))/);

This works, but there is one problem: the timestamps get repeated in extra elements. So I get:

["9:30", "9:30 pm The user did action A.", "10:30", "10:30 pm Welcome, user John Doe.", "***This is a comment", "11:30", "11:30 am This is some more input." ];

I can't split on the newlines \n because they aren't consistent, and sometimes there may be no newlines at all.

Could you help me out with a Regex for this?

Thanks so much!!

EDIT: in reply to phleet

It could look like this:

9:30 pm
The user did action A.

He also did action B

10:30 pm Welcome, user John Doe.

Basically, there may or may not be a newline after the timestamp, and there may be multiple newlines for the event description.

+3  A: 

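I believe the issue is how JavaScript's split treats capturing groups: when the separator pattern contains a capturing group, the captured text is added to the output array as an extra element. The solution may just be to use a non-capturing group in your pattern. That is, instead of: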

/\s*(?=(\b\d+:\d+|\*\*\*))/

Use

/\s*(?=(?:\b\d+:\d+|\*\*\*))/
        ^^

The (?:___) is what is called a non-capturing group.

Looking at the overall pattern, however, the grouping is not actually needed. You should be able to just use:

/\s*(?=\b\d+:\d+|\*\*\*)/
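To see the difference, here is a minimal sketch that runs both patterns over the sample input from the question. The variable names (`text`, `withCapture`, `withoutCapture`) are only illustrative, and the input is assumed to use plain \n line breaks.

```javascript
// Sample input from the question, assumed to use "\n" line breaks.
const text = [
  "9:30 pm",
  "The user did action A.",
  "",
  "10:30 pm",
  "Welcome, user John Doe.",
  "",
  "***This is a comment",
  "",
  "11:30 am",
  "This is some more input.",
].join("\n");

// Capturing group inside the lookahead: split() adds every capture
// to the result, which produces the unwanted extra elements.
const withCapture = text.split(/\s*(?=(\b\d+:\d+|\*\*\*))/);

// No group: the lookahead still decides where to split,
// but there is nothing captured to add to the result.
const withoutCapture = text.split(/\s*(?=\b\d+:\d+|\*\*\*)/);

console.log(withCapture.length, withoutCapture.length);
console.log(withoutCapture);
```

The second array has one element per entry. The line breaks inside each entry are kept, since the split only consumes the whitespace directly in front of a timestamp or a *** marker.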

Minor point

Instead of \*\*\*, you could use [*]{3}. This may be more readable. The * is not a meta-character inside a character class definition, so it doesn't have to be escaped. The {3} is how you denote "exactly 3 repetitions of".
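As a quick check that the two spellings behave the same, here is a small sketch. The `sample` string is an assumption: it combines the input from the question's edit (no newline after one timestamp, several newlines inside a description) with a comment line. All names are illustrative.

```javascript
// Both patterns should split at the same places.
const starsEscaped = /\s*(?=\b\d+:\d+|\*\*\*)/;
const starsInClass = /\s*(?=\b\d+:\d+|[*]{3})/;

// Hypothetical input: irregular newlines plus a *** comment line.
const sample =
  "9:30 pm\nThe user did action A.\n\nHe also did action B\n\n" +
  "***This is a comment\n\n" +
  "10:30 pm Welcome, user John Doe.";

const fromEscaped = sample.split(starsEscaped);
const fromClass = sample.split(starsInClass);

// Optional: collapse the remaining whitespace runs into single spaces,
// which gives the "\n removed for readability" form shown in the question.
const tidy = fromClass.map((entry) => entry.replace(/\s+/g, " "));

console.log(tidy);
```

Whether to collapse the inner newlines depends on how the entries are displayed afterwards; the split itself does not need that step.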

polygenelubricants
See http://ideone.com/DuxMo for the pattern `/\s*(?=\b\d+:\d+|[*]{3})/`
polygenelubricants
Brilliant, thanks so much! This completely solves the problem.
Rohan