ansaurus

Question

Optimal Regular Expression: match sets of lines starting with ...

Answer 1

+1 A:

In most regex implementations you can shorten \.\.\. to \.{3}, so your solution would become:

\.{3}.*(\n\.{3}.*)*

Josef 2009-07-03 21:28:04
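Here is a minimal Ruby sketch of the shortened pattern. It assumes the task is to collect each run of consecutive "..." lines with String#scan, and the sample text is invented for illustration. The groups are written non-capturing, (?:...), so that scan returns whole matches rather than group contents. Otherwise the pattern is the one in the answer.

```ruby
# Made-up input: two runs of lines that start with "...".
text = "intro\n... first\n... second\nmiddle\n... third\n"

long  = /\.\.\..*(?:\n\.\.\..*)*/
short = /\.{3}.*(?:\n\.{3}.*)*/   # \.{3} means exactly three literal dots

# With no capturing groups, String#scan returns the full text of each match.
p text.scan(long)   # => ["... first\n... second", "... third"]
p text.scan(short)  # => ["... first\n... second", "... third"]
```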

Matchu 2009-07-03 21:34:07

But I do appreciate the shortening - that's step one :D

Matchu 2009-07-03 21:36:56

I hadn't fully grasped your requirements till now, sorry. I can't think of a shorter solution so I edited my answer accordingly.

Josef 2009-07-03 21:59:51

The solution incorrectly matches lines like this: "foo...bar".

FM 2009-07-04 16:18:17

Monty's right, there should be a ^ at the very beginning of that regex.

Alan Moore 2009-07-04 17:49:33
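The same sketch with a hypothetical "foo...bar" line reproduces the false match FM describes and shows the effect of the leading ^. In Ruby, ^ always means "start of a line", not only the start of the string, so no extra flag is needed.

```ruby
# Made-up input: the second line has "..." in the middle, not at the start.
text = "... keep me\nfoo...bar\n"

unanchored = /\.{3}.*(?:\n\.{3}.*)*/
anchored   = /^\.{3}.*(?:\n\.{3}.*)*/   # ^ matches at the start of any line in Ruby

p text.scan(unanchored)  # => ["... keep me", "...bar"]
p text.scan(anchored)    # => ["... keep me"]
```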

Answer 2

+1 A:

What you have is already simple and understandable. Keep in mind that more "clever" RegExps may very well be slower and will undoubtedly be less readable.

Assuming lines are terminated by a \n:

((^|\n)\.{3}[^\n]*)+

I am not familiar with Ruby, so depending on how it returns matches, you might need to make the groups non-capturing:

((?:(?:^|\n)\.{3}[^\n]*)+)

Borgar 2009-07-03 22:21:32
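On the Ruby question: when a pattern contains capture groups, String#scan returns the group contents for each match instead of the matched text. This small sketch, using made-up input, compares Borgar's two patterns with a third, fully non-capturing variant that is added here only for comparison.

```ruby
text = "... one\n... two\nplain\n"

capturing  = /((^|\n)\.{3}[^\n]*)+/          # Borgar's first pattern
outer_only = /((?:(?:^|\n)\.{3}[^\n]*)+)/    # Borgar's second pattern
no_groups  = /(?:(?:^|\n)\.{3}[^\n]*)+/      # added variant: no capturing groups

# Each inner group keeps only its last repetition, so the first result
# holds the last line's capture, not the whole block.
p text.scan(capturing)   # => [["\n... two", "\n"]]
p text.scan(outer_only)  # => [["... one\n... two"]]
p text.scan(no_groups)   # => ["... one\n... two"]
```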

Excellent point: some benchmarking showed that this more compressed regex ran at about two-thirds the speed. I'm not sure, then, what to do now with regard to this question... I guess I'll just mark the other answer, since it's the only shortening I really feel comfortable with, though I'm disappointed there's no "simple" answer. Thanks!

Matchu 2009-07-03 23:16:49
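Below is a rough way to run this kind of comparison in Ruby with only the core clock. It assumes "this more compressed regex" refers to Borgar's pattern, written here without capturing groups. The "original" pattern is reconstructed from Josef's answer (his shortened version with \.\.\. restored), and the input is invented, so the numbers it prints depend on your data and Ruby version and need not match the two-thirds figure. On this input the compressed pattern's second and later matches begin with the preceding "\n", which is why the sanity check compares match counts rather than strings.

```ruby
# Made-up input: 2,000 two-line blocks of "..." lines separated by plain lines.
text = "... a\n... b\nplain\n" * 2_000

# Reconstructed from Josef's answer (assumption): \.{3} written as \.\.\.
original = /\.\.\..*(?:\n\.\.\..*)*/
# Borgar's pattern without capturing groups.
compressed = /(?:(?:^|\n)\.{3}[^\n]*)+/

# Sanity check: both patterns should find one match per block.
raise "mismatch" unless text.scan(original).size == text.scan(compressed).size

# Seconds taken by `runs` full scans of the input with the given regex.
time_scan = lambda do |regex, runs = 50|
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  runs.times { text.scan(regex) }
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

t_orig = time_scan.call(original)
t_comp = time_scan.call(compressed)
puts format("original:   %.4fs", t_orig)
puts format("compressed: %.4fs", t_comp)
# Speed is inversely proportional to time, so this is compressed speed / original speed.
puts format("compressed speed relative to original: %.2f", t_orig / t_comp)
```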

Answer 3

A:

You are pretty close to a solution with (^\.\.\..*$)+, but because the + quantifier is outside the group, the group's capture is overwritten on each repetition and you are left with only the last line. Try wrapping it in an outer group, ((^\.\.\..*$)+), then look at the first submatch and ignore the inner one.

Combined with the other suggestion: ((^\.{3}.*$)+)

Alex Barrett 2009-07-04 01:04:15

I say to ignore the inner one because you disliked the extra complexity in Borgar's response. You could make the inner group non-capturing to ignore it completely: ((?:^\.{3}.*$)+)

Alex Barrett 2009-07-04 01:08:06

Fortunately, I'm not concerned with grouping in this case. I don't need submatches. I just need matches that I'm going to replace in order :) So grouping wasn't the issue with that solution. It just wasn't making the right matches at all.

Matchu 2009-07-04 01:33:20

So, I'm satisfied as it stands. But thanks for the help!

Matchu 2009-07-04 01:34:21
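One likely reason the pattern "wasn't making the right matches", at least in Ruby, is that $ is zero-width. It asserts the end of a line without consuming the "\n", and ^ cannot match just before that "\n", so the + never gets a second iteration and every match is a single line. Here is a small sketch with made-up input, using Alex's pattern with the inner group made non-capturing:

```ruby
text = "... one\n... two\nplain\n"

pattern = /((?:^\.{3}.*$)+)/

# Nothing consumes the "\n" between the lines, so each match stops after one line.
p text.scan(pattern)  # => [["... one"], ["... two"]]
```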

Answer 4

+1 A:

^([.]{3}.*$\n?)+

This doesn't really need $ in there.

Brad Gilbert 2009-07-04 14:08:36
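Matchu's use case was replacing each matched block in order. Here is a hedged Ruby sketch that applies this answer's pattern with gsub and a block. The numbering replacement is purely hypothetical. Because \n? sits inside the repeated group, each match includes the newline that ends its last line, so the replacement adds one back.

```ruby
text = "intro\n... first\n... second\nmiddle\n... third\n"

pattern = /^([.]{3}.*$\n?)+/   # [.] is a character class holding a literal dot

# Hypothetical replacement: number each block in the order it is found.
count = 0
result = text.gsub(pattern) { "[block #{count += 1}]\n" }
puts result
# intro
# [block 1]
# middle
# [block 2]
```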

What is the $ doing in the middle of the pattern?

FM 2009-07-04 16:26:34

'$' matches the end of a line or the end of the whole string. If it's the end of a line, '\n?' consumes the linefeed so matching can continue on the next line. So the '$' in the middle of the regex is pulling its weight, but the one at the end is redundant.

Alan Moore 2009-07-04 17:31:01

Perhaps I'm missing something, but I think the greediness of regular expressions will ensure that entire lines are consumed, so both $ characters can be removed from the pattern: /^([.]{3}.*\n?)+/

FM 2009-07-04 18:53:43
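Here is a quick check of FM's point on a few made-up inputs. .* cannot cross a "\n", so once it has greedily consumed a line, the position is at a line end or at the end of the string, which is exactly where $ succeeds. Dropping $ should therefore leave the matches unchanged.

```ruby
with_dollar    = /^([.]{3}.*$\n?)+/
without_dollar = /^([.]{3}.*\n?)+/   # FM's simplified version

samples = [
  "... a\n... b\nplain\n... c",  # final line has no trailing newline
  "foo...bar\n... x\n",          # dots in the middle of a line
  "plain only\n",                # nothing to match
]

samples.each do |s|
  # Wrap every match in <...>; '\0' in the replacement string is the whole match.
  marked_a = s.gsub(with_dollar, '<\0>')
  marked_b = s.gsub(without_dollar, '<\0>')
  puts marked_a == marked_b ? "same" : "different"
end
```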
