Alright, this one's interesting. I have a solution, but I don't like it.

The goal is to be able to find a set of lines that start with 3 periods - not an individual line, mind you, but a collection of all the lines in a row that match. For example, here are some matches (each match is separated by a blank line):

...

...hello

...
...hello
...world
...
...wazzup?
...

My solution is as follows:

^\.\.\..*(\n\.\.\..*)*$

It matches all those, so it's what I'm using for now - however, it looks kinda silly to repeat the \.\.\..* pattern. Is there a simpler way?
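For reference, here is a minimal Ruby sketch of how that pattern behaves with String#scan. The sample text and the names SAMPLE_TEXT, DOT_RUN and dot_runs are made up for illustration. The inner group is written as non-capturing, (?:...), because scan returns the captures rather than the whole match when the pattern contains capturing groups.

```ruby
# Hypothetical sample: three runs of "..." lines separated by ordinary lines.
SAMPLE_TEXT = "intro\n...\nmiddle\n...hello\nfoo...bar\n...\n...hello\n...world\nend\n"

# The pattern from the question, with the repeated group made non-capturing
# so that String#scan yields whole matches.
DOT_RUN = /^\.\.\..*(?:\n\.\.\..*)*$/

# Returns every run of consecutive lines that start with three periods.
def dot_runs(text)
  text.scan(DOT_RUN)
end

dot_runs(SAMPLE_TEXT).each { |run| p run }
# prints:
# "..."
# "...hello"
# "...\n...hello\n...world"
```

Note that "foo...bar" is skipped because ^ only matches at the start of a line, and that each match stops before the newline that ends the run.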

Please test your regex before submitting it, rather than submit what "should work." For example, I tried the following first:

(^\.\.\..*$)+

which only returned individual lines, even though in my mind it looks like it would do the trick - I guess I just don't understand regex internals. (And no, I didn't need to set any flags to get ^ and $ to match line boundaries, since I'm implementing this in Ruby.)

So I'm not totally sure there's a good answer, but one would be much appreciated - thanks in advance!

+1  A: 

In most regex implementations you can shorten \.\.\. using \.{3} so your solution would turn into \.{3}.*(\n\.{3}.*)*.

Josef
Matchu
But I do appreciate the shortening - that's step one :D
Matchu
I hadn't fully grasped your requirements till now, sorry. I can't think of a shorter solution so I edited my answer accordingly.
Josef
The solution incorrectly matches lines like this: "foo...bar".
FM
Monty's right, there should be a ^ at the very beginning of that regex.
Alan Moore
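The point raised in these comments can be checked with a short Ruby sketch. The sample string and the constant names are invented for illustration; both patterns use a non-capturing group so that String#scan returns whole matches.

```ruby
# Hypothetical input: one line with "..." in the middle, then a real run.
MIXED_LINES = "foo...bar\n...ok\n...fine\nend\n"

# The shortened pattern as posted in the answer (no leading anchor).
UNANCHORED_RUN = /\.{3}.*(?:\n\.{3}.*)*/

# The same pattern with ^ added at the very beginning.
ANCHORED_RUN = /^\.{3}.*(?:\n\.{3}.*)*/

p MIXED_LINES.scan(UNANCHORED_RUN)  # prints ["...bar\n...ok\n...fine"]
p MIXED_LINES.scan(ANCHORED_RUN)    # prints ["...ok\n...fine"]
```

Without the anchor the match begins in the middle of "foo...bar" and then swallows the real run that follows it; with the anchor only the lines that start with three periods are matched.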
+1  A: 

What you have is already simple and understandable. Keep in mind that more "clever" RegExps may very well be slower and undoubtedly less readable.

Assuming lines are terminated by a \n:

((^|\n)\.{3}[^\n]*)+

I am not familiar with Ruby, so depending on how it returns matches you might need to "nonmatch" groups:

((?:(?:^|\n)\.{3}[^\n]*)+)
Borgar
Excellent point: some benchmarking showed that this more compressed regex performed at about two-thirds the speed. I'm not sure, then, what to do now in regard to this question... I guess I'll just mark the other since it's the only shortening I can really feel comfortable doing - though I'm disappointed there's no "simple" answer. Thanks!
Matchu
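One detail of the (^|\n) approach in this answer is worth checking in Ruby before using it for replacements: when a run does not start at the very beginning of the string, the match appears to begin with the preceding newline. The sketch below assumes the fully non-capturing form of the pattern; the sample strings and constant names are made up.

```ruby
# Non-capturing form of the pattern from this answer.
NEWLINE_LED_RUN = /(?:(?:^|\n)\.{3}[^\n]*)+/

# Hypothetical inputs: a run in the middle of the text, and a run at the start.
RUN_IN_MIDDLE = "intro\n...a\n...b\nend\n"
RUN_AT_START  = "...a\n...b\nend\n"

p RUN_IN_MIDDLE.scan(NEWLINE_LED_RUN)  # prints ["\n...a\n...b"]
p RUN_AT_START.scan(NEWLINE_LED_RUN)   # prints ["...a\n...b"]
```

In the first case the leftmost possible match starts at the newline after "intro", taken through the \n branch of the alternation, so a replacement built on this pattern would also replace that newline.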
A: 

You are pretty close to a solution with (^\.\.\..*$)+, but because the + modifier is on the outside of the group, it is getting overwritten each time and you are only left with the last line. Try wrapping it in an outer group: ((^\.\.\..*$)+) and looking at the first submatch and ignoring the inner one.

Combined with the other suggestion: ((^\.{3}.*$)+)

Alex Barrett
I say ignore the inner one because you disliked the increased complexity in Borgar's response. You could non-match the inner group to ignore it completely: ((?:^\.{3}.*$)+)
Alex Barrett
Fortunately, I'm not concerned with grouping in this case. I don't need submatches. I just need matches that I'm going to replace in order :) So grouping wasn't the issue with that solution. It just wasn't making the right matches at all.
Matchu
So, I'm satisfied as it stands. But thanks for the help!
Matchu
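A quick Ruby check is consistent with the asker's comment that grouping is not the issue here. The sample string and constant names below are invented for illustration.

```ruby
# Hypothetical input: three consecutive lines starting with three periods.
THREE_DOT_LINES = "...a\n...b\n...c\n"

# The suggestion from this answer: outer capturing group, inner non-capturing.
OUTER_GROUP_ATTEMPT = /((?:^\.{3}.*$)+)/

# The same pattern without the outer capture.
PLAIN_ATTEMPT = /(?:^\.{3}.*$)+/

# With a capturing group, scan returns one array of captures per match.
p THREE_DOT_LINES.scan(OUTER_GROUP_ATTEMPT)  # prints [["...a"], ["...b"], ["...c"]]
p THREE_DOT_LINES.scan(PLAIN_ATTEMPT)        # prints ["...a", "...b", "...c"]
```

Either way each match covers a single line. The likely reason is that nothing in the group consumes the newline: . does not match "\n" by default and $ is zero-width, so after one pass the regex is positioned just before the newline, where ^ cannot match, and the + never gets a second iteration.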
+1  A: 
^([.]{3}.*$\n?)+

This doesn't really need $ in there.

Brad Gilbert
What is the $ doing in the middle of the pattern?
FM
'$' matches the end of a line or the end of the whole string. If it's the end of a line, '\n?' consumes the linefeed so matching can continue on the next line. So the '$' in the middle of the regex is pulling its weight, but the one at the end is redundant.
Alan Moore
Perhaps I'm missing something, but I think the greediness of regular expressions will ensure that entire lines are consumed, so both $ characters can be removed from the pattern: /^([.]{3}.*\n?)+/
FM
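The variants discussed in this thread can be compared side by side in Ruby. This is only a sketch on one made-up sample (the string and constant names are hypothetical), with the groups written as non-capturing so that String#scan returns whole matches.

```ruby
# Hypothetical input: a two-line run between ordinary lines.
TAIL_SAMPLE = "intro\n...a\n...b\nend\n"

# The answer's pattern, keeping the $ before the optional newline.
WITH_DOLLAR = /^(?:[.]{3}.*$\n?)+/

# The variant from the last comment, with both $ removed.
WITHOUT_DOLLAR = /^(?:[.]{3}.*\n?)+/

# The pattern from the question, for comparison.
QUESTION_PATTERN = /^\.\.\..*(?:\n\.\.\..*)*$/

p TAIL_SAMPLE.scan(WITH_DOLLAR)       # prints ["...a\n...b\n"]
p TAIL_SAMPLE.scan(WITHOUT_DOLLAR)    # prints ["...a\n...b\n"]
p TAIL_SAMPLE.scan(QUESTION_PATTERN)  # prints ["...a\n...b"]
```

On this sample the two shorter patterns agree with each other, but both include the newline that ends the run, whereas the question's pattern stops before it. That difference matters if the matches are going to be replaced.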