Is there a way to specify a regular expression to find every 2nd occurrence of a pattern in a string?

Examples

  • searching for a against string abcdabcd should find one occurrence at position 5
  • searching for ab against string abcdabcd should find one occurrence at position 5
  • searching for dab against string abcdabcd should find no occurrences
  • searching for a against string aaaa should find two occurrences at positions 2 and 4
+1  A: 

Would something like "(pattern.*?(pattern))*" work for you?

Edit:

The problem with this is that it uses the non-greedy operator *?, which can require an awful lot of backtracking along the string, whereas regexes usually don't have to look at a character more than once. What this means for you is that it could be slow when the gaps between occurrences are large.

Patrick
needs to be non-greedy
annakata
Forgot about that. Fixed it.
Patrick
I'm not sure, Patrick. I would say that the non-greedy operators can use less backtracking. It depends on the algorithm you use, of course, but to check "a.*a" you have to go up to the end of the string and try matching backward, whereas for "a.*?a" you can try matching forward and stop as soon as you find a match.
Remo.D
Remo.D is right: non-greedy quantifiers don't increase backtracking, they eliminate it. (They may or may not be less efficient, but if they are it won't be because of backtracking.) But in this case, efficiency is irrelevant; as annakata pointed out, the quantifier has to be non-greedy for this approach to work.
Alan Moore
+3  A: 

Suppose the pattern you want is abc+d. You want to match the second occurrence of this pattern in a string.

You would construct the following regex:

abc+d.*?(abc+d)

This would match strings of the form: <your-pattern>...<your-pattern>. Since we're using the reluctant quantifier *?, we can be sure that there is no other match of the pattern between the two. Using capture groups, which pretty much all regex implementations provide, you would then retrieve the string in the bracketed group, which is the occurrence you want.
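
As a rough illustration of the difference the reluctant quantifier makes, here is a minimal Python sketch (Python's re module and the sample string are assumptions for the example, not part of the original answer). It reads the position of the second occurrence from the capture group:

```python
import re

# Hypothetical sample text containing three matches of abc+d.
text = "xabcd--abccd--abcccd"

# Reluctant .*? stops at the nearest following occurrence.
reluctant = re.search(r"abc+d.*?(abc+d)", text)

# Greedy .* runs to the end of the string and backtracks,
# so the group captures the LAST occurrence instead.
greedy = re.search(r"abc+d.*(abc+d)", text)

print(reluctant.group(1), reluctant.start(1))  # abccd 7
print(greedy.group(1), greedy.start(1))        # abcccd 14
```

The start(1) values are zero-based offsets of the captured group, not of the whole match.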

Il-Bhima
+7  A: 

Use grouping.

foo.*?(foo)
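
To get every 2nd occurrence rather than just the first pair, one option is to apply the same grouped pattern repeatedly. The following is a sketch in Python, assuming the base pattern contains no groups of its own; the helper name every_second and the named group second are made up for this example:

```python
import re

def every_second(pattern, text):
    """Return 1-based start positions of every 2nd occurrence of pattern.

    Assumes pattern has no capture groups or backreferences of its own.
    """
    paired = re.compile(f"(?:{pattern}).*?(?P<second>{pattern})", re.DOTALL)
    # Each match consumes one pair of occurrences; the named group is the 2nd of the pair.
    return [m.start("second") + 1 for m in paired.finditer(text)]

print(every_second("a", "abcdabcd"))    # [5]
print(every_second("ab", "abcdabcd"))   # [5]
print(every_second("dab", "abcdabcd"))  # []
print(every_second("a", "aaaa"))        # [2, 4]
```

The four calls correspond to the examples in the question, with positions counted from 1.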
Alex Barrett
+1  A: 

There's no "direct" way of doing so, but you can specify the pattern twice, as in a[^a]*a, which matches up to the second "a".

The alternative is to use your programming language (Perl? C#? ...) to match the first occurrence and then the second one.
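
A minimal sketch of that alternative, assuming Python as the host language (the function name every_second_by_counting is hypothetical): let the regex find every non-overlapping occurrence and do the counting in ordinary code.

```python
import re

def every_second_by_counting(pattern, text):
    """Return 1-based start positions of the 2nd, 4th, ... occurrence of pattern."""
    matches = re.finditer(pattern, text)
    # enumerate from 1 so that even counters mark every second occurrence.
    return [m.start() + 1 for i, m in enumerate(matches, start=1) if i % 2 == 0]

print(every_second_by_counting("a", "abcdabcd"))    # [5]
print(every_second_by_counting("dab", "abcdabcd"))  # []
print(every_second_by_counting("a", "aaaa"))        # [2, 4]
```

This keeps the regex itself unchanged, so it also works with engines that lack non-greedy quantifiers.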

EDIT: I've seen others respond using the "non-greedy" operators, which might be a good way to go, assuming you have them in your regex library!

Remo.D
/a[^a]*a/ finds the next two occurrences of 'a', but doesn't tell you where the second one is. Also, it only works when the base pattern is exactly one character long.
Alan Moore
A: 

If you're using C#, you can get all the matches at once, i.e. use Regex.Matches(), which returns a MatchCollection, and then pick the items whose zero-based index is odd (index % 2 != 0).

If you want to find the occurrence in order to replace it, use one of the overloads of Regex.Replace() that take a MatchEvaluator, e.g. Regex.Replace(String, String, MatchEvaluator). Here's the code:

using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            string input = "abcdabcd";

            // Replace every second 'a' with 'm' (prints "abcdmbcd")

            string replacedString = Regex.Replace(
                input,
                "a",
                new SecondOccuranceFinder("m").MatchEvaluator);

            Console.WriteLine(replacedString);
            Console.Read();

        }

        class SecondOccuranceFinder
        {
            public SecondOccuranceFinder(string replaceWith)
            {
                _replaceWith = replaceWith;
                _matchEvaluator = new MatchEvaluator(IsSecondOccurance);
            }

            private string _replaceWith;

            private MatchEvaluator _matchEvaluator;
            public MatchEvaluator MatchEvaluator
            {
                get
                {
                    return _matchEvaluator;
                }
            }

            private int _matchIndex;
            public string IsSecondOccurance(Match m)
            {
                _matchIndex++;
                if (_matchIndex % 2 == 0)
                    return _replaceWith;
                else
                    return m.Value;
            }
        }
    }
}
Waleed Eissa
A: 

Backreferences allow some interesting solutions here. This regex:

([a-z]+).*(\1)

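will find a repeated sequence of lowercase letters (at the leftmost position where one starts, the longest sequence that occurs again later).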

This one will find a sequence of 3 letters that is repeated:

([a-z]{3}).*(\1)
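
As a small illustration, the following Python sketch (the sample string is an assumption for the example) shows which repeat the backreference lands on with a greedy versus a non-greedy quantifier:

```python
import re

# Hypothetical sample text in which "foo" occurs three times.
text = "foo-bar-foo-baz-foo"

# Greedy .* makes \1 match the LAST repeat of the captured letters.
greedy = re.search(r"([a-z]{3}).*(\1)", text)

# Non-greedy .*? makes \1 match the NEXT repeat, i.e. the second occurrence.
lazy = re.search(r"([a-z]{3}).*?(\1)", text)

print(greedy.group(1), greedy.start(2))  # foo 16
print(lazy.group(1), lazy.start(2))      # foo 8
```

The offsets are zero-based; group 2 is the repeated occurrence found via the backreference.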
Jeff Moser
This is a slightly different take on the problem than the other answers, but you still need to make the quantifier non-greedy: /([a-z]+).*?(\1)/
Alan Moore