some text I want to capture. <tag> junk I don't care about</tag> more stuff I want.

Is there an easy way to write a regex that captures the first and third sentences in a single capture group?

A: 

Not to my knowledge. Usually that's why regex search-and-replace functions allow you to refer to multiple capturing groups in the first place.
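As a rough illustration of referring to multiple capturing groups in a replacement, here is a minimal Python sketch. The sample string is taken from the question; the names `text`, `pattern`, and `result` are illustrative only, and the pattern assumes a single, non-nested `<tag>...</tag>` pair.

```python
import re

text = "some text I want to capture. <tag> junk I don't care about</tag> more stuff I want."

# Group 1 is everything before the tag, group 2 is everything after it.
# Assumes exactly one non-nested <tag>...</tag> pair in the input.
pattern = re.compile(r"^(.*?)<tag>.*?</tag>(.*)$")

# The replacement string refers to both groups, which drops the tagged span.
result = pattern.sub(r"\1\2", text)
print(result)
```

The two groups are stitched together by the replacement string rather than by a single capture, which is the workaround this answer alludes to.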

Amber
A: 

Unfortunately no, it's not possible. The solution is to capture into two separate groups and then concatenate them after the fact.

According to this older thread on this site:

http://stackoverflow.com/questions/277547/regular-expression-to-skip-character-in-capture-group

bdk
+1  A: 

You could also consider stripping out the unwanted data and then capturing.

import re
data = "some text to capture. <tag>junk</tag> other stuff to capture"
# Strip the tagged span first (re.sub takes pattern, replacement, string), then capture.
data = re.sub(r'<tag>[^<]*</tag>', '', data)
data_match = re.match(r'[\w. ]+', data)
VoDurden
A: 

A capture group is contiguous, so you can't. You can do it in one pass with a regex like the one below and then join the two groups in code:

^(?<line1>.*?)(?:\<\w*\>.*?\</\w*\>)(?<line3>.*?)$
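The pattern above uses .NET-style named groups. A hedged Python equivalent might look like the sketch below; Python's `re` module spells named groups `(?P<name>...)`, and the variable names (`text`, `pattern`, `joined`) are illustrative, not part of the original answer. It assumes one tag pair per line.

```python
import re

text = "some text I want to capture. <tag> junk I don't care about</tag> more stuff I want."

# Same idea as the pattern above, written with Python's (?P<name>...) syntax.
pattern = re.compile(r"^(?P<line1>.*?)<\w*>.*?</\w*>(?P<line3>.*?)$")

match = pattern.match(text)
# Join the two named groups in code; fall back to the input if nothing matched.
joined = match.group("line1") + match.group("line3") if match else text
print(joined)
```

If the input contains no tag at all, the pattern does not match and the sketch simply returns the original string.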
Fadrian Sudaman
A: 

Here's a non-regex way: split on </tag>, go through the resulting items, and for any item that contains <tag>, split on <tag> and keep the first element. For example:

>>> s="some text I want to capture. <tag> junk I don't care about</tag> more stuff I want. <tag> don't care </tag> i care"
>>> for item in s.split("</tag>"):
...     if "<tag>" in item:
...        print(item.split("<tag>")[0])
...     else:
...        print(item)
...
some text I want to capture.
 more stuff I want.
 i care

In ASP.NET, the Split() method of the string class can be used to do the same.

ghostdog74