ansaurus

Question

Finding all occurrences of a sequence of chars when preceded by a specific string

Answer 1

+3 A:

You need to use a lazy quantifier instead of .*. Try this:

/Track type: subtitles.*?Language: (\w\w\w)/m

This should get you the first occurrence of "Language: ???" after each "Track type: subtitles". But it will get confused if a track of type subtitles is missing its Language field, because the match then runs on into the next track and picks up that track's language.
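To make that failure mode concrete, here is a small sketch in Python. The sample text is made up for illustration (the question's real input is not shown here), and it assumes the /m in the pattern above is meant to let . match newlines, which in Python is re.DOTALL.

```python
import re

# Hypothetical input: the first subtitles track has no Language line.
SAMPLE_MISSING = """\
| + A track
|  + Track type: subtitles
| + A track
|  + Track type: video
|  + Language: und
| + A track
|  + Track type: subtitles
|  + Language: eng
"""

# Lazy version of the pattern; re.DOTALL lets "." cross line breaks.
LAZY = re.compile(r"Track type: subtitles.*?Language: (\w\w\w)", re.DOTALL)

# The first match starts in the subtitles track without a language and
# stops at the next "Language:" it can find, which belongs to the video track.
languages = LAZY.findall(SAMPLE_MISSING)
print(languages)
```

The first reported language comes from the video track, not from a subtitles track, which is the confusion described above.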

Another way to do this would be:

/^\| \+ (?:(?!^\| \+).)*?\+  Track type: subtitles$(?:(?!^\| \+).)*?^\|  \+ Language: (\w+)$/m

It looks somewhat messy, but it should take care of the problem with the previous pattern.

A cleaner way would be to tokenize the string:

/^\| \+ ([^\r\n]+)|^\|  \+ Track type: (subtitles)|^\|  \+ Language: (\w+)/m

(Take note of the number of spaces)

For each match, check which of the capture groups is defined. Exactly one group will have a value for any single match.

If it is the first group, a new track has started. Discard any stored information about the previous track.
If it is the second group, the current track is of type subtitles.
If it is the third group, the language of this track is found.
Whenever you know the language of a track, and that it is of type subtitles, report it.
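The steps above can be sketched roughly like this in Python. This is only an illustration under assumptions: the sample input is invented, the function name subtitle_languages is hypothetical, and the /m flag is taken to mean that ^ matches at the start of every line (re.MULTILINE in Python).

```python
import re

# The tokenizing pattern from above, split across lines for readability.
TOKEN = re.compile(
    r"^\| \+ ([^\r\n]+)"                 # group 1: a new track starts
    r"|^\|  \+ Track type: (subtitles)"  # group 2: track is of type subtitles
    r"|^\|  \+ Language: (\w+)",         # group 3: language of the track
    re.MULTILINE,
)

def subtitle_languages(text):
    """Return the languages of all subtitles tracks, in order of appearance."""
    found = []
    is_subtitles, language, reported = False, None, False
    for m in TOKEN.finditer(text):
        if m.group(1) is not None:
            # New track: discard what we knew about the previous one.
            is_subtitles, language, reported = False, None, False
        elif m.group(2) is not None:
            is_subtitles = True
        elif m.group(3) is not None:
            language = m.group(3)
        # Report once per track, as soon as both facts are known.
        if is_subtitles and language is not None and not reported:
            found.append(language)
            reported = True
    return found

# Hypothetical input: track 2 has no language, track 3 lists it before the type.
SAMPLE_TRACKS = """\
| + A track
|  + Track number: 1
|  + Track type: video
|  + Language: und
| + A track
|  + Track number: 2
|  + Track type: subtitles
| + A track
|  + Track number: 3
|  + Language: fre
|  + Track type: subtitles
| + A track
|  + Track number: 4
|  + Track type: subtitles
|  + Language: eng
"""

print(subtitle_languages(SAMPLE_TRACKS))
```

Because the state is reset at every track header, a subtitles track without a Language line is simply skipped, and the order of the fields inside a track does not matter.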

MizardX 2008-11-05 22:40:53

Answer 2

+7 A:

You need to make your regex non-greedy by changing this:

.*

To this:

.*?

Your regex is matching from the first occurrence of Track type: subtitles to the last occurrence of Language: (\w\w\w). Making it non-greedy will work because .*? matches as few characters as possible, so each match stops at the nearest Language field.
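A quick way to see the difference is to run both versions side by side. The snippet below is a sketch in Python with made-up sample text; it assumes the dot is supposed to match newlines, which in Python needs re.DOTALL.

```python
import re

# Hypothetical input with two subtitles tracks.
SAMPLE = """\
| + A track
|  + Track type: video
|  + Language: und
| + A track
|  + Track type: subtitles
|  + Language: eng
| + A track
|  + Track type: subtitles
|  + Language: fre
"""

# Greedy: .* runs to the end of the text and backtracks to the LAST "Language:".
greedy = re.findall(r"Track type: subtitles.*Language: (\w\w\w)", SAMPLE, re.DOTALL)

# Non-greedy: .*? stops at the FIRST "Language:" after each subtitles track.
lazy = re.findall(r"Track type: subtitles.*?Language: (\w\w\w)", SAMPLE, re.DOTALL)

print(greedy)  # ['fre']
print(lazy)    # ['eng', 'fre']
```

The greedy pattern swallows both subtitles tracks in a single match, so only the last language is captured.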

yjerem 2008-11-05 22:41:40

To Jeremy: wait... you're 16 and understand 'greediness'?! And 8 Nice Answer badges?! Dang! Whatever job you're doing, you're not getting paid enough. Start offshore/nearshore coding, like yesterday! You'll make a tonne of cash before you're even out of school.

Keng 2008-11-06 16:43:18

Oh thank you, I have a few less grey hairs coming.

DaveShaw 2010-07-20 21:50:03
