ansaurus

Question

Regex: How to make a group for each word in a sentence?

Answer 1

+3 A:

Why use a regex when string.split does the same thing?

>>> "The quick brown fox".split()
['The', 'quick', 'brown', 'fox']

Mark Rushakoff 2010-07-08 03:19:55

Mainly because my use case is slightly more complex, and it seems a regex would be the best fit for it. What I'm actually trying to do is get each instance of test1, test2, test3, etc. out of a string like this:

>>> 1 0 5 test1 5 test2 5 test3 5 test4 5 test5

where ("x testn") can be repeated any number of times, "x" is the number of characters in "testn", and the "1 0 " at the front is useless junk.
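For the format described here, one possible approach is sketched below. It is only an illustration under stated assumptions: the junk prefix is always exactly two space-separated fields, the tokens themselves contain no spaces, and `extract_tokens` is a hypothetical helper name, not something from the thread.

```python
import re

def extract_tokens(text):
    """Return the tokens from a string of "<length> <token>" pairs."""
    # Assumption: the leading junk is exactly two fields ("1 0"), so drop them.
    body = text.split(" ", 2)[2]
    # Each pair is a decimal length, one space, then a run of non-space characters.
    pairs = re.findall(r"(\d+) (\S+)", body)
    # Keep only tokens whose declared length matches their actual length.
    return [token for length, token in pairs if int(length) == len(token)]

sample = "1 0 5 test1 5 test2 5 test3 5 test4 5 test5"
tokens = extract_tokens(sample)
print(tokens)  # ['test1', 'test2', 'test3', 'test4', 'test5']
```

If tokens could contain spaces, a regex alone would not be enough; the declared length would have to drive the slicing instead.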

blah238 2010-07-08 03:27:37

Answer 2

+1 A:

Regular expressions can't capture into an unknown number of groups. But there is hope in your case: look into the 'split' method, which should help.

Vlad 2010-07-08 03:21:10

Answer 3

+3 A:

I don't believe that it is possible. Regexes pair the captures with the parentheses in the given regular expression. If you only listed one group, like '((\w+)\s+){0,99}', then it would just repeatedly capture to the same first and second group, not create new groups for each match found.

You could use split, but when given a separator it only splits on that exact string, not a class of characters like whitespace. (Called with no arguments, it does split on runs of whitespace.)
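A small sketch of that behaviour, using the pattern quoted above with Python's `re` module. The variable names are only illustrative.

```python
import re

sentence = "The quick brown fox"

# The repeated group is overwritten on every iteration, so only the
# last successful iteration survives in the match object.
match = re.match(r"((\w+)\s+){0,99}", sentence)
last_groups = match.groups()
print(last_groups)  # ('brown ', 'brown')

# To collect every word, iterate over separate matches instead of
# relying on one match with a repeated group.
all_words = [m.group(1) for m in re.finditer(r"(\w+)", sentence)]
print(all_words)  # ['The', 'quick', 'brown', 'fox']
```

Note that 'fox' never appears in the repeated-group result at all: the pattern requires trailing whitespace after each word, and the last word has none.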

Instead, you can use re.split, which can split on a regular expression, and give it '\s' to match any whitespace. You probably want it to match '\s+' to gather the whitespace greedily.

>>> import re
>>> help(re.split)
Help on function split in module re:

split(pattern, string, maxsplit=0)
    Split the source string by the occurrences of the pattern,
    returning a list containing the resulting substrings.

>>> re.split(r'\s+', 'The   quick brown\t fox')
['The', 'quick', 'brown', 'fox']
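One difference worth knowing, shown as a rough sketch: when the input has leading or trailing whitespace, `re.split` returns empty strings at the ends, while `str.split()` with no arguments does not. The `padded` sample string is made up for illustration.

```python
import re

padded = "  The   quick brown\t fox \n"

# re.split keeps the empty strings before the first and after the last separator.
re_parts = re.split(r"\s+", padded)
print(re_parts)  # ['', 'The', 'quick', 'brown', 'fox', '']

# str.split() with no arguments splits on whitespace runs and drops empty ends.
str_parts = padded.split()
print(str_parts)  # ['The', 'quick', 'brown', 'fox']

# Stripping first makes re.split give the same result.
stripped_parts = re.split(r"\s+", padded.strip())
print(stripped_parts)  # ['The', 'quick', 'brown', 'fox']
```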

Mark Santesson 2010-07-08 03:23:42

Thanks, that is more or less what I had concluded as well.

blah238 2010-07-08 03:36:01

Answer 4

+4 A:

You can also use the function findall in the module re.

>>> import re
>>> re.findall(r"\w+", "The quick brown fox")
['The', 'quick', 'brown', 'fox']
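Unlike splitting on whitespace, matching `\w+` also drops punctuation, which may or may not be what you want. A quick sketch with made-up sample strings:

```python
import re

text = "The quick, brown fox."

# findall with \w+ keeps only runs of word characters.
found = re.findall(r"\w+", text)
print(found)  # ['The', 'quick', 'brown', 'fox']

# Splitting on whitespace leaves punctuation attached to the words.
split_words = text.split()
print(split_words)  # ['The', 'quick,', 'brown', 'fox.']

# \w does not match an apostrophe, so contractions are cut in two
# unless the character class is widened.
contraction = re.findall(r"\w+", "don't stop")
widened = re.findall(r"[\w']+", "don't stop")
print(contraction)  # ['don', 't', 'stop']
print(widened)      # ["don't", 'stop']
```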

razpeitia 2010-07-08 03:25:17
