How would I extract the word 'wrestle' from the following string using a regular expression?

type=weaksubj len=1 word1=wrestle pos1=verb stemmed1=y priorpolarity=negative

Thanks

A: 

Your regex would be something like this:

/.*word1=(\w+)/
Lex
This also doesn't work
NullUserException
If you edit your answer, it would be nice to mention it in a comment. I was confused for a while about why this wouldn't work. That said, the leading `.*` is still pointless.
teukkam
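A quick check of teukkam's point, assuming Python's `re` module (the language isn't named until later in the thread): the leading `.*` in Lex's pattern doesn't change what group 1 captures. It only drags the start of the overall match back to the beginning of the string.

```python
import re

s = 'type=weaksubj len=1 word1=wrestle pos1=verb stemmed1=y priorpolarity=negative'

with_prefix = re.search(r'.*word1=(\w+)', s)
without_prefix = re.search(r'word1=(\w+)', s)

# Group 1 is the same with or without the leading .*
print(with_prefix.group(1), without_prefix.group(1))  # wrestle wrestle

# Only the start of the whole match differs: .* makes it begin at index 0
print(with_prefix.start(), without_prefix.start())    # 0 20
```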
A: 

Use: /word1=(\w+)/

Ruel
Yep, thanks for that. Edited. The non-greedy matching caused the regex to match only a single character. :P
Ruel
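Ruel's comment can be reproduced in Python, assuming the earlier version of the answer used a lazy quantifier such as `(\w+?)` (the pre-edit pattern isn't shown here). A lazy `+?` is satisfied by a single character, and since nothing after the group forces it to grow, it stops there.

```python
import re

s = 'type=weaksubj len=1 word1=wrestle pos1=verb stemmed1=y priorpolarity=negative'

# Lazy: +? takes as few characters as possible, and nothing after the group demands more
lazy = re.search(r'word1=(\w+?)', s).group(1)

# Greedy: + takes the whole run of word characters
greedy = re.search(r'word1=(\w+)', s).group(1)

# A lazy group only grows when something after it must match, here a following space
lazy_then_space = re.search(r'word1=(\w+?) ', s).group(1)

print(lazy)             # w
print(greedy)           # wrestle
print(lazy_then_space)  # wrestle
```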
+3  A: 

Given the following regex...

/word1=(\w+)/

...$1 (or however your language refers to the first capture group) will be wrestle.

Dave Pirotte
In Python, what should it look like? Thanks.
James Eggers
I believe it's `result = re.match(pattern, string)`
Ruel
@James see my answer
NullUserException
@Ruel: You want `re.search()`, not `re.match()`. The latter always anchors the search to the start of the string.
Tim Pietzcker
Thanks @NullUserException, it works :)
James Eggers
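Tim Pietzcker's correction is easy to verify: `re.match()` only tries to match at position 0, and this string starts with `type=`, so it finds nothing, while `re.search()` scans forward. The last line shows the one case where a leading `.*` actually matters: it lets an anchored `re.match()` reach the attribute.

```python
import re

s = 'type=weaksubj len=1 word1=wrestle pos1=verb stemmed1=y priorpolarity=negative'
pattern = r'word1=(\w+)'

# re.match() is anchored at the start of the string, which begins with "type="
anchored = re.match(pattern, s)

# re.search() tries every position until the pattern matches
searched = re.search(pattern, s)

# Prefixing .* lets re.match() consume everything before "word1=" from position 0
anchored_with_prefix = re.match(r'.*' + pattern, s)

print(anchored)                       # None
print(searched.group(1))              # wrestle
print(anchored_with_prefix.group(1))  # wrestle
```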
A: 

Assuming the values are always separated by spaces:

word1=([^ ]+)

Then you can get the value from the first capture group.

BrunoLM
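The difference between `[^ ]+` and `\w+` only shows up when a value contains something other than letters, digits, or underscores. The line below is hypothetical (the question's sample has no such value); it shows `[^ ]+` keeping everything up to the next space while `\w+` stops at the hyphen. `\S+` is included as a variant that also stops at tabs and newlines.

```python
import re

# Hypothetical line with a hyphenated value, unlike the question's sample
line = 'type=weaksubj len=2 word1=wrestle-with pos1=verb'

any_but_space = re.search(r'word1=([^ ]+)', line).group(1)  # up to the next space
word_chars = re.search(r'word1=(\w+)', line).group(1)       # stops at the hyphen
non_space = re.search(r'word1=(\S+)', line).group(1)        # up to any whitespace

print(any_but_space)  # wrestle-with
print(word_chars)     # wrestle
print(non_space)      # wrestle-with
```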
+5  A: 

The question is not very clear, but I guess this is what you are looking for:

word1=(\w+)

Your match will be in the first capture group. Here's some sample Python code:

import re

yourstring = 'type=weaksubj len=1 word1=wrestle pos1=verb stemmed1=y priorpolarity=negative'

m = re.search(r'word1=(\w+)', yourstring)
print(m.group(1))  # wrestle

As seen on codepad. A more generalized solution:

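import re

def get_attr(text, attr):
    # \b stops 'word1' from matching inside a longer key such as 'xword1';
    # re.escape() treats any regex metacharacters in attr literally
    m = re.search(r'\b' + re.escape(attr) + r'=(\w+)', text)
    return m.group(1) if m else None

text = 'type=weaksubj len=1 word1=wrestle pos1=verb stemmed1=y priorpolarity=negative'

print(get_attr(text, 'word1'))  # wrestle
print(get_attr(text, 'type'))   # weaksubj
print(get_attr(text, 'foo'))    # None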

Also available on codepad

NullUserException
Thanks, that worked :)
James Eggers
Great answer. +1
Ruel
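If you need several attributes from the same line, one alternative (not part of the answer above) is to capture every key/value pair in a single pass. With two groups in the pattern, `re.findall()` returns a list of `(key, value)` tuples, which `dict()` accepts directly; `.get()` then gives the same `None` for missing keys as `get_attr`.

```python
import re

s = 'type=weaksubj len=1 word1=wrestle pos1=verb stemmed1=y priorpolarity=negative'

# Two capture groups -> findall returns [('type', 'weaksubj'), ('len', '1'), ...]
attrs = dict(re.findall(r'(\w+)=(\w+)', s))

print(attrs['word1'])    # wrestle
print(attrs.get('foo'))  # None
```

Note that every value comes back as a string, so `attrs['len']` is `'1'`, not `1`.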
A: 

Maybe `re` is unnecessary here; `str.split` looks like it will suffice:

>>> s = "type=weaksubj len=1 word1=wrestle pos1=verb stemmed1=y priorpolarity=negative"
>>> dd = dict(ss.split('=',1) for ss in s.split())
>>> dd['word1']
'wrestle'
Paul McGuire
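One caveat with the `split` one-liner: if a token has no `=`, `ss.split('=', 1)` returns a one-element list and `dict()` raises `ValueError`. The stray token `flagged` below is hypothetical (the question's data doesn't contain one); this is a minimal sketch that skips such tokens using `str.partition`, which always returns a 3-tuple with an empty separator when `=` is absent.

```python
# Hypothetical input: "flagged" has no '=' and would make the one-liner raise ValueError
s = "type=weaksubj flagged word1=wrestle pos1=verb"

dd = {}
for token in s.split():
    # partition() returns (key, '=', value), or (token, '', '') if there is no '='
    key, sep, value = token.partition('=')
    if sep:
        dd[key] = value

print(dd['word1'])      # wrestle
print('flagged' in dd)  # False
```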