
If I have a string in the following format:

(static string) name (different static string ) message (last static string)

(static string) name (different static string ) message (last static string)

(static string) name (different static string ) message (last static string)

(static string) name (different static string ) message (last static string)

What would be the best way to search the messages for a word and generate an array of all the names whose message contains that word?

A: 

Given a string like this:

Foo NameA Bar MessageA Baz

this regex will match:

Foo\s+(\w+)\s+Bar\s+(\w+)\s+Baz

Group 1 is the name and group 2 is the message. Foo, Bar and Baz are the static parts.

Here it is in the Python REPL:

Python 2.6.1 (r261:67517, Dec  4 2008, 16:51:00) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> s = "Foo NameA Bar MessageA Baz"
>>> m = re.match(r"Foo\s+(\w+)\s+Bar\s+(\w+)\s+Baz", s)
>>> m.group(0)
'Foo NameA Bar MessageA Baz'
>>> m.group(1)
'NameA'
>>> m.group(2)
'MessageA'
>>> 
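To get from a single match to the list of names the question asks for, a minimal sketch could run the same pattern over each line and keep group 1 whenever group 2 contains the word. The helper `names_with_word` and the sample lines are hypothetical, and, like the pattern above, it assumes name and message are single words (`\w+`).

```python
import re

# The pattern from the answer above; Foo, Bar and Baz stand in for the
# static strings, and name and message are assumed to be single words.
LINE_RE = re.compile(r"Foo\s+(\w+)\s+Bar\s+(\w+)\s+Baz")


def names_with_word(lines, word):
    """Return the names whose message contains `word` (substring test)."""
    names = []
    for line in lines:
        m = LINE_RE.match(line)
        if m and word in m.group(2):
            names.append(m.group(1))
    return names


# Hypothetical sample data.
lines = [
    "Foo NameA Bar MessageA Baz",
    "Foo NameB Bar Hello Baz",
    "Foo NameC Bar MessageC Baz",
]
print(names_with_word(lines, "Message"))  # ['NameA', 'NameC']
```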
michael.kebe
A:
>>> s="(static string) name (different static string ) message (last static string)"
>>> _,_,s=s.partition("(static string)")
>>> name,_,s=s.partition("(different static string )")
>>> message,_,s=s.partition("(last static string)")
>>> name
' name '
>>> message
' message '
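The same `partition` calls can be wrapped in a function and applied to every line to build the list of names. This is a sketch only: `parse_line`, `names_with_word` and the sample lines are hypothetical, and the word test here splits the message on whitespace, so it matches whole words but does not strip punctuation.

```python
# The three static strings from the question.
START = "(static string)"
MIDDLE = "(different static string )"
END = "(last static string)"


def parse_line(line):
    """Split one line into (name, message) using str.partition."""
    _, _, rest = line.partition(START)
    name, _, rest = rest.partition(MIDDLE)
    message, _, _ = rest.partition(END)
    return name.strip(), message.strip()


def names_with_word(lines, word):
    """Return the names whose message contains `word` as a whole word."""
    names = []
    for line in lines:
        name, message = parse_line(line)
        if word in message.split():  # whitespace-separated words only
            names.append(name)
    return names


# Hypothetical sample data.
lines = [
    "(static string) alice (different static string ) hello world (last static string)",
    "(static string) bob (different static string ) goodbye (last static string)",
]
print(names_with_word(lines, "hello"))  # ['alice']
```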
gnibbler
This is better than using regular expressions: you should only use regexes for complicated pattern matching that cannot easily be done with other string operations. Check the string methods before reaching for regexes.
Michael Dillon
A: 

Here's a full answer showing how to do it using replace().

strings = ['(static string) name (different static string ) message (last static string)',
           '(static string) name (different static string ) message (last static string)',
           '(static string) name (different static string ) message (last static string)',
           '(static string) name (different static string ) message (last static string)',
           '(static string) name (different static string ) message (last static string)',
           '(static string) name (different static string ) message (last static string)']

results = []
target_word = 'message'
separators = ['(static string)', '(different static string )', '(last static string)']

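for s in strings:
    # Remove the static separators, leaving " name  message ".
    for sep in separators:
        s = s.replace(sep, '')
    # Assumes the name is a single word; the message may contain spaces.
    name, message = s.strip().split(None, 1)
    if target_word in message:
        results.append((name, message))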

>>> results
[('name', 'message'), ('name', 'message'), ('name', 'message'), ('name', 'message'), ('name', 'message'), ('name', 'message')]

Note that this matches any message that contains target_word as a substring; it does not check word boundaries. For example, running it with target_word = 'message' and with target_word = 'sag' produces the same results. You may need regular expressions if your word matching is more complicated.
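For whole-word matching, one option is a regular expression with `\b` word boundaries. This is a minimal sketch; `contains_word` is a hypothetical helper, and it could replace the `if target_word in message:` test in the loop above with `if contains_word(message, target_word):`.

```python
import re


def contains_word(message, word):
    """Return True if `word` occurs in `message` as a whole word.

    re.escape keeps any regex metacharacters in `word` literal, and the
    \\b anchors require a word boundary on both sides.
    """
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, message) is not None


print(contains_word("message", "message"))  # True
print(contains_word("message", "sag"))      # False
```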

mhawke
A: 
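with open("file") as f:
    for line in f:
        # Each static string ends with ")", so split on it; the text before
        # the next "(" in each piece is the name or the message.
        for item in line.split(")"):
            try:
                print(item[:item.index("(")])
            except ValueError:
                # The last piece (the newline) has no "(", so skip it.
                pass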

Output:

$ more file
(static string) name (different static string ) message (last static string)
(static string) name (different static string ) message (last static string)
(static string) name (different static string ) message (last static string)
(static string) name (different static string ) message (last static string)
$ python python.py

 name
 message

 name
 message

 name
 message

 name
 message
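The loop above prints the pieces but does not collect names. A sketch of the same `split(")")` idea that pairs each name with its message and keeps the names whose message contains the word (substring test) might look like this. `split_outside_parens`, `names_with_word` and the sample lines are hypothetical, and it assumes every static string is wrapped in parentheses with no nesting.

```python
def split_outside_parens(line):
    """Return the non-empty pieces of `line` that sit outside (...) groups.

    Assumes every static string is wrapped in parentheses with no nesting,
    as in the question's format.
    """
    pieces = []
    for item in line.split(")"):
        if "(" in item:
            text = item[:item.index("(")].strip()
            if text:
                pieces.append(text)
    return pieces


def names_with_word(lines, word):
    """Collect the name from each line whose message contains `word`."""
    names = []
    for line in lines:
        pieces = split_outside_parens(line)
        # Expect exactly [name, message]; skip lines that don't fit.
        if len(pieces) == 2 and word in pieces[1]:
            names.append(pieces[0])
    return names


# Hypothetical sample lines; an open file object works the same way.
lines = [
    "(static string) alice (different static string ) hello world (last static string)\n",
    "(static string) bob (different static string ) goodbye (last static string)\n",
]
print(names_with_word(lines, "good"))  # ['bob']
```

Passing an open file object instead of the list should work the same way, since iterating over a file yields its lines.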
ghostdog74