ansaurus

Question

parsing words in string prefaced by 'password' with regex

Answer 1

A:

The ([a-zA-Z]*) subexpression does not accept digits. You might have meant ([a-zA-Z0-9]+); another choice would be (\S+).

You have already used \s; are you aware of \S? Because you are using \s as the "delimiter" of your password token, you might as well be consistent and define the password as any run of characters that are not delimiters.
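As a rough illustration of the difference between the two character classes, here is a minimal sketch. The sample string and the variable names are made up for the example and are not from the question.

```python
import re

# Hypothetical sample input, chosen to contain a digit and punctuation.
data = "password: G0D!"

# Letters only: the capture stops at the first non-letter character.
letters_only = re.search(r"password:\s([a-zA-Z]*)", data).group(1)

# Any run of non-whitespace characters: the whole token is captured.
non_space = re.search(r"password:\s(\S+)", data).group(1)

print(letters_only)  # G
print(non_space)     # G0D!
```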

You could also simplify your regular expression overall as follows:

^(?:.*:password(\sis|:)\s(\S+)\s.*)*$

As pointed out by codaddict's analogy to PHP's preg_match_all, you also need to call re.findall. To do so, you will need to change the regular expression to one whose matches do not overlap, such as:

password(\sis|:)\s(\S+)

re.findall() will then return a list of matches, each one a tuple of the groups matched.
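A small sketch of what that return value looks like, assuming a sample string that contains both the "password:" and the "password is" forms (the string and variable names are illustrative only). The second call uses a non-capturing group, which is my own variation rather than part of the pattern above.

```python
import re

# Hypothetical sample input with both separator forms.
data = " Hello Mars password: WORLD random words password is HELLO python"

# Two capturing groups, so each match is a tuple (separator, password).
pairs = re.findall(r"password(\sis|:)\s(\S+)", data)
print(pairs)  # [(':', 'WORLD'), (' is', 'HELLO')]

# Making the first group non-capturing leaves a single group, so
# re.findall returns a flat list of strings instead of tuples.
passwords = re.findall(r"password(?:\sis|:)\s(\S+)", data)
print(passwords)  # ['WORLD', 'HELLO']
```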

Heath Hunnicutt 2010-03-10 23:46:31

Why are you talking to someone whose avatar picture you don’t like anyway?

Gumbo 2010-03-10 23:48:34

How am I going to convince this person who had score 1 to pick a more programming-related avatar? By interacting with them.

Heath Hunnicutt 2010-03-10 23:50:49

Heath Hunnicutt, I'm sorry, Stack Overflow must have auto-assigned my Gravatar avatar. I'm sorry (again), perhaps my question was not clear; I will try to explain it again. This is the full Python code I have so far (the password is just an example):

import re
data=" Hello Mars password: WORLD random words password: HELLO python"
match=re.match("REGEX",data)
if match is not None:
    print(match.groups())

I want match.groups() to print the tuple ('WORLD','HELLO') using just a regex. I know about \S, \s and +; however, these do not add additional matches. Only the last match is placed in the tuple.

Nick Hermans 2010-03-10 23:58:24

Lol, I'm sorry, I was being rude anyway. But if Stack Overflow gave you that avatar, then I am sorry for them. codaddict makes an additional point: you have to call re.findall() rather than re.match(). You will also need to modify your regexp to find non-overlapping matches.

Heath Hunnicutt 2010-03-11 00:06:55

Answer 2

A:

I think you'll have to match the first occurrence and then keep matching any further occurrences using Python's global matching feature (I'm not sure how to do it; I know very little Python).

In PHP, for example, we can use preg_match_all to solve this:

$a="aaaaaa password: GoD hello world password is G0D hello";
if(preg_match_all('/.*?(?:password\sis\s|password:\s)(\w+)/',$a,$matches)) {
    var_dump($matches[1]); // prints GoD and G0D
}

codaddict 2010-03-10 23:58:23

Right, but you changed the regexp to use \w as I suggested in my community wiki answer... The problem is that his regexp does not accept the digit zero, and he is also misreading his output, seeing a zero where the letter after N is printed.

Heath Hunnicutt 2010-03-11 00:01:54

You have a point, though -- in Python, the equivalent is called re.findall()

Heath Hunnicutt 2010-03-11 00:06:30

Answer 3

+1 A:

I'd use re.findall, and simplify the regex a bit.

>>> re.findall(r"(?:password\sis\s+|password\:\s+)(\S+)", a)
['GOD', 'G0D']

Edit: Changed from \w to \S in order to also capture punctuation, and remove list expression.
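For anyone wondering why re.match only ever produced the last password, here is a minimal sketch using the sample string from the asker's comment. The repeated-group pattern in the first call is my own stand-in for the original regex, not the asker's actual expression.

```python
import re

data = " Hello Mars password: WORLD random words password: HELLO python"

# A capturing group inside a repetition keeps only its last iteration,
# so re.match can never hand back both passwords from one group.
repeated = re.match(r"(?:.*?password:\s(\S+))+", data)
print(repeated.groups())  # ('HELLO',)

# re.findall scans the whole string and collects every match.
found = re.findall(r"(?:password\sis\s+|password\:\s+)(\S+)", data)
print(found)  # ['WORLD', 'HELLO']
```

If a tuple is needed, as in the question, tuple(found) converts the list.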

Ryan Ginstrom 2010-03-11 00:14:58

Thank you very much, Ryan. I tried findall several times, but it always returned a single string; the way you use it returns the correct values. The messy regex was the result of hours of trial and error trying to solve my problem, I'm sorry about that. Thanks a lot to everyone for helping me out.

Nick Hermans 2010-03-11 00:25:45

I found that re.findall(r"(?:password\sis\s+|password\:\s+)(\S+)", a) suited me better in the long run, since I don't actually need the matches of 'password is' and 'password:', and it also removes the need for the loop.

Nick Hermans 2010-03-11 10:50:18

Nice point, I've edited my reply to reflect that.

Ryan Ginstrom 2010-03-11 16:05:49
