ansaurus

Question

Python: hexadecimal regular expression question

Answer 1

+2 A:

You could try re.findall():

>>> import re
>>> a='05 03 04 01 0A  The Header 03 08 0B BD AF  The PAYLOAD 0D 0A  The Footer'
>>> re.findall(r"\b[0-9A-F]{2}\b", a)
['05', '03', '04', '01', '0A', '03', '08', '0B', 'BD', 'AF', '0D', '0A']

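The \b in the regular expression matches a "word boundary", so only two-character tokens that stand on their own are matched; the AD inside PAYLOAD is skipped.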

Of course, your input is ambiguous if the serial monitor inserts something like THIS BE THE HEADER, because BE is itself a valid two-digit hex number.
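
A quick way to see the ambiguity is to run the same pattern over a line in which the monitor has inserted such text. The sample line below is made up for illustration and is not taken from the question:

```python
import re

# Hypothetical monitor output: the inserted text contains the word "BE",
# which is also a valid two-digit hex number.
line = '05 03  THIS BE THE HEADER 0D 0A'

tokens = re.findall(r"\b[0-9A-F]{2}\b", line)
print(tokens)  # 'BE' is reported alongside the real bytes
```

No regular expression can tell such a word apart from data; that would need extra knowledge about the protocol.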

Greg Hewgill 2010-08-03 10:36:17

Answer 2

A:

It might be easier to find all the hexadecimal numbers, assuming the inserted strings won't contain a match:

>>> data = '05 03 04 01 0A  The Header 03 08 0B BD AF  The PAYLOAD 0D 0A  The Footer'
>>> import re
>>> pattern = re.compile("[0-9A-F]{2} ")
>>> "".join(pattern.findall(data))
'05 03 04 01 0A 03 08 0B BD AF AD 0D 0A '
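
Note that the joined result contains AD, which comes from the end of PAYLOAD rather than from the data, so the assumption does not quite hold for this sample. One possible way around that, sketched here as an assumption rather than as part of the answer, is to anchor each pair with \b so that it has to be a whole token:

```python
import re

data = '05 03 04 01 0A  The Header 03 08 0B BD AF  The PAYLOAD 0D 0A  The Footer'

# Without a boundary, the "AD " at the end of "PAYLOAD " also matches.
loose = re.compile(r"[0-9A-F]{2} ")
# With \b on both sides, only stand-alone two-character tokens match.
strict = re.compile(r"\b[0-9A-F]{2}\b")

loose_result = "".join(loose.findall(data))
strict_result = " ".join(strict.findall(data))
print(repr(loose_result))
print(repr(strict_result))
```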

Otherwise you could use the fact that the inserted strings are preceded by two spaces:

>>> data = '05 03 04 01 0A  The Header 03 08 0B BD AF  The PAYLOAD 0D 0A  The Footer'
>>> re.sub("(  .*?)(?=( [0-9A-F]{2} |$))","",data)
'05 03 04 01 0A 03 08 0B BD AF 0D 0A'

This uses a lookahead to work out where the inserted string ends. It looks for either a hexadecimal pair surrounded by spaces or the end of the source string.
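
To see which spans the lookahead version actually removes, the same pattern can be run through re.finditer. This is only an illustrative sketch; the names inserted, removed and cleaned are made up here:

```python
import re

data = '05 03 04 01 0A  The Header 03 08 0B BD AF  The PAYLOAD 0D 0A  The Footer'

# Two spaces, then as little text as possible, up to (but not including)
# either " HH " (a hex pair surrounded by spaces) or the end of the string.
inserted = re.compile(r"(  .*?)(?=( [0-9A-F]{2} |$))")

removed = [m.group(1) for m in inserted.finditer(data)]
cleaned = inserted.sub("", data)
print(removed)
print(repr(cleaned))
```

Because the lookahead does not consume anything, the space in front of the next hex pair stays in the result.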

Dave Webb 2010-08-03 10:37:55

Answer 3

A:

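Using your regex, with the group made non-capturing (so that findall returns whole matches) and a leading \b (so that the AD in PAYLOAD is skipped):

import re
hexa = r'(?:\b[0-9A-F]{2} )+'
"".join(re.findall(hexa, line))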

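The grouping matters because re.findall returns the group rather than the whole match when the pattern contains a capturing group, and a repeated group keeps only its last repetition. A short sketch of the difference, using a sample line made up for illustration:

```python
import re

line = '05 03 04  The Header 0D 0A '

# Capturing group: findall yields only the last pair of each run.
capturing = re.findall(r'([0-9A-F]{2} )+', line)
# Non-capturing group: findall yields each whole run.
non_capturing = re.findall(r'(?:[0-9A-F]{2} )+', line)

print(capturing)
print(non_capturing)
```
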
Adam Schmideg 2010-08-03 10:46:25

Answer 4

A:

You already have two answers that find all the hexadecimal numbers. Here is the reverse approach: a single regular expression that matches all text that does not look like a hexadecimal number (taken here to mean two characters in the ranges 0-9 and A-F, upper or lower case, followed by a space).

Something like this (sorry, I'm not a pythoneer, but you get the idea):

newstring = re.sub(r"[^ ]+(?<![0-9A-Fa-f ]{2}|^.)", "", yourstring)

It works by "looking back". It finds every consecutive non-space substring, then negatively looks back with (?<!....). It says: "if the previous two characters were not a hex number, then succeed". The little ^. at the end prevents the first character of the string from being matched incorrectly.
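
In Python specifically, the re module only accepts look-behinds of fixed width. The pattern above mixes a two-character alternative with a one-character one, so re.compile rejects it. A possible fixed-width variant, offered as an assumption rather than as the answer's own pattern, matches whole words and keeps the word boundary inside the look-behind:

```python
import re

yourstring = '05 03 04 01 0A  The Header 03 08 0B BD AF  The PAYLOAD 0D 0A  The Footer'

# The original look-behind has alternatives of different widths, which
# Python's re module refuses to compile.
try:
    re.compile(r"[^ ]+(?<![0-9A-Fa-f ]{2}|^.)")
    compiled = True
except re.error:
    compiled = False

# Variant: match a whole word (with any spaces in front of it) unless the
# word just consumed is exactly one stand-alone hex pair.
not_hex = re.compile(r" *\b\w+\b(?<!\b[0-9A-Fa-f]{2})")
newstring = not_hex.sub("", yourstring)
print(compiled, repr(newstring))
```

The \b inside the look-behind is zero-width, so the look-behind is still exactly two characters wide.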

Edit

As suggested by Alan Moore, here's the same idea with a negative lookahead expression:

newstring = re.sub(r"\s*\b(?![0-9A-Fa-f]{2}\b)\w+", "", yourstring)

Abel 2010-08-03 10:47:35

It's usually easier (and probably more efficient) to use lookaheads for this sort of thing.

Alan Moore 2010-08-03 11:00:25

@Alan, indeed it probably is, but you get false positives more quickly. In this case, using the word boundary, it works. I updated the answer.

Abel 2010-08-03 11:06:05

Answer 5

A:

Why regexp? To me this is more Pythonic (fixed to check for hex digits rather than plain digits):

import string

command='05 03 04 01 0A  The Header 03 08 0B BD AF  The PAYLOAD 0D 0A  The Footer'
print(' '.join(com for com in command.split()
               if len(com)==2 and all(c in string.hexdigits for c in com)))

Tony Veijalainen 2010-08-03 10:53:38

Answer 6

A:

How about a solution that actually uses regex negation? ;)

result = re.sub(r"[ ]+(?:(?!\b[0-9A-F]{2}\b)\S)+", "", subject)
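
Read in verbose form, the negation works one character at a time: a character is consumed only if a stand-alone hex pair does not start at that position. The sketch below assumes the token is restricted to non-space characters (\S), so that the single space in front of the next byte is left in place; subject is the sample line used in the other answers, and the name not_hex_word is made up here:

```python
import re

subject = '05 03 04 01 0A  The Header 03 08 0B BD AF  The PAYLOAD 0D 0A  The Footer'

not_hex_word = re.compile(r"""
    [ ]+                        # the space(s) in front of an inserted word
    (?:                         # then, one character at a time:
        (?!\b[0-9A-F]{2}\b)     #   refuse to start on a stand-alone hex pair
        \S                      #   otherwise consume one non-space character
    )+
""", re.VERBOSE)

result = not_hex_word.sub("", subject)
print(repr(result))
```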

Alan Moore 2010-08-03 10:53:58

also nice! Quiz: which is uglier, your regex negation or my negative look-behind? lol

Abel 2010-08-03 11:00:08

It's too close to call, but if you add word boundaries to yours (which I think you should do), you'll be the clear winner. Or should I say the *unclear* winner? ;)

Alan Moore 2010-08-03 11:04:21

Your wish is my command, I added them to the lookahead version ;)

Abel 2010-08-03 11:24:10
