I want to parse the output of a serial monitoring program called Docklight (I highly recommend it). It outputs 'hexadecimal' strings: sequences of two capital hex digits followed by a space. The corresponding regular expression is ([0-9A-F]{2} )+, for example: '05 03 DA 4B 3F '

When the program detects particular sequences of characters, it places comments in the 'hexadecimal' string. For example:

'05 03 04 01 0A  The Header 03 08 0B BD AF  The PAYLOAD 0D 0A  The Footer'

Comments are strings of the format ' .+ ' (a sequence of characters preceded by a space and followed by a space).

I want to get rid of the comments. For example, the 'hexadecimal' string above would be filtered to:

'05 03 04 01 0A 03 08 0B BD AF 0D 0A '

How do I go about doing this with a regular expression?

+2  A: 

You could try re.findall():

>>> import re
>>> a='05 03 04 01 0A  The Header 03 08 0B BD AF  The PAYLOAD 0D 0A  The Footer'
>>> re.findall(r"\b[0-9A-F]{2}\b", a)
['05', '03', '04', '01', '0A', '03', '08', '0B', 'BD', 'AF', '0D', '0A']

The \b in the regular expression matches a "word boundary".

Of course, your input is ambiguous if the serial monitor inserts something like THIS BE THE HEADER.
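To make that ambiguity concrete, here is a minimal sketch. The input string is made up for illustration and is not real Docklight output; it only assumes that a comment can contain a two-letter word made of the letters A-F.

```python
import re

# Hypothetical input: the comment contains "BE", which looks like a hex byte.
ambiguous = '05 03  THIS BE THE HEADER 0D 0A '

# Same pattern as above: two uppercase hex digits between word boundaries.
pairs = re.findall(r"\b[0-9A-F]{2}\b", ambiguous)
print(pairs)  # ['05', '03', 'BE', '0D', '0A']
```

The stray 'BE' cannot be told apart from data by the pattern alone, so this approach is only safe when comments never contain such tokens.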

Greg Hewgill
A: 

It might be easier to find all the hexadecimal numbers, assuming the inserted strings won't contain a match:

>>> data = '05 03 04 01 0A  The Header 03 08 0B BD AF  The PAYLOAD 0D 0A  The Footer'
>>> import re
>>> pattern = re.compile("[0-9A-F]{2} ")
>>> "".join(pattern.findall(data))
'05 03 04 01 0A 03 08 0B BD AF AD 0D 0A '

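Note that the stray AD in the output above comes from the end of PAYLOAD, so the assumption does not hold for this sample. Otherwise you could use the fact that the inserted strings are preceded by two spaces: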

>>> data = '05 03 04 01 0A  The Header 03 08 0B BD AF  The PAYLOAD 0D 0A  The Footer'
>>> re.sub("(  .*?)(?=( [0-9A-F]{2} |$))","",data)
'05 03 04 01 0A 03 08 0B BD AF 0D 0A'

This uses a lookahead to work out where the inserted string ends. It looks for either a hexadecimal string surrounded by spaces or the end of the source string.
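The same substitution can be wrapped in a small reusable helper. This is only a sketch: the names COMMENT and strip_comments are made up here, and the pattern drops the capturing groups, which the substitution does not need.

```python
import re

# Two spaces open a comment; it runs lazily up to the next " XX " or the end.
COMMENT = re.compile(r"  .*?(?= [0-9A-F]{2} |$)")

def strip_comments(line):
    """Remove the inserted comments from one line (hypothetical helper)."""
    return COMMENT.sub("", line)

data = '05 03 04 01 0A  The Header 03 08 0B BD AF  The PAYLOAD 0D 0A  The Footer'
cleaned = strip_comments(data)
print(cleaned)  # 05 03 04 01 0A 03 08 0B BD AF 0D 0A
```

As with the findall approach, this assumes that no comment contains a standalone token that looks like a hex byte, and that every comment starts with two spaces.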

Dave Webb
A: 

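Using your regex, with a non-capturing group so that findall returns each whole run instead of only the last pair of each run, and a \b so that the tail of a word such as PAYLOAD is not picked up:

hexa = r'(?:\b[0-9A-F]{2} )+'
"".join(re.findall(hexa, line))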
Adam Schmideg
A: 

While you have already received two answers that find all the hexadecimal numbers, here is the reverse approach: a regular expression that finds all text that does not look like a hexadecimal number (assuming that means two characters in the ranges 0-9 and A-F, upper or lower case, followed by a space).

Something like this (sorry, I'm not a pythoneer, but you get the idea):

newstring = re.sub(r"\s*\b\w+\b(?<!\b[0-9A-Fa-f]{2})", "", yourstring)

It works by "looking back". It matches every whole word together with any whitespace in front of it, then looks back negatively with (?<!....). It says: "if the two characters just matched are not a complete hex number, then succeed". The look-behind is exactly two characters wide, because Python's re module only accepts fixed-width look-behind.
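A caveat when porting look-behind patterns to Python: the standard re module rejects a look-behind whose alternatives have different widths. A minimal check, using a hypothetical pattern of that shape (one branch two characters wide, the other one character wide):

```python
import re

# Hypothetical pattern: the look-behind branches differ in width (2 vs 1).
variable_width = r"[^ ]+(?<![0-9A-Fa-f ]{2}|^.)"

try:
    re.compile(variable_width)
    accepted = True
except re.error as exc:
    # Python's re reports that look-behind requires a fixed-width pattern.
    accepted = False
    print("rejected:", exc)
```

Engines that support variable-width look-behind would accept such a pattern; in Python, every branch must match the same number of characters.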

Edit

As suggested by Alan Moore, here's the same idea with a negative lookahead expression:

newstring = re.sub(r"\s*\b(?![0-9A-Fa-f]{2}\b)\w+", "", yourstring)
Abel
It's usually easier (and probably more efficient) to use lookaheads for this sort of thing.
Alan Moore
@Alan, indeed it probably is, but you get false positives more easily. In this case, using the word boundary, it works. I updated the answer.
Abel
A: 

Why a regexp? To me this is more Pythonic (fixed to check for hex digits rather than decimal digits):

command='05 03 04 01 0A  The Header 03 08 0B BD AF  The PAYLOAD 0D 0A  The Footer'
print(' '.join(com for com in command.split()
               if len(com)==2 and all(c.upper() in '0123456789ABCDEF' for c in com)))
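The same split-and-filter idea as a reusable function might look like the sketch below. The names HEX_DIGITS and keep_hex_pairs are invented for illustration, and this variant accepts uppercase digits only, following the format described in the question.

```python
HEX_DIGITS = set('0123456789ABCDEF')

def keep_hex_pairs(line):
    """Keep only whitespace-separated tokens that are two uppercase hex digits."""
    return ' '.join(tok for tok in line.split()
                    if len(tok) == 2 and set(tok) <= HEX_DIGITS)

command = '05 03 04 01 0A  The Header 03 08 0B BD AF  The PAYLOAD 0D 0A  The Footer'
filtered = keep_hex_pairs(command)
print(filtered)  # 05 03 04 01 0A 03 08 0B BD AF 0D 0A
```

Unlike the regex versions, this normalizes all whitespace to single spaces and drops any trailing space. A two-letter comment word such as BE would still be kept.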
Tony Veijalainen
A: 

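How about a solution that actually uses regex negation? ;) The optional space inside the lookahead stops the match before the space that separates the comment from the next hex pair, so that separator is preserved.

result = re.sub(r"[ ]+(?:(?![ ]?\b[0-9A-F]{2}\b).)+", "", subject)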
Alan Moore
also nice! Quiz: which is uglier, your regex negation or my negative look-behind? lol
Abel
It's too close to call, but if you add word boundaries to yours (which I think you should do), you'll be the clear winner. Or should I say the *unclear* winner? ;)
Alan Moore
Your wish is my command, I added them to the lookahead version ;)
Abel