ansaurus

Question

How to swap adjacent bytes in a string of hex bytes (with or without regular expressions)

Answer 1

+2 A:

import re
re.sub(r'(..)(..)', r'\2\1', '12345678')
re.sub(r'(....)(....)', r'\2\1', '34127856')

Chris Jester-Young 2010-01-19 00:49:38

Answer 2

+3 A:

A regex approach:

import re
twopairs = re.compile(r'(..)(..)')
stringwithswappedwords = twopairs.sub(r'\2\1', basestring)
twoquads = re.compile(r'(....)(....)')
stringwithswappedlongs = twoquads.sub(r'\2\1', stringwithswappedwords)

Edit: However, this is definitely not the fastest approach in Python -- here's how one finds out about such things: first, write all "competing" approaches into a module, here I'm calling it 'swa.py'...:

import re

twopairs = re.compile(r'(..)(..)')
twoquads = re.compile(r'(....)(....)')

def withre(basestring, twopairs=twopairs, twoquads=twoquads):
  stringwithswappedwords = twopairs.sub(r'\2\1', basestring)
  return twoquads.sub(r'\2\1', stringwithswappedwords)

def withoutre(basestring):
  asalist = list(basestring)
  asalist.reverse()
  for i in range(0, len(asalist), 2):
    asalist[i+1], asalist[i] = asalist[i], asalist[i+1]
  return ''.join(asalist)

s = '12345678'
print withre(s)
print withoutre(s)

Note that I set s and try out the two approaches for a fast sanity check that they're actually computing the same result -- good practice, in general, for this kind of "head to head performance races"!

Then, at the shell prompt, you use timeit, as follows:

$ python -mtimeit -s'import swa' 'swa.withre(swa.s)'
78563412
78563412
10000 loops, best of 3: 42.2 usec per loop
$ python -mtimeit -s'import swa' 'swa.withoutre(swa.s)'
78563412
78563412
100000 loops, best of 3: 9.84 usec per loop

...and you find that in this case the RE-less approach is about 4 times faster, a worthwhile optimization. Once you have such a "measurement harness" in place, it's also easy to experiment with further alternatives and tweaks for further optimization, if there is any need for "really blazing speed" in this operation, of course.
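The same kind of measurement can also be driven from inside Python with the standard timeit module. The following is only a sketch, assuming Python 3; the two functions mirror withre and withoutre from swa.py, while the check_and_time helper and its loop counts are hypothetical names and values chosen for illustration. Timings will differ from the figures quoted above.

```python
import re
import timeit

twopairs = re.compile(r'(..)(..)')
twoquads = re.compile(r'(....)(....)')

def withre(basestring):
    # Swap adjacent 2-char groups, then adjacent 4-char groups.
    swapped_words = twopairs.sub(r'\2\1', basestring)
    return twoquads.sub(r'\2\1', swapped_words)

def withoutre(basestring):
    # Reverse all characters, then swap each adjacent pair back.
    chars = list(basestring)
    chars.reverse()
    for i in range(0, len(chars), 2):
        chars[i + 1], chars[i] = chars[i], chars[i + 1]
    return ''.join(chars)

def check_and_time(funcs, sample, number=10000):
    # Hypothetical helper: sanity-check first, then time each candidate.
    results = {f.__name__: f(sample) for f in funcs}
    assert len(set(results.values())) == 1, results
    timings = {}
    for f in funcs:
        best = min(timeit.repeat(lambda: f(sample), number=number, repeat=3))
        timings[f.__name__] = best / number * 1e6  # microseconds per call
    return results, timings

if __name__ == '__main__':
    results, timings = check_and_time([withre, withoutre], '12345678')
    for name, usec in timings.items():
        print('%s -> %s: %.2f usec per loop' % (name, results[name], usec))
```

Running the sanity check inside the helper means a candidate that computes a different result fails before any time is spent measuring it.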

Edit: for example, here's an even faster approach (add to the same swa.py, with a final line of print faster(s) of course;-):

def faster(basestring):
  asal = [basestring[i:i+2]
          for i in range(0, len(basestring), 2)]
  asal.reverse()
  return ''.join(asal)

This gives:

$ python -mtimeit -s'import swa' 'swa.faster(swa.s)'
78563412
78563412
78563412
100000 loops, best of 3: 5.58 usec per loop

About 5.6 microseconds, down from about 9.8 for the simplest RE-less approach, is another possibly-worthwhile micro-optimization.

And so on, of course -- there's an old folk (pseudo)theorem that says that any program can be made at least one byte shorter and at least one nanosecond faster...;-)

Edit: and to "prove" the pseudotheorem, here's a completely different approach (replace the end of swa.py)...:

import array
def witharray(basestring):
  a2 = array.array('H', basestring)
  a2.reverse()
  return a2.tostring()

s = '12345678'
# print withre(s)
# print withoutre(s)
print faster(s)
print witharray(s)

This gives:

$ python -mtimeit -s'import swa' 'swa.witharray(swa.s)'
78563412
78563412
100000 loops, best of 3: 3.01 usec per loop

for a further possibly-worthwhile speedup.
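The witharray code above is written for Python 2, where a str holds raw bytes. On Python 3 the array has to be built from a bytes object, and tostring() has been replaced by tobytes(). A rough equivalent, assuming an ASCII hex string with an even number of characters and a 2-byte 'H' type, might look like this:

```python
import array

def witharray(hexstring):
    # Assumes an ASCII hex string with an even number of characters.
    a2 = array.array('H', hexstring.encode('ascii'))
    # 'H' is an unsigned short; this sketch assumes it is 2 bytes wide.
    assert a2.itemsize == 2
    # Reversing the 2-byte items reverses the order of the hex pairs.
    a2.reverse()
    return a2.tobytes().decode('ascii')

print(witharray('12345678'))
```

This has not been timed here, so the 3.01 usec figure above should not be assumed to carry over.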

Alex Martelli 2010-01-19 00:55:48

Well done Alex. Once a programmer breaks out of the hall of mirrors that is regular expressions, they can do great things... Personally I want all my software to be over seven times quicker than my competitors!

logout 2010-01-19 01:35:14

@cudamaru, the latest version is 14 times faster than what I got with REs, and over 3 times faster than what I first got without them. Being _hypnotized_ by REs' convenience to the point of not considering alternatives would surely be damaging. However don't neglect the possibility that (in other cases) the speed advantages might be exactly the other way 'round -- always consider and try various sensible possibilities when speed really matters!-)

Alex Martelli 2010-01-19 01:38:59

@Alex. Outstanding work there. More ammunition for my anti-RE campaign ;-)

logout 2010-01-20 12:41:36

Answer 3

A:

>>> import re
>>> re.sub("(..)(..)","\\2\\1","12345678")
'34127856'
>>> re.sub("(....)(....)","\\2\\1","34127856")
'78563412'

S.Mark 2010-01-19 00:56:07

Answer 4

A:

If you want to do endianness conversion, use Python's struct module on the original binary data.
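As a minimal sketch of that suggestion, assuming Python 3 and that the original data is a single 32-bit value (the function name swap_endianness_32 is made up for this example):

```python
import struct

def swap_endianness_32(data):
    # Interpret 4 raw bytes as a little-endian unsigned 32-bit integer...
    (value,) = struct.unpack('<I', data)
    # ...and write the same integer back out in big-endian byte order.
    return struct.pack('>I', value)

raw = bytes.fromhex('12345678')   # the original binary data, 4 bytes
print(swap_endianness_32(raw).hex())
```

Other widths would need a different format character, for example 'H' for 16-bit values.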

If that is not your goal, here's some simple sample code to rearrange one 8-character string:

def wordpairswapper(s):
    return s[6:8] + s[4:6] + s[2:4] + s[0:2]

Bandi-T 2010-01-19 00:58:14

Answer 5

+1 A:

Just for the string "12345678":

from textwrap import wrap
s="12345678"
t=wrap(s,len(s)//2)        # split into two halves: ['1234', '5678']
a,b=wrap(t[0],len(t[0])//2)
c,d=wrap(t[1],len(t[1])//2)
a,b=b,a
c,d=d,c
print(a+b+c+d)

You can turn it into a generic function that handles variable-length strings.

output

$ ./python.py
34127856
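One possible shape for such a generic function, still using textwrap, is sketched below. It assumes Python 3 and an input with no whitespace or hyphens (wrap treats those specially); the name swap_adjacent_chunks and the width parameter are illustrative only.

```python
from textwrap import wrap

def swap_adjacent_chunks(s, width=2):
    # Split into fixed-width chunks; assumes s has no whitespace or hyphens.
    chunks = wrap(s, width)
    # Swap each chunk with its right-hand neighbour; a trailing odd chunk stays put.
    for i in range(0, len(chunks) - 1, 2):
        chunks[i], chunks[i + 1] = chunks[i + 1], chunks[i]
    return ''.join(chunks)

print(swap_adjacent_chunks('12345678'))
```

Plain slicing would probably be faster than wrap here; textwrap is kept only to stay close to the code above.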

ghostdog74 2010-01-19 00:59:04

Answer 6

A:

Thanks to all the replies.

If I'm not mistaken, withoutre() produces '21436587' when I run it, as opposed to "34127856", which is my original desired result. I want to swap "byte pairs", not the order of the bytes.

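To make the difference between the two operations concrete, here is a regex-free sketch that works on the decoded bytes rather than on the hex characters. It assumes Python 3, an input whose length is a multiple of 4 hex digits, and a 2-byte 'H' array type; both function names are hypothetical.

```python
import array

def swap_byte_pairs(hexstring):
    # Assumes an even number of bytes, i.e. a multiple of 4 hex digits.
    words = array.array('H', bytes.fromhex(hexstring))
    assert words.itemsize == 2  # assumption: 'H' is 16 bits on this platform
    words.byteswap()            # swap the two bytes inside every 16-bit item
    return words.tobytes().hex()

def reverse_bytes(hexstring):
    # Reverse the order of all bytes, for contrast.
    return bytes.fromhex(hexstring)[::-1].hex()

print(swap_byte_pairs('12345678'))  # 34127856
print(reverse_bytes('12345678'))    # 78563412
```

swap_byte_pairs gives the "34127856" form asked for, while reverse_bytes gives the fully reversed "78563412" that most of the answers above produce.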
pyNewGuy 2010-01-19 20:40:05
