I'm reading through a library of Python code, and I'm stumped by this statement:

struct.pack( "<ii%ds"%len(value), ParameterTypes.String, len(value), value.encode("UTF8") )

I understand everything except the %d, and I'm not sure why the length of value is being packed twice.

As I understand it, the structure will use little-endian byte order (<) and will contain two integers (ii) followed by %d, followed by a string (s).

What is the significance of %d?

A:

It is an ordinary string-formatting operation that is being used to build the struct format string.

To begin with, try reading it as an ordinary string (forget struct for the moment):

"<ii%ds" % len(value)

If, for example, the length of the value iterable is 4, then the string will be <ii4s. This is then passed to struct.pack, ready to pack two integers followed by a string of four bytes taken from the value iterable.
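A minimal sketch of the two steps, assuming value is an ASCII-only str (so its character count equals its byte count) and using a hypothetical stand-in value of 1 for ParameterTypes.String, whose real value the question does not show:

```python
import struct

STRING_TYPE = 1  # hypothetical stand-in for ParameterTypes.String
value = "spam"   # ASCII-only, so 4 characters == 4 UTF-8 bytes here

fmt = "<ii%ds" % len(value)   # ordinary % string formatting -> "<ii4s"
print(fmt)

packed = struct.pack(fmt, STRING_TYPE, len(value), value.encode("UTF8"))
print(len(packed))            # 12: two 4-byte ints + 4 bytes of string data
```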

Brendan
Thank you! I'm not familiar with Python, so I didn't realise that you can format strings like that.
Matt Ellen
@Matt Ellen: Please find a tutorial.
S.Lott
@Brendan: -1; see my answer.
John Machin
Ok, I've updated it - now reads 'iterable'
Brendan
@Brendan: Not OK; it's NOT any old iterable, it's a unicode string, and "4 strings from the value iterable" is nonsense when the struct.pack "4s" format is expecting 4 **bytes**
John Machin
Ah ok, I didn't realise that the string modifier was for the number of bytes in the string, not the number of strings - I have only used `struct` with floats and ints.
Brendan
A: 

The %d means this works in two steps.

Step 1.

"<ii%ds"%len(value) 

Creates the struct formatting string of "<ii...some number...s".

Step 2.

The resulting formatting string is applied to three values:

ParameterTypes.String, len(value), value.encode("UTF8")
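The same two steps can be run separately; this sketch again assumes a hypothetical ParameterTypes.String value of 1 and an ASCII-only value, and unpacks the result with the same format to show that the three values come back out:

```python
import struct

STRING_TYPE = 1  # hypothetical stand-in for ParameterTypes.String
value = "boo"

# Step 1: plain string formatting builds the struct format string.
fmt = "<ii%ds" % len(value)   # -> "<ii3s"

# Step 2: struct.pack applies that format to three values.
packed = struct.pack(fmt, STRING_TYPE, len(value), value.encode("UTF8"))

# Unpacking with the same format returns the three values.
fields = struct.unpack(fmt, packed)
print(fields)                 # (1, 3, b'boo')
```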
S.Lott
-1 See my answer.
John Machin
@S.Lott: -1. Don't think; investigate. Without a number means merely that the number defaults to 1. Tends to pack correctly??? Perhaps you think that struct.pack("s", foo) works the same way as "%s" % foo? It doesn't; docs say """For the 's' format character, the count is interpreted as the size of the string, not a repeat count like for the other format characters; for example, '10s' means a single 10-byte string, while '10c' means 10 characters. For packing, the string is truncated or padded with null bytes as appropriate to make it fit."""
John Machin
@John Machin: Since I revised my answer, I'm not sure any of this is still relevant. But thanks for pointing out my egregious error.
S.Lott
A: 

It's used to specify that a string (value) of len(value) characters is to be packed after those two integers.

If, for instance, value contained "boo" then the actual format specifier for pack would be "<ii3s".
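As for why len(value) appears twice: the format string itself is never written into the packed bytes, so presumably the length is also stored as data so that a reader knows how many string bytes follow the header. A hypothetical decoder (read_string_param is an illustrative name, not part of the library in question) that treats the stored length as a byte count might look like this:

```python
import struct

def read_string_param(buf):
    """Hypothetical decoder for the '<ii<n>s' record layout discussed here."""
    type_code, n = struct.unpack_from("<ii", buf, 0)   # fixed 8-byte header
    (raw,) = struct.unpack_from("<%ds" % n, buf, 8)     # n bytes follow the header
    return type_code, raw.decode("UTF8")

packed = struct.pack("<ii3s", 1, 3, b"boo")
print(read_string_param(packed))   # (1, 'boo')
```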

Michael Foukarakis
A: 

The significance of %d is that it's a conversion specifier for Python's %-style string formatting (see "String Formatting Operations" in the Python documentation).

When broken apart, "<ii%ds" % len(value) is a bit easier to understand. It replaces the %d conversion specifier in the string with the return value of len(value), formatted as a decimal integer.

>>> fmt = "<ii%ds"
>>> fmt % 5
'<ii5s'
>>> fmt % 3
'<ii3s'
Andrew
A:

Aarrrgh the mind boggles ....

@S.Lott: """I don't think the number is particularly important, since Python will tend to pack correctly without it.""" -1. Don't think; investigate. Without a number means merely that the number defaults to 1. Tends to pack correctly??? Perhaps you think that struct.pack("s", foo) works the same way as "%s" % foo? It doesn't; docs say """For the 's' format character, the count is interpreted as the size of the string, not a repeat count like for the other format characters; for example, '10s' means a single 10-byte string, while '10c' means 10 characters. For packing, the string is truncated or padded with null bytes as appropriate to make it fit."""
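The documented behaviour quoted above is easy to check in Python 3, where struct.pack's 's' format takes bytes:

```python
import struct

# 's' with no count means a single 1-byte string, so longer input is truncated.
one = struct.pack("s", b"hello")
print(one)       # b'h'

# '10s' is ONE 10-byte field: shorter input is padded with NUL bytes.
padded = struct.pack("10s", b"hello")
print(padded)    # b'hello\x00\x00\x00\x00\x00'

# '3c' is a repeat count: three separate 1-byte values.
chars = struct.pack("3c", b"a", b"b", b"c")
print(chars)     # b'abc'
```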

@Brendan: -1. value is not an array (whatever that is); it is patently obviously intended to be a unicode string ... lookee here: value.encode("UTF8")

@Matt Ellen: The line of code that you quote is severely broken. If there are any non-ASCII characters in value, data will be lost.

Let's break it down:

`struct.pack("<ii%ds"%len(value), ParameterTypes.String, len(value), value.encode("UTF8"))`  

Reduce the problem space by removing the first item:

struct.pack("<i%ds"%len(value), len(value), value.encode("UTF8"))

Now let's suppose that value is u'\xff\xff', so len(value) is 2.

Let v8 = value.encode('UTF8') i.e. '\xc3\xbf\xc3\xbf'.

Note that len(v8) is 4. Is the penny dropping yet?

So what we now have is

struct.pack("<i2s", 2, v8)

The number 2 is packed as 4 bytes, 02 00 00 00. The 4-byte string v8 is TRUNCATED (by the length 2 in "2s") to length two. DATA LOSS. FAIL.
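The truncation can be reproduced in Python 3, where encode returns bytes; this sketch mirrors the reduced two-field call above:

```python
import struct

value = "\xff\xff"            # two characters (U+00FF twice)
v8 = value.encode("UTF8")     # b'\xc3\xbf\xc3\xbf' -> four bytes

# Sized by characters (2), not bytes (4): the last two bytes are silently dropped.
packed = struct.pack("<i%ds" % len(value), len(value), v8)
print(packed)                        # b'\x02\x00\x00\x00\xc3\xbf'
print(packed[4:].decode("UTF8"))     # only one of the two characters survives
```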

The correct way to do what is presumably wanted is:

v8 = value.encode('UTF8')
struct.pack("<ii%ds" % len(v8), ParameterTypes.String, len(v8), v8)
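A runnable sketch of the corrected version, assuming a hypothetical ParameterTypes class with String = 1 (the real library's definition is not shown) and wrapping the two lines in an illustrative helper, pack_string_param. Unpacking the header confirms that the stored length is now the byte count, so nothing is truncated:

```python
import struct

class ParameterTypes:
    String = 1  # hypothetical stand-in; the real constant is not shown

def pack_string_param(value):
    """Pack a str as <type:int32><byte length:int32><UTF-8 bytes>, little-endian."""
    v8 = value.encode("UTF8")              # encode first ...
    return struct.pack("<ii%ds" % len(v8), # ... then size the field in bytes
                       ParameterTypes.String, len(v8), v8)

packed = pack_string_param("\xff\xff")
type_code, n = struct.unpack_from("<ii", packed)
print(n)                                             # 4 (bytes, not characters)
print(packed[8:8 + n].decode("UTF8") == "\xff\xff")  # True
```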
John Machin
Thanks John. This is very informative. I'll pass it onto the person who wrote the library. (My boss, *gulp*)
Matt Ellen