views:

449

answers:

6

First, a disclaimer. I'm not a CS grad nor a math major, so simplicity is important.

I have a four-character string (e.g. "isoy") that I need to pass as a single 32-bit integer field. Of course at the other end, I need to decode it back to a string. The string will only contain A-Z, and case is not important, if that helps.

The funny part is that I'm starting with PowerShell on the sending end and Linux at the receiving end. I can use Perl or Python there, with a preference for Python. I don't actually need answers in each language, I'm most interested in a PowerShell (C# also good) example for going both ways.

+1  A: 
// string -> int    

uint ret = 0;
for ( int i = 0; i < 4; ++i )
{
  ret |= ( str[i] << ( i * 8 ) );
}

// int -> string
for ( int i = 0; i < 4; ++i )
{
  str[i] = ( ret >> ( i * 8 ) ) & 0xff;
}
Evän Vrooksövich
+8  A: 

For Python, struct.unpack does the job (to make a 4-byte string into an int -- struct.pack goes the other way):

>>> import struct
>>> struct.unpack('i', 'isoy')[0]
2037347177
>>> struct.pack('i', 2037347177)
'isoy'
>>>

(you can use different formats to ensure big-endian or little-endian encoding, if you need that -- '>i' and '<i' respectively -- instead of just plain 'i' which uses whatever encoding is native to the machine).

Alex Martelli
Oh cool, it gives the same answer as the C# sample. :)
halr9000
+1  A: 

Please take a look at the struct standard library module in Python's Manual. It has two functions for this: struct.pack and struct.unpack. You can use the 'L' (unsigned long) format character for this.

fviktor
+6  A: 

To 32-bit unsigned integer:

uint x = BitConverter.ToUInt32(Encoding.ASCII.GetBytes("isoy"), 0); // 2037347177

To string:

string s = Encoding.ASCII.GetString(BitConverter.GetBytes(x));      // "isoy"

BitConverter uses the native endianness of the machine.

dtb
$s = "isoy"; $x = [BitConverter]::ToUInt32([Text.Encoding]::ASCII.GetBytes($s),0); $s = [Text.Encoding]::ASCII.GetString([BitConverter]::GetBytes($x));
Jaykul
+1  A: 

Aside from byte packing, you can also consider that your 26-character alphabet can be encoded as 0-25 instead of A-Z.

So without worrying about big and little endians, you can go from "letters" to a number like this:

val=letter0+letter1*26+letter2*26*26+letter3*26*26*26;

to go from val back to letters, you do something like this:

letter0=val%26;
letter1=(val/26)%26;
letter2=(val/(26*26))%26;
letter3=(val/(26*26*26))%26;

where "%" is your language's modulus operator and "/" is an integer division.

You'll obviously need a way to get from 'A'-'Z' to 0-25 and back. That's language dependent.

You can easily put this into loops. I show the loops unrolled to make things a bit more obvious.

It's more common to pack letters into bytes, so you can use shift and and bitwise operations to encode and decode. But by doing it the way I show above, you could pack six letters into a 32-bit number, rather than just four. Which is nice, since you can hold things like stock market ticker symbols in a single 32-bit value (mutual funds ticker symbols are 5 characters).

Nosredna
+2  A: 

Using PowerShell syntax you can do it this way (pretty much like dtb solution):

PS> $x = [BitConverter]::ToUInt32([byte[]][char[]]'isoy', 0)
PS> [char[]][BitConverter]::GetBytes($x) -join ''
isoy

You do have to watch out for endian-ness on the Linux side. If it is running on an Intel processor I believe should be fine (same endian-ness as the PowerShell side).

Keith Hill
Wow, power shell syntax is heinous.
twneale