Hello,

I need to byte-shift a text file. I know absolutely nothing about perl, but I found a perfectly working piece of code in perl called moz-byteshift.pl (documentation). This does exactly what I want to do, but I need to do it in C#.

Here's the source code of the perl file:

#!/usr/bin/perl

# To perform a byteshift of 7
#   To decode: moz-byteshift.pl -s -7 <infile >outfile
#   To encode: moz-byteshift.pl -s  7 <infile >outfile

# To perform a byteshift of 13
#   To decode: moz-byteshift.pl -s -13 <infile >outfile
#   To encode: moz-byteshift.pl -s  13 <infile >outfile

use encoding 'latin1';
use strict;
use Getopt::Std;

use vars qw/$opt_s/;

getopts("s:");
if(!defined $opt_s) {
  die "Missing shift\n";
}

my $buffer;
while(1) {
  binmode(STDIN, ":raw");
  my $n=sysread STDIN, $buffer, 1;
  if($n == 0) {
    last;
  }
  my $byte = unpack("c", $buffer);
  $byte += 512 + $opt_s;
  $buffer = pack("c", $byte);
  binmode(STDOUT, ":raw");
  syswrite STDOUT, $buffer, 1;
}

If someone could at least explain how the perl script works, that would be great. Sample code of the equivalent in C# would be better. =)

Thanks for the help.

+4  A: 

There's not much to tell. It reads a file one byte at a time, adjusts the value of each byte by an arbitrary value (specified via the -s flag), and writes out the adjusted bytes. It's the binary equivalent of ROT-13 encryption of a text file.

The rest of the details are specific to how Perl does those things. getopts() is a function (from the Getopt::Std module) that processes command-line switches. binmode() puts the filehandles in raw mode to bypass any of the magic that Perl normally does during I/O. The sysread() and syswrite() functions are used for low-level stream access. The pack() and unpack() functions are used to read and write binary data; Perl doesn't do native types.
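For readers who don't know Perl either, the read/offset/write loop described above can be sketched in Python. This is only an illustration, not a port of moz-byteshift.pl: the name `shift_stream` and the chunked reads are assumptions made for this example, and binary file objects stand in for the raw-mode filehandles.

```python
import io


def shift_stream(src, dst, delta, chunk_size=4096):
    """Copy src to dst, adding delta to every byte value (wrapping at 256).

    src and dst must be binary streams, which plays the role that
    binmode(..., ":raw") plays in the Perl script.
    """
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # an empty read means end of input
            break
        dst.write(bytes((b + delta) % 256 for b in chunk))


# Small in-memory demonstration (hypothetical data)
source = io.BytesIO(b"\x01\x02\x03")
target = io.BytesIO()
shift_stream(source, target, 5)
print(list(target.getvalue()))  # [6, 7, 8]
```

To filter standard input to standard output the way the script does, one would pass `sys.stdin.buffer` and `sys.stdout.buffer` as `src` and `dst`.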

This would be trivial to re-implement in C. I'd recommend doing that (and binding to it from C# if need be) rather than porting to C# directly.

Michael Carman
Thanks. That is helpful. I guess the part I don't understand is what type of shifting it does. Does it take a byte array like this: byte[] {1,2,3,4,5} and (shifted by one) produce this: byte[] {5,1,2,3,4}? Or does it shift the bits of each byte, turning: byte[] {00000001,00000010,00000011} into (shifting by one): byte[] {10000000,00000001,10000001}?
Andrew
Calling this a "shift" is kind of a misnomer. It doesn't move bits or bytes. It applies an offset to the value of each byte. If your original data had byte values of 1, 2, 3 and you specified "-s 5" the result would be 6, 7, 8.
Michael Carman
So it adds to the byte value? So with a shift of 1, 00000001 becomes 00000010, 00001000 becomes 00001001, and so on?
Andrew
@Andrew: That's right. Note also that the values wrap around. i.e. 0xFE + 0x04 = 0x02. This makes the transformation reversible.
Michael Carman
Thanks - that's exactly the explanation I needed.
Andrew
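The wrap-around and reversibility discussed in the comments above can be checked with a few lines of Python. This is a sketch only; `shift_byte` is a name invented for this example and is not part of the original script.

```python
def shift_byte(value: int, delta: int) -> int:
    """Offset one byte value (0..255) by delta, wrapping around at 256."""
    return (value + delta) % 256


wrapped = shift_byte(0xFE, 0x04)       # 0xFE + 0x04 wraps past 0xFF
restored = shift_byte(wrapped, -0x04)  # the opposite offset undoes it
print(hex(wrapped), hex(restored))     # 0x2 0xfe

# Every possible byte value survives an encode/decode round trip with shift 7
round_trip_ok = all(shift_byte(shift_byte(v, 7), -7) == v for v in range(256))
print(round_trip_ok)  # True
```

Python's `%` always returns a non-negative result for a positive modulus, which is why negative offsets need no special handling here. Other languages may need an extra step.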
+1  A: 

What the code does is this: Read each byte from standard input one by one (after switching it into raw mode so no translation occurs). The unpack gets the byte value of the character just read so that a '0' read turns into 0x30. The latin1 encoding is selected so that this conversion is consistent (e.g. see http://www.cs.tut.fi/~jkorpela/latin9.html).

Then the value specified on the command line with the -s option is added to this byte along with 512 to simulate a modulus operation. This way, -s 0, -s 256 etc are equivalent. I am not sure why this is needed because I would have assumed the following pack took care of that but I think they must have had good reason to put it in there.

Then, write the raw byte out to standard output.

Here is what happens when you run it on a file containing the characters 012345 (I put the data in the DATA section):

E:\Test> byteshift.pl -s 1 | xxd
0000000: 3132 3334 3536 0b                        123456.

Each byte value is incremented by one.

E:\Test> byteshift.pl -s 257 | xxd
0000000: 3132 3334 3536 0b                        123456.

Remember 257 % 256 = 1. That is:

$byte += $opt_s;
$byte %= 256;

is equivalent to the single step used in the code.
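One possible reading of the 512: unpack("c") yields a signed value (-128 to 127), and adding a multiple of 256 keeps the intermediate sum non-negative for moderate negative shifts without changing the result modulo 256. The Python sketch below checks that equivalence under the assumption that pack("c") keeps only the low 8 bits. The function names are invented for this example.

```python
import struct


def perl_style_shift(raw: int, delta: int) -> int:
    """Mimic the script's arithmetic for one byte value (0..255)."""
    signed = struct.unpack("b", bytes([raw]))[0]  # like unpack("c"): -128..127
    biased = signed + 512 + delta                 # the script's single step
    return biased & 0xFF                          # assumed behaviour of pack("c")


def plain_shift(raw: int, delta: int) -> int:
    """The two-step version: add the offset, then reduce modulo 256."""
    return (raw + delta) % 256


deltas = (-13, -7, 0, 1, 7, 13, 257)
same = all(
    perl_style_shift(v, d) == plain_shift(v, d)
    for v in range(256)
    for d in deltas
)
print(same)  # True
```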

Much later: OK, I do not know C# but here is what I was able to piece together using online documentation. Someone who knows C# should fix this:

using System;
using System.IO;

class BinaryRW {
    static void Main(string[] args) {
        BinaryWriter binWriter = new BinaryWriter(
                Console.OpenStandardOutput()
                );
        BinaryReader binReader = new BinaryReader(
                Console.OpenStandardInput()
                );

        int delta;

        if ( args.Length < 1 
                || ! int.TryParse( args[0], out delta ) )
        {
            Console.WriteLine(
                    "Provide an integer delta on the command line"
                    );
        } 
        else {       
            try  {
                while ( true ) {
                    int bin = binReader.ReadByte();
                    byte bout = (byte) ( ( bin + delta ) % 256 );
                    binWriter.Write( bout );
                }
            }

            catch(EndOfStreamException) { }

            catch(ObjectDisposedException) { }

            catch(IOException e) {
                Console.WriteLine( e );        
            }

            finally {
                binWriter.Close();
                binReader.Close();

            }
        }
    }
}

E:\Test> xxd bin
0000000: 3031 3233 3435 0d0a 0d0a                 012345....

E:\Test> b 0 < bin | xxd
0000000: 3031 3233 3435 0d0a 0d0a                 012345....

E:\Test> b 32 < bin | xxd
0000000: 5051 5253 5455 2d2a 2d2a                 PQRSTU-*-*

E:\Test> b 257 < bin | xxd
0000000: 3132 3334 3536 0e0b 0e0b                 123456....
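
The hex dumps above can be cross-checked independently of the C# program. The Python sketch below applies the same offsets to the same ten bytes; `shift_bytes` is a hypothetical helper, and the sample data is taken from the `xxd bin` output.

```python
def shift_bytes(data: bytes, delta: int) -> bytes:
    """Add delta to every byte value, wrapping around at 256."""
    return bytes((b + delta) % 256 for b in data)


sample = b"012345\r\n\r\n"  # 3031 3233 3435 0d0a 0d0a

for delta in (0, 32, 257):
    print(delta, shift_bytes(sample, delta).hex())
# 0 3031323334350d0a0d0a
# 32 5051525354552d2a2d2a
# 257 3132333435360e0b0e0b
```

A negative delta of the same size restores the original bytes, which is why the delta should not be restricted to non-negative values.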
Sinan Ünür
I think the 512 is supposed to be a bias to force the value to wrap instead of saturating. I don't think it's necessary, though (at least not in Perl).
Michael Carman
Thank you! That works perfectly. I'm not going to be using this from the command line, but for others that find this question, there is one bug in your code: You should add `args.Length < 1 || ` to the beginning of your if condition to avoid an "index out of bounds" exception when nothing is entered.
Andrew
Thanks for catching that.
Sinan Ünür
Why are you trapping delta < 0? That makes the transformation not (easily) reversible. It can be negative in the original code.
Michael Carman
Just mental error, I guess. I was focused on getting the syntax right so the program would compile.
Sinan Ünür
+1  A: 

Judging by the other answers the equivalent in C# would look something like this:

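// Needs "using System.IO;". inPath and outPath are assumed to be strings defined elsewhere.
using(Stream sIn = File.OpenRead(inPath))
{
  using(Stream sOut = File.Create(outPath))
  {
    int b = sIn.ReadByte(); // Stream.ReadByte returns -1 at end of stream
    while(b >= 0)
    {
      b = (byte)b+1; // or some other value
      sOut.WriteByte(unchecked((byte)b)); // unchecked so that 256 wraps to 0
      b = sIn.ReadByte();
    }
  } // the using blocks close both streams, so no explicit Close() is needed
}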
samjudson
ReadByte returns the value of the byte, or -1 if the end of the stream is reached, so your comment makes no sense.
samjudson
According to http://msdn.microsoft.com/en-us/library/system.io.binaryreader.readbyte.aspx the return value of ReadByte is of type System.Byte. According to http://msdn.microsoft.com/en-us/library/system.byte.aspx System.Byte "Represents an 8-bit unsigned integer." There is no mention of ReadByte returning -1 if the end of stream is reached. In fact, a simple test program based on what you wrote above crashed with System.IO.EndOfStreamException.
Sinan Ünür
Well I'm not calling BinaryReader.ReadByte am I, I'm calling Stream.ReadByte. Check the docs: http://msdn.microsoft.com/en-us/library/system.io.stream.readbyte.aspx
samjudson
D'uh! Sorry about that.
Sinan Ünür