ansaurus

Question

Answer 1

+8 A:

Use File.ReadAllBytes to load the PDF file, and then encode the byte array as normal using Convert.ToBase64String(bytes).
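Below is a minimal sketch of this approach. It also includes the reverse direction (`Convert.FromBase64String` plus `File.WriteAllBytes`). The class and method names (`PdfBase64`, `FileToBase64`, `Base64ToFile`) are illustrative and do not come from the original answer. Both directions hold the entire file in memory.

```csharp
using System;
using System.IO;

// Hypothetical helper class, named for illustration only.
static class PdfBase64
{
    // Loads the whole file into memory and returns it as one Base64 string.
    public static string FileToBase64(string path)
    {
        byte[] bytes = File.ReadAllBytes(path);
        return Convert.ToBase64String(bytes);
    }

    // Reverse direction: decodes a Base64 string and writes the raw bytes back out.
    public static void Base64ToFile(string base64, string path)
    {
        byte[] bytes = Convert.FromBase64String(base64);
        File.WriteAllBytes(path, bytes);
    }
}
```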

Andrew Rollings 2009-01-24 03:23:15

Yikes! That could get big.

Jeffrey Hantin 2009-01-24 03:34:43

Indeed. But these days machines have a lot of memory. And if necessary, reading buffered blocks from a file is a pretty standard technique :)

Andrew Rollings 2009-01-24 04:52:19

Works great for what I need for the moment. Thanks for the tip!

Tone 2009-01-26 06:22:13

This is very wasteful of memory. A stream-based approach would be better. The crypto-based approach suggested by JMarsch is likely more efficient. You could also do it by reading a small number of bytes at a time (in multiples of 3, since every 3 input bytes become exactly 4 Base64 characters), encoding each chunk independently, and writing the results to the stream where you need them.
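Here is a minimal sketch of the chunked approach described in this comment. It uses only `Convert.ToBase64String`. The names `ChunkedBase64`, `ChunkSize`, and `FillBuffer` are hypothetical and chosen for illustration. The chunk size is a multiple of 3, so every chunk except the last encodes without `=` padding. That means concatenating the pieces gives the same string as encoding everything at once.

```csharp
using System;
using System.IO;

// Hypothetical helper class, named for illustration only.
static class ChunkedBase64
{
    // Must be a multiple of 3: each 3 input bytes map to exactly 4 Base64
    // characters, so only the final (short) chunk can produce '=' padding.
    const int ChunkSize = 3 * 1024;

    public static void Encode(Stream input, TextWriter output)
    {
        var buffer = new byte[ChunkSize];
        int filled;
        while ((filled = FillBuffer(input, buffer)) > 0)
        {
            output.Write(Convert.ToBase64String(buffer, 0, filled));
        }
    }

    // Stream.Read may return fewer bytes than requested, so keep reading until
    // the buffer is full or the stream ends. A short chunk in the middle of the
    // stream would otherwise insert padding into the middle of the output.
    static int FillBuffer(Stream input, byte[] buffer)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            int n = input.Read(buffer, total, buffer.Length - total);
            if (n == 0) break;
            total += n;
        }
        return total;
    }
}
```

If you pass a `FileStream` for the PDF and a `StreamWriter` for the output, memory use stays bounded by the chunk size rather than the file size.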

Sebastian Good 2010-02-12 20:42:56

See my previous comment. It's not hard to buffer it.

Andrew Rollings 2010-02-13 14:43:45

Also, the KISS principle applies. Why make a solution more complex than it needs to be? If the above suits his purpose (which he says it does), then why make it more complex? Two lines of C# versus 30?

Andrew Rollings 2010-02-16 23:07:27

Answer 2

+7 A:

There is a way that you can do this in chunks so that you don't have to burn a ton of memory all at once.

.NET includes an encoder that can do the chunking, but it's in a somewhat unexpected place: the System.Security.Cryptography namespace.

I have tested the example code below, and I get identical output using either my method or Andrew's method above.

Here's how it works: you create a CryptoStream, which is an adapter that wraps another stream. You plug an ICryptoTransform implementation (here, ToBase64Transform) into the CryptoStream, which is in turn attached to your file, memory, or network stream. The CryptoStream then applies the transform to the data as it is read from or written to that stream.

Normally, the transformation is encryption or decryption. However, .NET also includes ToBase64Transform and FromBase64Transform, so we won't be encrypting, just encoding.
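To see the adapter idea in isolation, here is a small sketch that runs `ToBase64Transform` through a `CryptoStream` wrapped around a `MemoryStream` instead of a file. `CryptoStreamBase64Demo` and `EncodeViaCryptoStream` are illustrative names. The output should match `Convert.ToBase64String` for the same bytes.

```csharp
using System.IO;
using System.Security.Cryptography;
using System.Text;

// Hypothetical demo class, named for illustration only.
static class CryptoStreamBase64Demo
{
    public static string EncodeViaCryptoStream(byte[] data)
    {
        var output = new MemoryStream();
        using (var transform = new ToBase64Transform())
        using (var crypto = new CryptoStream(output, transform, CryptoStreamMode.Write))
        {
            // Bytes written here come out of the other side Base64-encoded.
            crypto.Write(data, 0, data.Length);
            // Encode the trailing 1-2 bytes (if any) and add '=' padding.
            crypto.FlushFinalBlock();
        }
        // Disposing the CryptoStream also closed output, but MemoryStream.ToArray
        // still works on a closed stream. ToBase64Transform emits ASCII bytes.
        return Encoding.ASCII.GetString(output.ToArray());
    }
}
```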

Here's the code. I included a (maybe poorly named) implementation of Andrew's suggestion so that you can compare the output.


    using System;
    using System.IO;
    using System.Security.Cryptography;

    class Base64Encoder
    {
        public void Encode(string inFileName, string outFileName)
        {
            using (ICryptoTransform transform = new ToBase64Transform())
            using (FileStream inFile = File.OpenRead(inFileName),
                              outFile = File.Create(outFileName))
            using (CryptoStream cryptStream = new CryptoStream(outFile, transform, CryptoStreamMode.Write))
            {
                // I'm going to use a 4k buffer; tune this as needed.
                // CryptoStream regroups the bytes into 3-byte blocks itself,
                // so the buffer size does not have to be a multiple of 3.
                byte[] buffer = new byte[4096];
                int bytesRead;

                while ((bytesRead = inFile.Read(buffer, 0, buffer.Length)) > 0)
                    cryptStream.Write(buffer, 0, bytesRead);

                // Encode the last partial block and append any '=' padding
                cryptStream.FlushFinalBlock();
            }
        }

        public void Decode(string inFileName, string outFileName)
        {
            using (ICryptoTransform transform = new FromBase64Transform())
            using (FileStream inFile = File.OpenRead(inFileName),
                              outFile = File.Create(outFileName))
            using (CryptoStream cryptStream = new CryptoStream(inFile, transform, CryptoStreamMode.Read))
            {
                byte[] buffer = new byte[4096];
                int bytesRead;

                // Reading from the CryptoStream yields already-decoded bytes
                while ((bytesRead = cryptStream.Read(buffer, 0, buffer.Length)) > 0)
                    outFile.Write(buffer, 0, bytesRead);

                outFile.Flush();
            }
        }

        // this version of Encode pulls everything into memory at once
        // you can compare the output of my Encode method above to the output of this one
        // the output should be identical, but the CryptoStream version
        // will use way less memory on a large file than this version.
        public void MemoryEncode(string inFileName, string outFileName)
        {
            byte[] bytes = File.ReadAllBytes(inFileName);
            File.WriteAllText(outFileName, Convert.ToBase64String(bytes));
        }
    }

I am also playing around with where I attach the CryptoStream. In the Encode method, I am attaching it to the output (writing) stream, so after I create the CryptoStream, I use its Write() method.

In the Decode method, I'm attaching it to the input (reading) stream, so I use the CryptoStream's Read() method. It doesn't really matter which stream I attach it to. I just have to pass the appropriate CryptoStreamMode value (Read or Write) to the CryptoStream's constructor.
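For comparison, here is a sketch of the encoder rewritten to attach the `CryptoStream` to the input stream instead, using `CryptoStreamMode.Read`. `ReadSideEncoder` and `EncodeReadSide` are illustrative names. `Stream.CopyTo` (available since .NET 4) replaces the manual buffer loop. Its output should match the write-side `Encode` method.

```csharp
using System.IO;
using System.Security.Cryptography;

// Hypothetical class, named for illustration only.
static class ReadSideEncoder
{
    public static void EncodeReadSide(string inFileName, string outFileName)
    {
        using (var transform = new ToBase64Transform())
        using (var inFile = File.OpenRead(inFileName))
        using (var base64Stream = new CryptoStream(inFile, transform, CryptoStreamMode.Read))
        using (var outFile = File.Create(outFileName))
        {
            // Reading from base64Stream yields Base64 bytes. CopyTo pulls them
            // through in chunks, so the whole file is never in memory at once.
            // At end of input the CryptoStream encodes the final partial block.
            base64Stream.CopyTo(outFile);
        }
    }
}
```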

JMarsch 2009-03-29 00:37:09

I haven't run and verified this, but it looks promising and awesome. Cool idea! +1

codingbear 2010-06-16 11:59:03


Base64 Encode a PDF in C#?
