tags:

views:

615

answers:

6

Hi,

I want to encrypt a string and embed it in a URL, so I want to make sure the encrypted output isn't bigger than the input.

Is AES the way to go?

+10  A: 

It's impossible to create any algorithm which will always create a smaller output than the input, but can reverse any output back to the input. If you allow "no bigger than the input" then basically you're just talking isomorphic algorithms where they're always the same size as the input. This is due to the pigeonhole principle.

Added to that, encryption usually has a little bit of padding (e.g. "to the nearest 8 bytes, rounded up" - in AES, that's 16 bytes). Oh, and on top of that you're got the issue of converting between text and binary. Encryption algorithms usually work in binary, but URLs are in text. Even if you assume ASCII, you could end up with an encrypted binary value which isn't ASCII. The simplest way of representing arbitrary binary data in text is to use base64. There are other alternatives which would be highly fiddly, but the general "convert text to binary, encrypt, convert binary to text" pattern is the simplest one.

Jon Skeet
Jon: for AES, it's rounded up to 16 bytes (128-bit blocks). Also, base-64 encoding that means around 22-characters per 16-character input block.
Stobor
@Stobor: Will edit for the 16 byte blocks for AES, cheers.
Jon Skeet
By the way, this ("impossible..., always creates a smaller output than the input") is called the "pigeonhole principle": http://en.wikipedia.org/wiki/Pigeonhole_principle
schnaader
@schnaaeder: Indeed, as per my comment to zebrabox's answer :) Will edit my answer to make that clear though.
Jon Skeet
@schaader + @jon Skeet - Thanks for the link
zebrabox
+2  A: 

Simple answer is no. Any symmetric encryption algorithm ( AES included ) will produce an output of at minimum the same but often slightly larger. As Jon Skeet points out, usually because of padding or alignment.
Of course you could compress your string using zlib and encrypt but you'd need to decompress after decrypting.
Disclaimer - compressing the string with zlib will not guarantee it comes out smaller though

zebrabox
That disclaimer is the important bit, IMO. Pigeon hole principle 'n all :)
Jon Skeet
A: 

If we are talking about symetric encription to obtain the original encrypted string from a cyphered one it is not possible. I think that unless you use hashes (SHA1, SHA256...) you will never obtain a cyphered string smaller than the original text. The problem with hashes is that they are not the solution for retrieving the original string because they are one way encryption algorithms.

backslash17
A: 

When using AES, the output data will be rounded up to have a specific length (e.g a length divisible trough 16).

If you want to transfer secret data to another website, a HTTP post may do better than embedding the data into the URL.

Emiswelt
+1  A: 

What matters is not really the cipher that you use, but the encryption mode that you use. For example the CTR mode has no length expansion, but every encryption needs a new distinct starting point for the counter. Other modes like OFB, CFB (or CBC with ciphertext stealing) also don't need to be padded to a multiple of the block length of the cipher, but they need an IV. It is unclear from your question if there is some information available from which an IV could be derived pseudorandomly an if any of these modes would be appropriate. It is also unclear if you need authentication, or if you need semantic security> i.e. is it a problem if you encrypt the same string twice and you get the same ciphertext twice?

Accipitridae
A: 

Also just another thing to clarify:

Not only is it true that symmetric encryption algorithms produce an output that is at least as large as the input, the same is true of asymmetric encryption.

"Asymmetric encryption" and "cryptographic hashes" are two different things.

Asymmetric encryption (e.g. RSA) means that given the output (i.e. the ciphertext), you can get the input (i.e. the plaintext) back if you have the right key, it's just that decrypting requires a different key than the key used for encrypting. For asymmetric encryption, the same "pigeonhole principle" argument applies.

Cryptographic hashes (e.g. SHA-1) mean that given the output (i.e. the hash) you can't get the input back, and you can't even find a different input that hashes to the same value (assuming the hash is secure). For cryptographic hashes, the hash can be shorter than the input. (In fact the hash is the same size regardless of the length of the input.


And also one more thing: In any secure encryption system the ciphertext will be longer than the plaintext. This is because there are multiple possible ciphertexts that any given plaintext could encrypt to (e.g. using different IVs.) If this were not the case then the cipher would leak information because if two identical plaintexts were encrypted, they would encrypt to identical ciphertexts, and an adversary would then know that the plaintexts were the same.

Alex319