views:

496

answers:

8

Hi,

I need an encryption scheme where the plaintext and ciphertext are composed entirely of decimal digits.

In addition, the plaintext and ciphertext must be the same length.

Also the underlying encryption algorithm should be an industry-standard. I don't mind if its symmetric (e.g AES) or asymmetric (e.g RSA) - but it must be a recognised algorithm for which I can get a FIPS-140 approved library. (Otherwise it won't get past the security review stage).

Using AES OFB is fine for preserving the length of hex-based input (i.e. where each byte has 256 possible values: 0x00 --> 0xFF). However, this will not work for my means as plaintext and ciphertext must be entirely decimal.

NB: "Entirely decimal" may be interpreted two ways - both of which are acceptable for my requirements:

  1. Input & output bytes are characters '0' --> '9' (i.e. byte values: 0x30 -> 0x39)
  2. Input & output bytes have the 100 (decimal) values: 0x00 --> 0x99 (i.e. BCD)

Some more info: The max plaintext & ciphertext length is likely to be 10 decimal digits. (I.e. 10 bytes if using '0'-->'9' or 5 bytes if using BCD)

Consider following sample to see why AES fails: Input string is 8 digit number. Max 8-digit number is: 99999999 In hex this is: 0x5f5e0ff

This could be treated as 4 bytes: <0x05><0xf5><0xe0><0xff>

If I use AES OFB, I will get 4 byte output.

Highest possible 4-byte ciphertext output is <0xFF><0xFF><0xFF><0xFF>

Converting this back to an integer gives: 4294967295 I.e. a 10-digit number.

==> Two digits too long.

One last thing - there is no limit on the length any keys / IVs required.

Thanks in advance for any suggestions you may have to address this.

David.

A: 

You could use the octal format, which uses digits 0-7, and three digits make up a byte. This isn't the most space-efficient solution, but it's quick and easy.

Example:

Text: Hello world!
Hexadecimal: 48 65 6C 6C 6F 20 77 6F 72 6C 64 21
Octal: 110 145 154 154 157 040 167 157 162 154 144 041

(spaces added for clarity to separate bytes)

Joey Adams
Not sure how this would work.If input string is decimal and output string is octal *and* length must be preserved, then we will get collisions as the "ciphertext-space" is smaller than the "plaintext-space". I.e. there are not enough possible ciphertext strings to uniquely identify all possible inputs.
Oops, good point. Hence, you want a one-to-one mapping from decimal digits to decimal digits.
Joey Adams
A: 

I don't believe your requirement can be met (at all easily anyway), though it's possible to get pretty close close.

AES (like most encryption algorithms) is written to work with octets (i.e. 8-bit bytes), and it's going to produce 8-bit bytes. Once it's done its thing, converting the result to use only decimal digits or BCD values won't be possible. Therefore, your only choice is to convert the input from decimal or BCD digits to something that fills an octet as completely as possible. You can then encrypt that, and finally re-encode the output to use only decimal or BCD digits.

When you convert the ASCII digits to fill the octets, it'll "compress" the input somewhat. The encryption will then produce the same size of output as the input. You'll then encode that to use only decimal digits, which will expand it back to roughly the original size.

The problem is that neither 10 nor 100 is a number that you're going to easily fit exactly into a byte. Numbers from 1 to 100 can be encoded in 7 bits. So, you'll basically treat those as a bit-stream, putting them in 7 bits at a time, but taking them out 8 bits at a time to get bytes to encrypt.

That uses the space somewhat better, but it's still not perfect. 7 bits can encode values from 0 to 127, not just 0 to 99, so even though you'll use all 8 bits, you won't use every possible combination of those 8 bits. Likewise, in the result, one byte will turn into three decimal digits (0 to 255), which clearly wastes some space. As a result, your output will be slightly larger than your input.

To get closer than that, you could compress your input with something like Huffman or an LZ* compression (or both) before encrypting it. Then you'd do roughly the same thing: encrypt the bytes, and encode the bytes using values from 0 to 9 or 0 to 99. This will give better usage of the bits in the bytes you encrypt, so you'd waste very little space in that transformation, but does nothing to improve the encoding on the output side.

Jerry Coffin
Yes - this is the line of thinking I started off with. However, "close" just isn't going to be good enough. I need an encryption scheme that will be 100% reliable for all plaintext lengths - i.e. zero collisions and ciphertext same length as input.I feel it must be possible - as VeriFone have managed to do it:Their "VeriShield Protect" encrypts the credit-card number from card reader through to the acquiring bank. The encrypted card number looks just like a real card number (First 6 and last 4 are the same as the real card number, the middle digits are encrypted. They claim it's AES based).
Sorry, but the answer is rc4 or any stream cipher.
Rook
@Unknown:Quite the contrary. While a stream cipher does produce output the same size as the input, that output will use the full alphabet available. To use only digits, you would have to convert its output to another format, which would expand that output substantially.
Jerry Coffin
A: 

Using only 10 digits as input/output is completely insecure. It is so insecure, that in very likely that will be cracked in real application, so consider using at least 39 digits (128 bits equivalent) If you are going to use only 10 digits there is no point in using AES in this case you have chance to invent your own (insecure) algorithm.

The only way you might get out of this is using STREAM cipher. Use 256 bit key "SecureKey" and initialisation vector IV which should be different at each beginning of starting season. Translate this number into 77 digit (decimal) number and use "addition whit overflow" over modulo 10 whit each digit.

For instance

 AES(IV,KEY) =      4534670 //and lot more
 secret_message =   01235
                          + and mod 10 
 ---------------------------------------------
 ciphertext     =   46571 // and you still have 70 for next message

when you run out for digits from stream cipher -> AES(IV,KEY) increase IV and repeat IV=IV+1

Keep in mind that you should absolutely never use same IV twice, so you should have some scheme over this to prevent this.

Another concern is also in generating Streams. If you generate number that is higher than 10^77 you should discard that number increase IV and try again whit new IV. Other way there is high probability that you will have biased numbers and vulnerability.

There also very likely that there is flaw in this scheme or there will be in your implementation

ralu
I don't think this kind of cryptosystem is necessarily insecure.<p>Even if we take the example where there are only 4 digits in the input:<p>Gives 10^^4 possible plaintext/ciphertext values<p>Gives (10^^4)! possible arrangements of ciphertext for 0000->9999 plaintext<p>This is a very big number with only 4 digits!<p>
Agreed with unknown. Also, nothing says that encoding from plain to cipher has to stay consistent throughout the message. I'm reasonably sure it's mathematically provable that this *can* be done securely. It's some of the other constraints of the problem that I'm more worried about.
Carl Smotricz
That is true, but it is comletly insecure on known plaintext attack which is not so hard to apply http://en.wikipedia.org/wiki/Known-plaintext_attack
ralu
Don't think it is vulnerable to "known plaintext attack" as encryption key likely to be exchanged every 200 or so messages. Also plaintext is not in any way predictable from one message to the next - but almost as good as a random sequence of digits.
You just do not get the http://en.wikipedia.org/wiki/Kerckhoffs_principle
ralu
+1  A: 

I am not a cipher guru, but an obvious question comes to mind: would you be allowed to use One Time Pad encryption? Then you can just include a large block of truly random bits in your decoding system, and use the random data to transform your decimal digits in a reversible way.

If this would be acceptable, we just need to figure out how the decoder knows where in the block of randomness to look to get the key to decode any particular message. If you can send a plaintext timestamp with the ciphertext, then it's easy: convert the timestamp into a number, say the number of seconds since an epoch date, modulus that number by the length of the randomness block, and you have an offset within the block.

With a large enough block of randomness, this should be uncrackable. You could have the random bits be themselves encrypted with strong encryption, such that the user must type in a long password to unlock the decoder; in this way, even if the decryption software was captured, it would still not be easy to break the system.

If you have any interest in this and would like me to expand further, let me know. I don't want to spend a lot of time on an answer that doesn't meet your needs at all.

EDIT: Okay, with the tiny shred of encouragement ("you might be on to something") I'm expanding my answer.

The idea is that you get a block of randomness. One easy way to do this is to just pull data out of the Linux /dev/random device. Now, I'm going to assume that we have some way to find an index into this block of randomness for each message.

Index into the block of randomness and pull out ten bytes of data. Each byte is a number from 0 to 255. Add each of these numbers to the respective digit from the plaintext, modulo by 10, and you have the digits of the ciphertext. You can easily reverse this as long as you have the block of random data and the index: you get the random bits and subtract them from the cipher digits, modulo 10.

You can think of this as arranging the digits from 0 to 9 in a ring. Adding is counting clockwise around the ring, and subtracting is counting counter-clockwise. You can add or subtract any number and it will work. (My original version of this answer suggested using only 3 bits per digit. Not enough, as pointed out below by @Baffe Boyois. Thank you for this correction.)

If the plain text digit is 6, and the random number is 117, then: 6 + 117 == 123, modulo 10 == 3. 3 - 117 == -114, modulo 10 == 6.

As I said, the problem of finding the index is easy if you can use external plaintext information such as a timestamp. Even if your opponent knows you are using the timestamp to help decode messages, it does no good without the block of randomness.

The problem of finding the index is also easy if the message is always delivered; you can have an agreed-upon system of generating a series of indices, and say "This is the fourth message I have received, so I use the fourth index in the series." As a trivial example, if this is the fourth message received, you could agree to use an index value of 16 (4 for fourth message, times 4 bytes per one-time pad). But you could also use numbers from an approved pseudorandom number generator, initialized with an agreed constant value as a seed, and then you would get a somewhat unpredictable series of indexes within the block of randomness.

Depending on your needs, you could have a truly large chunk of random data (hundreds of megabytes or even more). If you use 10 bytes as a one-time pad, and you never use overlapping pads or reuse pads, then 1 megabyte of random data would yield over 100,000 one-time pads.

steveha
Yes - think you might be on to something. Though I can see one major problem. Whole point of a one-time pad is that you use it once and then throw the pad away. While not impossible, it could be cumbersome to exchange a whole new pad each time I want to send a string encrypted in this manner.
The block of randomness is essentially a warehouse of a large collection of one-time pads. You don't have to throw away the block of randomness after each message, you just need to make sure that each message uses a different part of the block of randomness. If you think this idea might work, I'll expand on it in the answer.
steveha
@steveha In this case it is not called one time pad
ralu
This could work. If I securely exchanged new a 2048-byte PAD (or block of randomness) on a daily basis, how many PAD usages would this block support - say for example, each message required 8-bytes of randomness. Would this give me enough for 256 messages. Is there any reason it would be insecure to use the PAD from the top to the bottom before discarding - given that the PAD is entirely random? Or would I be limited to using a much smaller proportion of the PAD before I had to discard.
@ralu, now you see why I disclaimed any expertise in this area. What would you call this?
steveha
@unknown, the pad is truly random. Nobody can predict any values from it. Thus, if you want to use each byte in the pad one time and never use it again, you could still use every byte in the pad before discarding it. So yes, if you send 2048 bytes for a pad, you used 8 bytes per pad, and discarded each pad after one use, that works out to 256 messages per pad.
steveha
@unknown, it is best to use true random numbers to generate the pad. The Linux kernel can read various sources of randomness, including hardware random number generator features available on some CPU chips, to produce true randomness with no predictability. There are also hardware devices available to produce streams of random data. A one-time pad is not "crackable" the way a cipher is; finding out what several messages actually meant doesn't help decrypt future messages at all. And you could send the same message every day and it would likely never produce the same ciphertext.
steveha
Note that using only 3 bits of randomness per decimal digit of input will reveal some information about the input. For example, if the ciphertext is 8, you know for sure the corresponding plaintext can't be 0 or 9.
Baffe Boyois
@steveha: Thanks for your suggestion - but if this is not "one-time-pad", I'll need to be able to state what it is before I get approval for the implementation. Ideally, I should be able to identify this scheme in Schneier's "Applied Cryptography" or some similarly respected tome. As much as I might like to create a new method - unless it's proven, I won't get approval to implement this solution.
@Baffe: Not sure where you're coming from. The block of randomness would be a block of random decimal digits.
"Each group of three bits encodes a number from 0-7. Add each of these numbers to the respective digit from the plaintext, modulo by 10". So they are only random numbers between 0 and 7, which means each digit in the ciphertext only has 8 possible corresponding plaintexts instead of 10.
Baffe Boyois
@unknown, I'm sure you have some crypto experts you can discuss this with. I'm just some guy, not really an expert in anything, and certainly not an expert in what you can get approved for government projects or whatever this is. But I do suggest you use one-time pad or some variant, because with only ten digits of message, any simple cipher will be subject to traffic analysis; you would really prefer to have the same message come out different each time, and a one-time pad gives you that.
steveha
@Baffe Boyois, thank you for putting your finger on a problem. I have updated the answer to suggest adding one whole byte to each digit, and credited you with identifying the problem.
steveha
+7  A: 

Use AES/OFB, or any other stream cipher. It will generate a keystream of pseudorandom bits. Normally, you would XOR these bits with the plaintext. Instead:

For every decimal digit in the plaintext
  Repeat
    Take 4 bits from the keystream
  Until the bits form a number less than 10
  Add this number to the plaintext digit, modulo 10

To decrypt, do the same but subtract instead in the last step.

I believe this should be as secure as using the stream cipher normally. If a sequence of numbers 0-15 is indistinguishable from random, the subsequence of only those of the numbers that are smaller than 10 should still be random. Using add/subtract instead of XOR should still produce random output if one of the inputs are random.

Baffe Boyois
Can you explain how this could work?You use AES to encrypt input, this gives hex output.You then derive a decimal string from this hex output.From what I can tell, information is lost at this stage - so you have no way to get back to original plaintext given only the decimal ciphertext and the key.
You suggested AES with OFB in the question. OFB converts a block cipher (like AES) into a stream cipher. In a stream cipher, the keystream is only dependant on the key and IV, not on the plaintext. As long as both ends have the same key and IV, they can generate the same keystream. Thus, they can also generate the same modified stream of decimal digits. The keystream is not "encrypted input" - it isn't related to the input at all.
Baffe Boyois
If I understand you correctly, this requires that a new Key and IV are exchanged prior to each message. This will prove to be a bit cumbersome.
Could you clarify what you mean? You need both sides to have the same key and IV in any usage of a symmetric cipher. I don't think this idea adds any new requirements there?
Baffe Boyois
I was hoping to exchange the Key and IV between both sides once per day (at most). There will be in the region of 200-300 messages per day. These messages will be very short and it will be cumbersome to add a layer of key exchange on top of each very short message.
Ah, there is no need for a new key and IV for every message! You can keep using the same one for as long as it is secure - I'm not sure about the details for AES, but it should be good for at least gigabytes of keystream before needing to restart - not a problem for 200-300 messages per day.
Baffe Boyois
Also note that if using One-Time Pad is a possibility, this method can be used there too - just use the random pad as your source of keystream instead of AES.
Baffe Boyois
Yes - I now see what you're talking about. I think this is it. Thanks for the solution.
You *do* need a new IV for every message, or this isn't secure. You're re-using the same key stream for every message. As Bruce Schneier says, "The first rule of an output-feedback mode stream cipher, any of them, is that you should never use the same key to encrypt two different messages. Repeat after me: NEVER USE THE SAME KEY TO ENCRYPT TWO DIFFERENT MESSAGES."
erickson
Sorry - had to mark as unanswered in light of above.@sylvarking: Thanks for pointing this out.
You only need a new IV for each message if you restart the encryption for each message. If you keep using more bytes from the *same* keystream, the same key won't be used for multiple messages until the period of the generator is reached - as above, after gigabytes at least.
Baffe Boyois
With that qualification, I agree: as long as you can keep the messages in order, that would work and be secure. It's equivalent to using CTR mode as I suggested in my answer.
erickson
True. I suspect my answer needs to be combined with at least one other to get a complete cryptosystem - it really only deals with the unique restriction of decimal input/output of same length.
Baffe Boyois
@Baffe: I'm going to award the solution to you (again). I can actually implement either option: keep the stream permanently active on both sides or use a sequence no as part of the initialisation (CTR mode)
A: 

One potential candidate is the FFX encryption mode, which has recently been submitted to NIST.

Accipitridae
This looks great - do you know of any implementations?
Nope, sorry. This proposal is new and replaces a mode called FFSEM, which was previously submitted by one of the authors. The other two authors presented a paper during this years CRYPTO conference. I can't say how long it takes before some reference implementations are published.
Accipitridae
+1  A: 

Stream ciphers require a nonce for security; the same key stream state must never be re-used for different messages. That nonce adds to the effective ciphertext length.

A block cipher used in a streaming mode has essentially the same issue: a unique initialization vector must be included with the cipher text.

Many stream ciphers are also vulnerable to ciphertext manipulation, where flipping a bit in the ciphertext undetectably flips the corresponding bit in the plaintext.

If the numbers are randomly chosen, and each number is encrypted only once, and the numbers are shorter than the block size, ECB offers good security. Under those conditions, I'd recommend AES in ECB mode as the solution that minimizes ciphertext length while providing strong privacy and integrity protection.

If there is some other information in the context of the ciphertext that could be used as an initialization vector (or nonce), then this could work. This could be something explicit, like a transaction identifier during a purchase, or something implicit like the sequence number of a message (which could be used as the counter in CTR mode). I guess that VeriShield is doing something like this.

erickson
Thanks for your help with this.
A: 

For those doubting FFX mode AES, please feel free to contact me for further details. Our implementation is a mode of AES that effectively sits on top of existing ciphers. the specification with proof/validation is up on the NIST modes website. FFSEM mode AES is included under FFX mode.

http://csrc.nist.gov/groups/ST/toolkit/BCM/documents/proposedmodes/ffx/ffx-spec.pdf

If its meaningful, you can also have a conversation with NIST directly about their status in respect to modes submission/AES modes acceptance to address your FIPS question. FFX has security proofs, independent cryptographic review and is not a "new cipher". It is however based on methods that go back 20+ years - proven techniques. In implementation we ability to encrypt data whilst preserving length, structure, integrity, type and format. For example specify an explicit format policy that the output will be NNN-NN-NNNN.

So, as a mode of AES we can for example on a mainframe environment for implementation we simple use the native AES processor on a z10. Similar on open systems with HSM devices- we can sit on top of an existing AES implementation.

Format Preserving Encryption (as its often referred to) in this way is already being used in industry and available in off-the-shelf products and rather quick to deploy - already used in POS devices etc, Payments systems, Enterprise deployments etc.

Mark Bower VP Product Management Voltage Security Drop a note to [email protected] for more info or take a look at our website for more info.

Mark Bower