Rules
- Your program must have two modes: encoding and decoding.
When encoding:
- Your program must take as input some human readable Latin1 text, presumably English.
- It doesn't matter if you ignore punctuation marks.
- You only need to worry about actual English words, not L337.
- Any accented letters may be converted to simple ASCII.
- You may choose how you want to deal with numbers. For example, 123 could be handled as "123", "1 2 3", "one two three", or "one hundred twenty three".
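The allowance for converting accented letters to simple ASCII can be sketched as follows. This is only one possible approach, assuming Python 3 and its standard `unicodedata` module; the function name `fold_to_ascii` is made up for illustration and is not part of the challenge.

```python
import unicodedata


def fold_to_ascii(text: str) -> str:
    """Replace accented letters with their unaccented ASCII base letters.

    NFKD normalization splits a letter such as 'é' into 'e' plus a
    combining accent; encoding to ASCII with errors="ignore" then drops
    the accent (and any other character that has no ASCII form).
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


if __name__ == "__main__":
    # Latin1 input bytes would first be decoded with bytes.decode("latin-1").
    print(fold_to_ascii("café naïve"))
```

Letters without a Unicode decomposition, such as "ß" or "æ", are simply dropped by this sketch, so an entry that cares about them would need an explicit replacement table.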
Your program must output a message which can be represented in 140 code points in the range U+0000–U+10FFFF, excluding non-characters (U+FFFE, U+FFFF, U+nFFFE and U+nFFFF where n is 1–10 hexadecimal, and U+FDD0–U+FDEF) and surrogate code points (U+D800–U+DFFF).
It may be output in any reasonable encoding of your choice; any encoding supported by GNU iconv will be considered reasonable, and your platform native encoding or locale encoding would likely be a good choice.
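A rough way to check an encoded message against the code point rule, and to see how much room 140 code points give, is sketched below. It assumes Python 3 (where `len` of a `str` counts code points) and reads "can be represented in 140 code points" as "at most 140"; all function names are illustrative, not part of the challenge.

```python
import math

MAX_CODE_POINTS = 140


def is_allowed(cp: int) -> bool:
    """True if cp is in U+0000..U+10FFFF and is neither a surrogate nor a non-character."""
    if not 0 <= cp <= 0x10FFFF:
        return False
    if 0xD800 <= cp <= 0xDFFF:  # surrogate code points
        return False
    if 0xFDD0 <= cp <= 0xFDEF:  # non-character block
        return False
    if (cp & 0xFFFE) == 0xFFFE:  # U+FFFE, U+FFFF and U+nFFFE, U+nFFFF in every plane
        return False
    return True


def is_valid_message(message: str) -> bool:
    """Check the length limit and that every code point is allowed."""
    return len(message) <= MAX_CODE_POINTS and all(is_allowed(ord(ch)) for ch in message)


def allowed_count() -> int:
    """Number of distinct code points an encoder may use."""
    return sum(1 for cp in range(0x110000) if is_allowed(cp))


def capacity_bits() -> float:
    """Upper bound on the information that fits in one full-length message."""
    return MAX_CODE_POINTS * math.log2(allowed_count())


if __name__ == "__main__":
    print(allowed_count(), capacity_bits() / 8)
```

The count works out to 1,114,112 total code points minus 2,048 surrogates minus 66 non-characters, so each code point carries slightly more than 20 bits and a full message holds roughly 350 bytes of arbitrary data.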
When decoding:
- Your program should take as input the output of your encoding mode.
- The text output should be an approximation of the input text.
- The closer you can get to the original text, the better.
- Doesn't need to have any punctuation.
The output text should be readable by a human, again presumably English.
- Can be L337, or lol.
- The decoding process may have no access to any other output of the encoding process other than the output specified above; that is, you can't upload the text somewhere and output the URL for the decoding process to download, or anything silly like that.
- For the sake of consistency in user interface, your program must behave as follows:
- Your program must be a script that can be set to executable on a platform with the appropriate interpreter, or a program that can be compiled into an executable.
- Your program must take as its first argument either encode or decode to set the mode.
- Your program must take input in at least one of the following ways:
- Take input from standard in and produce output on standard out.
my-program encode <input.txt >output.utf
my-program decode <output.utf >output.txt
- Take input from a file named in the second argument, and produce output in the file named in the third.
my-program encode input.txt output.utf
my-program decode output.utf output.txt
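The command-line interface described above could be wired up along these lines. This is a minimal sketch in Python 3 that supports both input styles; `encode` and `decode` are placeholders (the placeholder encoder just truncates to 140 code points), and the choice of Latin1 for the input text and UTF-8 for both outputs is an assumption, not something the rules require.

```python
import sys

MAX_CODE_POINTS = 140


def encode(text: str) -> str:
    """Placeholder encoder: a real entry would compress the text here."""
    return text[:MAX_CODE_POINTS]


def decode(message: str) -> str:
    """Placeholder decoder: a real entry would expand the message here."""
    return message


def transform(mode: str, data: bytes) -> bytes:
    """Run one mode on raw input bytes and return raw output bytes."""
    if mode == "encode":
        return encode(data.decode("latin-1")).encode("utf-8")
    return decode(data.decode("utf-8")).encode("utf-8")


def main(argv) -> int:
    """argv[1] is the mode; optional argv[2] and argv[3] name input and output files."""
    if len(argv) not in (2, 4) or argv[1] not in ("encode", "decode"):
        print("usage: my-program encode|decode [INPUT OUTPUT]", file=sys.stderr)
        return 2
    mode = argv[1]
    if len(argv) == 4:
        # File mode: read the second argument, write the third.
        with open(argv[2], "rb") as source:
            data = source.read()
        with open(argv[3], "wb") as sink:
            sink.write(transform(mode, data))
    else:
        # Stream mode: standard in to standard out, as raw bytes.
        sys.stdout.buffer.write(transform(mode, sys.stdin.buffer.read()))
    return 0

# As a script, finish the file with: raise SystemExit(main(sys.argv))
```

Working on bytes at the boundary and decoding explicitly keeps the behaviour independent of the platform locale, which matters when the encoded message contains code points outside Latin1.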
- For your solution, please post:
- Your code, in full, and/or a link to it hosted elsewhere (if it's very long, or requires many files to compile, or something).
- An explanation of how it works, if it's not immediately obvious from the code or if the code is long and people will be interested in a summary.
- An example text, with the original text, the text it compresses down to, and the decoded text.
- If you are building on an idea that someone else had, please attribute them. It's OK to try to do a refinement of someone else's idea, but you must attribute them.
The rules are a variation on the rules for the Twitter image encoding challenge.