ansaurus

Question

Answer 1

+13 A:

The general overview of my solution would be:

I start with calculating the maximum amount of raw data that you can fit into 140 utf8 characters.
- (I am assuming utf8, which is what the original website claimed twitter stored its messages in. This differs from the problem statement above, which asks for utf16.)
- Using this utf8 faq, I calculate that the maximum number of bits you can encode in a single utf8 character is 31 bits. In order to do this, I would use all characters that are in the U-04000000 – U-7FFFFFFF range. (1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx, there are 31 x's, therefore I could encode up to 31 bits).
- 31 bits times 140 characters equals 4340 bits. Divide that by 8 to get 542.5, and round that down to 542 bytes.
- (If we restrict ourselves to utf16, then we could only store 2 bytes per character, which would equal 280 bytes).
Compress the image down using standard jpg compression.
- Resize the image to be approximately 50x50px, then attempt to compress it at various compression levels until you have an image that is as close to 542 bytes as possible without going over.
- This is an example of the mona lisa compressed down to 536 bytes.
Encode the raw bits of the compressed image into utf-8 characters.
- Replace each x in the following bytes: 1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx, with the bits from the image.
- This part would probably be the part where the majority of the code would need to be written, because there isn't anything that currently exists that does this.
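
The bit-stuffing in the last step can be sketched as follows. This is only an illustration in Python, not code from this answer: `pack_31` and `unpack_31` are hypothetical helper names, and the sketch ignores both the lower bound of the U-04000000 range and the fact that such six-byte sequences are not valid in modern UTF-8.

```python
TWEET_LENGTH = 140
BITS_PER_CHAR = 31


def pack_31(value):
    """Place a 31-bit integer into the pattern 1111110x 10xxxxxx (x5)."""
    if not 0 <= value < (1 << BITS_PER_CHAR):
        raise ValueError("value does not fit in 31 bits")
    # Lead byte: 1111110x carries the top bit.
    out = [0xFC | (value >> 30)]
    # Five continuation bytes: 10xxxxxx, six payload bits each.
    for shift in (24, 18, 12, 6, 0):
        out.append(0x80 | ((value >> shift) & 0x3F))
    return bytes(out)


def unpack_31(seq):
    """Recover the 31-bit integer from a six-byte sequence made by pack_31."""
    value = seq[0] & 0x01
    for byte in seq[1:]:
        value = (value << 6) | (byte & 0x3F)
    return value


# Whole bytes of payload available in one message under this scheme.
capacity_bytes = TWEET_LENGTH * BITS_PER_CHAR // 8
print(capacity_bytes)
```

A decoder would call `unpack_31` on each six-byte group and concatenate the 31-bit payloads to rebuild the compressed image.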

I know that you were asking for code, but I don't really want to spend the time to actually code this up. I figured that an efficient design might at least inspire someone else to code this up.

I think the major benefit of my proposed solution is that it is reusing as much existing technology as possible. It may be fun to try to write a good compression algorithm, but there is guaranteed to be a better algorithm out there, most likely written by people who have a degree in higher math.

One other important note though is that if it is decided that utf16 is the preferred encoding, then this solution falls apart. jpegs don't really work when compressed down to 280 bytes. Although, maybe there is a better compression algorithm than jpg for this specific problem statement.

Stephen McCarthy 2009-05-21 09:03:38

I'm at work now, but I'll definitely implement this solution when I get home.

Paulo Santos 2009-05-21 13:59:25

From my experimentation, it appears that UTF-16 is indeed how Twitter counts characters; BMP characters count as 1, and higher plane characters count as 2. It is not documented, but that's how their JavaScript character counter counts when you type into the input box. It's also mentioned in the comments in the original thread. I haven't tried submitting via the API to see if the counter is broken; if it is, I'll update the problem for the actual constraints. You're not likely to be able to use arbitrary UTF-8 however, since many of those longer sequences you can encode are not valid Unicode.

Brian Campbell 2009-05-21 14:37:42

After testing with their API, it turns out that they do count by Unicode characters (code points), not UTF-16 code units (it's the JavaScript character counter that counts via UTF-16, since apparently that's what the JavaScript length method does). So you can get a bit more information in there; valid Unicode characters are in the range U+0000 to U+10FFFF (a bit more than 20 bits per character; 2^20 + 2^16 possible values per character). UTF-8 allows encoding of more values than are allowed in Unicode, so if you restrict yourself to Unicode, you can get about 350 bytes of space, not 542.

Brian Campbell 2009-05-21 18:28:18
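
The difference between the two ways of counting can be checked with a small Python sketch. The function names are made up for illustration, and this says nothing about how Twitter itself implements its counter.

```python
def count_code_points(text):
    """Python 3 strings are sequences of code points, so len() counts them."""
    return len(text)


def count_utf16_units(text):
    """Each UTF-16 code unit is two bytes; astral characters take two units."""
    return len(text.encode("utf-16-le")) // 2


# 'a' plus U+1D11E MUSICAL SYMBOL G CLEF, which lies outside the BMP.
sample = "a\U0001D11E"
print(count_code_points(sample), count_utf16_units(sample))
```

For the sample string the two counts differ by one, because the character outside the BMP needs a surrogate pair in UTF-16.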

That 536-byte mona lisa looks surprisingly good, given the extreme compression!

Chris 2009-05-21 22:57:36

Looks like a MonaCyclops though - some form of Uni-browe.... <joke> ;-)

Gineer 2009-05-22 13:13:35

We can currently encode 129,775 different (assigned, non-control, non-private) Unicode characters. If we restrict ourselves to that subset, it's a total of 2377 bits, or 297 bytes. Code here: http://porg.es/blog/what-can-we-fit-in-140-characters

Porges 2009-05-27 07:13:02
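
The capacity figures quoted in this thread can be reproduced with a short Python sketch. It assumes every character is drawn independently from an alphabet of a given size; `capacity_bits` and `capacity_bytes` are illustrative names, not taken from any of the posted programs.

```python
def capacity_bits(alphabet_size, length=140):
    """Whole bits that fit in `length` characters: floor(length * log2(size)).

    Computed with exact integer arithmetic to avoid floating-point rounding.
    """
    return (alphabet_size ** length).bit_length() - 1


def capacity_bytes(alphabet_size, length=140):
    """Whole bytes that fit in `length` characters."""
    return capacity_bits(alphabet_size, length) // 8


for label, size in [
    ("31-bit legacy UTF-8 sequences", 1 << 31),
    ("all code points U+0000..U+10FFFF", 0x110000),
    ("129,775 assigned characters", 129775),
    ("16-bit units (BMP only)", 1 << 16),
]:
    print(label, capacity_bits(size), "bits,", capacity_bytes(size), "bytes")
```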

How many BPP is the mona lisa you did?

Chris S 2009-05-29 09:46:37

Answer 2

+11 A:

This genetic algorithm that Roger Alsing wrote has a good compression ratio, at the expense of long compression times. The resulting vector of vertices could be further compressed using a lossy or lossless algorithm.

http://rogeralsing.com/2008/12/07/genetic-programming-evolution-of-mona-lisa/

Would be an interesting program to implement, but I'll give it a miss.

CiscoIPPhone 2009-05-21 09:52:12

Answer 3

+3 A:

I'd be interested in seeing how much hidden data could be encoded in what would be a human readable message. Even if it was enough for just a url it would be useful.

Rick Minerich 2009-05-21 19:30:45

That's not an answer. Why not start your own question with that idea?

rjmunro 2009-07-27 09:59:38

Answer 4

+3 A:

Hi, very interesting challenge,

I have several stupid questions as I am trying to size the challenge:

where is the original picture (the one above has two images, including your result).
how many pixels does the original picture have ? how many bytes goes into the encoding of each of the three RGB colors in the original picture ?
I have a specific solution in mind but, in your challenge is there some type of requirement that the encoder is lightweight (i.e. simple, not CPU intensive) ?

Cheers,

Igor.

2009-05-21 20:29:34

You should be able to take any arbitrary image (in a format of your choice) and compress it into 140 Unicode characters; it shouldn't matter how many pixels it has (high resolution images will need to lose more detail than low resolution). If you want to try your hand at the image from the original post, I think you can find it on Wikipedia: http://upload.wikimedia.org/wikipedia/commons/6/6a/Mona_Lisa.jpg . Feel free to try encoding other images. Take a look at the Flickr post I linked to for more detail on how the original works. There are no restrictions on how CPU intensive the process is.

Brian Campbell 2009-05-21 20:57:33

And each of the 140 Unicode characters is 3 or 4 bytes?

2009-05-22 00:02:47

A Unicode code point (which I have been referring to as a Unicode character, which isn't quite correct) is a value in the range 0x0 to 0x10FFFF. It is slightly more than 20 bits of information. Some of these are control characters, and many of these are unassigned. You can choose whether or not you will use the entire range, or only code points assigned to non-control characters. There is a list of assigned Unicode characters at http://www.unicode.org/Public/UNIDATA/UnicodeData.txt (note that some characters are listed directly, and some listed by specifying the start and end of a range).

Brian Campbell 2009-05-22 02:02:56

Another good guide on UTF-8 is http://www.cl.cam.ac.uk/~mgk25/unicode.html

a_m0d 2009-05-27 11:49:32

Answer 5

+2 A:

Brian,

So if I am reading this challenge correctly, we are looking to see how large of an image we can get by using only 350 bytes of space. Thanks.

Igor.

Igor Carron 2009-05-22 10:50:29

Essentially, yes. Though I will generally favor solutions that try to work with even tighter restrictions, like only assigned Unicode characters or only characters that can be copied and pasted in Twitter. See the rough scoring rubric I posted for details.

Brian Campbell 2009-05-22 20:52:32

Answer 6

+7 A:

In the original challenge the size limit is defined as what Twitter still allows you to send if you paste your text in their textbox and press "update". As some people correctly noticed this is different from what you could send as a SMS text message from your mobile.

What is not explicitly mentioned (but what my personal rule was) is that you should be able to select the tweeted message in your browser, copy it to the clipboard and paste it into a text input field of your decoder so it can display it. Of course you are also free to save the message as a text file and read it back in or write a tool which accesses the Twitter API and filters out any message that looks like an image code (special markers anyone? wink wink). But the rule is that the message has to have gone through Twitter before you are allowed to decode it.

Good luck with the 350 bytes - I doubt that you will be able to make use of them.

Quasimondo 2009-05-22 11:41:20

Yes, I've added a scoring rubric that indicates that tighter restrictions on the character set are preferred, but not required. I would like to have a rule that requires that messages pass through Twitter unscathed, but that would take a lot of trial and error to figure out the precise details of what works, and I wanted to leave some leeway to allow for creative uses of the code space. So, the only requirement in my challenge is 140 valid Unicode characters. By the way, thanks for stopping by! I really like your solution, and want to see if any of the kibitzers can actually improve on it.

Brian Campbell 2009-05-22 21:01:56

Answer 7

+2 A:

Who is willing to post sample code to read and write image files - I'm sure there are people with ideas (like me) but getting the data into a usable structure is a barrier to entry (mainly due to the time frame).

rjstelling 2009-05-22 12:38:27

If I have time, I'll whip up a sample solution tonight. Note that you are allowed to use existing libraries and programs for manipulating images; if you want to do your image processing with ImageMagick, that's perfectly fine. I also specified that you can use any reasonable format you want, to make it easier; PPM should be a breeze to parse, there are several example implementations on the Wikipedia page: http://en.wikipedia.org/wiki/Portable_pixmap

Brian Campbell 2009-05-22 20:48:02
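
As a rough illustration of how little code the plain-text PPM variant (P3) needs, here is a minimal Python reader. It is a sketch rather than a full netpbm implementation: `read_ppm_p3` is a made-up name, and it only handles the ASCII P3 format with `#` comments.

```python
def read_ppm_p3(text):
    """Parse a plain (ASCII) PPM image into (width, height, maxval, pixels)."""
    tokens = []
    for line in text.splitlines():
        # Everything after '#' on a line is a comment.
        tokens.extend(line.split("#", 1)[0].split())
    if not tokens or tokens[0] != "P3":
        raise ValueError("not a plain PPM (P3) file")
    width, height, maxval = (int(t) for t in tokens[1:4])
    samples = [int(t) for t in tokens[4:]]
    if len(samples) != width * height * 3:
        raise ValueError("unexpected number of samples")
    # Group the flat sample list into (r, g, b) tuples, row by row.
    pixels = [tuple(samples[i:i + 3]) for i in range(0, len(samples), 3)]
    return width, height, maxval, pixels


example = "P3\n# two pixels: red, green\n2 1\n255\n255 0 0  0 255 0\n"
print(read_ppm_p3(example))
```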

Answer 8

+8 A:

Posting a Monochrome or Greyscale image should improve the size of the image that can be encoded into that space since you don't care about colour.

Possibly augmenting the challenge to upload three images which when recombined give you a full colour image while still maintaining a monochrome version in each separate image.

Add some compression to the above and It could start looking viable...

Nice!!! Now you guys have piqued my interest. No work will be done for the rest of the day...

Gineer 2009-05-22 13:21:57

s/peaked/piqued/g

eleven81 2009-05-22 20:01:02

I like the idea of three images; it should be possible to implement such an idea on Twitter, and the result would be pretty good.

Makis 2009-05-29 10:00:57

Answer 9

+174 A:

Sam Hocevar 2009-05-24 23:41:06

I don't actually need to be able to run the code (I put the part about running it in the guidelines, as a suggestion, not the rules); I'd prefer to be able to run it, but I'll be judging this more on the quality of the images you generate, the code, and any interesting tricks or algorithms. If I want to run it and it requires packages I don't have or don't want to install on my main system, I can just boot up an Amazon EC2 instance and install it. As long as you're working with libraries that are packaged for one of the major distros, I should be able to run it. Feel free to use CGAL.

Brian Campbell 2009-05-25 03:45:36

Okay, here's my solution (source code): http://caca.zoy.org/browser/libpipi/trunk/examples/img2twit.cpp My explanation attempt and a few examples are at http://caca.zoy.org/wiki/img2twit

Sam Hocevar 2009-05-25 14:17:59

Great! That's the first full solution. Do you suppose you could edit your answer (the one where you asked your first question) to include some or all of the explanation and one or two of the example images? It's a lot nicer to have it inline here than have it linked to from a comment.

Brian Campbell 2009-05-25 21:44:38

Sure, will do that.

Sam Hocevar 2009-05-25 23:03:15

I really like your solution. You should try reducing the number of values assigned to the blue channel as the human eye can't resolve blue very well: http://nfggames.com/games/ntsc/visual.shtm; this will allow you to have more detail at the expense of some color information being lost. Or perhaps assign it to green?

rpetrich 2009-05-26 00:54:25

Good point. I did try a few variations of this idea (see the comments before the RANGE_X definition) but not very thoroughly. As you can see, using 5 blue values instead of 6 increased the error slightly less than using 7 values of green decreased it. I didn't try doing both out of laziness. Another problem I have is that I don't have a very good error function. I currently use ∑(∆r²+∆g²+∆b²)/3, which works OK. I tried ∑(0.299∆r²+0.587∆g²+0.114∆b²), based (with no physical justification) on YUV's Y component, but it was too tolerant with blue errors. I'll try to find papers about this issue.

Sam Hocevar 2009-05-26 09:14:33
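
The two error functions mentioned in the comment above can be written out as a small Python sketch. It is not taken from img2twit; the function names and the representation of an image as a flat list of (r, g, b) tuples are assumptions made for illustration.

```python
def uniform_error(image_a, image_b):
    """Sum over pixels of (dr^2 + dg^2 + db^2) / 3."""
    total = 0.0
    for (ra, ga, ba), (rb, gb, bb) in zip(image_a, image_b):
        total += ((ra - rb) ** 2 + (ga - gb) ** 2 + (ba - bb) ** 2) / 3
    return total


def luma_weighted_error(image_a, image_b):
    """Sum over pixels of 0.299 dr^2 + 0.587 dg^2 + 0.114 db^2."""
    total = 0.0
    for (ra, ga, ba), (rb, gb, bb) in zip(image_a, image_b):
        total += (0.299 * (ra - rb) ** 2
                  + 0.587 * (ga - gb) ** 2
                  + 0.114 * (ba - bb) ** 2)
    return total


# A pure blue error of 10 levels on a single pixel.
reference = [(0, 0, 0)]
candidate = [(0, 0, 10)]
print(uniform_error(reference, candidate))
print(luma_weighted_error(reference, candidate))
```

With a blue-only difference the weighted function reports roughly a third of the uniform error, which is consistent with the remark that it is too tolerant of blue errors.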

very impressive results IMHO

U62 2009-05-26 12:37:27

Actually, that's jamoes's 536 byte jpeg; I just edited his answer to improve the formatting.

Brian Campbell 2009-05-26 13:03:09

This looks really nice, but I'm very disappointed. img2twit? Really? REALLY? That's all you could come up with? What happened to your literary, poetic genius? I was expecting something to perpetuate the great lineage of libcaca, libcucul, libpipi and toilet. (How about 'pcul,' picture compression utility for line-blogging?)

niXar 2009-05-26 14:26:33

@Brian: oops, I'll reattribute accordingly. @niXar: you're right. I have a more appropriate name ready, but this is a family website.

Sam Hocevar 2009-05-26 15:22:26

What application can I use to view your .ogm movies?

Andrew 2009-05-26 20:39:05

@Andrew: VLC should work, and so should MPlayer. If you are on Windows and wish to use your usual player, I believe FFdshow may help (it's a DirectShow codec wrapping a lot of opensource codecs).

Sam Hocevar 2009-05-26 21:04:09

@rpetrich: I modified the program to make it increase r/g/b ranges dynamically as long as there are enough bits available. This makes sure that we never waste more than 13 bits in the whole bitstream (but in practice it's usually 1 or 2). And the images look slightly better.

Sam Hocevar 2009-05-27 08:39:13

@Sam do you have before and after images for that last change?

Brian Campbell 2009-05-27 21:36:58

@Brian: I'm afraid I seem to have broken something else in the process when putting all the pieces back together. I will fix it tonight after work and post new images (don't hold your breath though, the improvement is not groundbreaking).

Sam Hocevar 2009-05-28 10:06:10

Sounds like your bitpacking method is a variant of arithmetic coding, without the probabilistic part. You could probably gain quality by actually using arithmetic coding (or at least reduce output size)

derobert 2009-05-29 04:11:55

@derobert: trouble is, if arithmetic coding saves me bits, I will not know how many bits until after the compression is done, and I won't be able to use these bits unless I do another compression run, which might very well not save any bits...

Sam Hocevar 2009-05-29 08:55:05

Answer 10

+5 A:

The idea of storing a bunch of reference images is interesting. Would it be so wrong to store say 25Mb of sample images, and have the encoder try and compose an image using bits of those? With such a minuscule pipe, the machinery at either end is by necessity going to be much greater than the volume of data passing through, so what's the difference between 25Mb of code, and 1Mb of code and 24Mb of image data?

(note the original guidelines ruled out restricting the input to images already in the library - I'm not suggesting that).

2009-05-27 01:50:54

That would be fine, as long as you have a fixed, finite amount of data at either endpoint. Of course, you would need to demonstrate that it works with images that are not in the training set, just like any statistical natural language processing problem. I'd love to see something that takes a statistical approach to image encoding.

Brian Campbell 2009-05-27 01:53:58

I, for one, would love to see Mona Lisa redone using only Boba Fett fan art as source.

Nosredna 2009-05-27 02:20:00

Andrew 2009-05-27 03:33:25

Answer 11

A:

Is it acceptable to use something like PIL (python imaging library) for all the imaging functions, and then a custom function or two to actually encode the image into unicode?

EDIT: just read the above post, already been answered

2009-05-28 01:36:19

As you answered yourself: yep, use any and all tools at your disposal.

Brian Campbell 2009-05-28 03:09:34

Answer 12

+188 A:

SpliFF 2009-05-28 09:31:35

Excellent! At first I wanted to create a hybrid vector solution with both sharp edges and smooth areas, but it proved far too complex without using a tracing library (which I didn't want to use). I'm looking forward to seeing how far you can get with your method!

Sam Hocevar 2009-05-28 10:16:40

Nice! I was hoping we'd see some attempts at near-lossless approaches by vectorization. It means it has lower generality, but higher quality for the images it does cover. It's fine to use an online service for vectorization. Good luck on getting the size down further!

Brian Campbell 2009-05-28 13:00:29

I would consider image compression and character encoding as two different steps - Sam's technique seems to be optimal for the encoding, and could easily be built into a stand-alone program. You'll get more bang for your buck by concentrating on the unique part of your solution (i.e. the compression part) and just outputting a string of bits.

Mark Ransom 2009-05-28 21:12:42

I'm enjoying watching this evolve. I like your version 2. Is there any hope of this working on any of the other sample graphics?

Brian Campbell 2009-05-28 23:26:40

What about the Mona Lisa? You have different versions on your webspace. Could you please explain them?

furtelwart 2009-05-29 07:51:10

Nice! I rather like that simplified Mona Lisa from autotrace.

Brian Campbell 2009-05-30 19:31:36

Wow. These images look really stylish.

Rinat Abdullin 2009-06-05 04:24:22

Definitely my favorite.

Spencer Ruport 2009-06-27 04:02:46

I wonder if SO could use one of those images for the logo next April 1st.

BCS 2009-07-07 05:15:54

The results are artwork by themselves.

ItzWarty 2010-04-14 07:26:21

yae this seems to be the only one that can produce cooler images than the input =)

Claudiu 2010-06-09 05:08:43

Answer 13

A:

My first thought was along the lines of fractal compression, but the maths behind it are quite complex and often patented (grrr...) and I didn't have the time to research the subject and try an implementation in the time limit for this challenge. From what I've seen over the years, the results are quite good; the decompressed image can be upscaled without becoming blocky. Here's Wikipedia's article on fractal compression for anyone interested - there are quite a few links at the bottom of the page for further reading.

Anyway, here's my rule breaking entry. It's a zipped exe command line tool written in C# using .Net 2.0 - tested on WinXP. Here's the command to 'decode' one of the sample images above:

TwitterCodec "4,-5A*>8=D/1/DB=F:.834=3B-A54B<95<3453;2G@210"

Type the command with no arguments for command line format.

Skizz

EDIT: OK, maybe that was breaking the rules a lot. Still, it allows you to 'encode' a file into a small string and 'decode' it anywhere else. It just doesn't do any compression.

EDIT 2: The code is just an FTP client that copies the image and image name to a server and encodes the URL as output for retrieval at a later date.

Skizz 2009-05-28 20:26:50

Not just bending the rules; specifically breaking them. "The decoding process may have no access to any other output of the encoding process other than the output specified above; that is, you can't upload the image somewhere and output the URL for the decoding process to download, or anything silly like that." Plus you didn't attach any source code or explanation, and it doesn't even work (at least under Mono); here's what I get from sniffing the packets: "RETR 478b606a-a151-47e5-a7a2-451b39cb7272550 Permission denied."

Brian Campbell 2009-05-28 21:07:54

Fair enough. I didn't read the whole thing in detail. The source code would have given it away somewhat, it's just a modified version of the FTP Client sample from the MSDN. Having never used Mono, I'm not sure why it's failing.

Skizz 2009-05-28 22:19:57

I remember buying the hardcover book that explained fractal compression. Most frustrating book ever, as everything was vague due to all the patents surrounding the technology.

Nosredna 2009-05-29 21:23:46

Answer 14

+27 A:

2009-05-29 05:46:37

Brian Campbell 2009-05-29 06:48:22

Encoding a DLI image to Unicode would definitely give the best results. Could you also show the results for 251 bytes of data? That's how many information bytes there are in 140 CJK characters.

Sam Hocevar 2009-05-29 08:26:27

By the way, the DLI author mentions a "long processing time". As I am unable to run his software, could you give us rough compression time numbers?

Sam Hocevar 2009-05-29 08:57:12

Using an AMD Athlon64 2.4Ghz, compression of the 100x150 Mona Lisa image takes 38sec and decompression 6sec. Compressing to a maximum of 251 bytes is tougher; the output quality is significantly reduced. Using the reference Mona Lisa image, I scaled it down to 60x91 then used DLI to compress it to 243 bytes (closest to 251 without going over). This is the output: i43.tinypic.com/2196m4g.png The detail isn't near that of the 534 byte DLI even though bitrate has only been reduced by ~50%. The structure of the image has been maintained fairly well however.

2009-05-29 20:54:55

Decided to make it easier to compare the 250 byte compressed samples. The 243 byte DLI was scaled up and placed beside the IMG2TWIT sample. IMG2TWIT on the left, DLI on the right. Here's the image i40.tinypic.com/30ndks6.png

2009-05-29 21:25:07

That's very impressive. I wasn't aware of DLI, let's hope the guy releases some information about what it does. One last question: does DLI allow you to specify a target size, or do you have to try and guess if you want a given number of bytes?

Sam Hocevar 2009-05-29 21:33:22

DLI uses a quality parameter like JPEG, so trial-and-error is needed if a target output size is desired.

2009-05-29 23:03:06
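
The trial-and-error search for a target size can be automated. The Python sketch below is a generic binary search, not part of DLI or any JPEG library: `compress` is a hypothetical callable that takes a quality setting and returns the compressed bytes, and the search assumes that output size never shrinks as quality goes up.

```python
def best_quality(compress, budget, lowest=1, highest=100):
    """Return (quality, data) for the highest quality whose output fits.

    Returns None when even the lowest quality exceeds the byte budget.
    """
    best = None
    while lowest <= highest:
        quality = (lowest + highest) // 2
        data = compress(quality)
        if len(data) <= budget:
            best = (quality, data)   # fits: try a higher quality
            lowest = quality + 1
        else:
            highest = quality - 1    # too big: try a lower quality
    return best


def fake_codec(quality):
    """Stand-in codec for demonstration: output grows by 10 bytes per step."""
    return bytes(quality * 10)


quality, data = best_quality(fake_codec, budget=251)
print(quality, len(data))
```

With a real codec, `compress` would wrap the encoder call and return the encoded file contents for the given quality.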

@Dennis Do you have any source code available, or any writeup on the techniques used? I'm very impressed by the level of detail you get here, and I'd love to have some more information on how it works.

Brian Campbell 2009-05-30 16:50:56

@Sam If you want to run dli, I've found that it works just fine under Wine.

Brian Campbell 2009-05-30 23:24:40

Sorry, source code and a description of DLI's technology is currently not available. On another note, I was surprised you chose the fractal solution over IMG2TWIT. Personally, I prefer IMG2TWIT's solution and output quality so I'm looking forward to the details on your evaluation.

2009-05-31 01:21:08

Answer 15

+164 A:

Alright, here's mine: nanocrunch.cpp and the CMakeLists.txt file to build it using CMake. It relies on the Magick++ ImageMagick API for most of its image handling. It also requires the GMP library for bignum arithmetic for its string encoding.

I based my solution off of fractal image compression, with a few unique twists. The basic idea is to take the image, scale down a copy to 50% and look for pieces in various orientations that look similar to non-overlapping blocks in the original image. It takes a very brute force approach to this search, but that just makes it easier to introduce my modifications.

The first modification is that instead of just looking at ninety degree rotations and flips, my program also considers 45 degree orientations. It's one more bit per block, but it helps the image quality immensely.

The other thing is that storing a contrast/brightness adjustment for each color component of each block is way too expensive. Instead, I store a heavily quantized color (the palette has only 4 * 4 * 4 = 64 colors) that simply gets blended in in some proportion. Mathematically, this is equivalent to a variable brightness and constant contrast adjustment for each color. Unfortunately, it also means there's no negative contrast to flip the colors.

Once it's computed the position, orientation and color for each block, it encodes this into a UTF-8 string. First, it generates a very large bignum to represent the data in the block table and the image size. The approach to this is similar to Sam Hocevar's solution -- kind of a large number with a radix that varies by position.

Then it converts that into a base of whatever the size of the character set available is. By default, it makes full use of the assigned unicode character set, minus the less than, greater than, ampersand, control, combining, and surrogate and private characters. It's not pretty but it works. You can also comment out the default table and select printable 7-bit ASCII (again excluding <, >, and & characters) or CJK Unified Ideographs instead. The table of which character codes are available is stored run-length encoded, with alternating runs of invalid and valid characters.
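
The two encoding steps described above (a mixed-radix bignum, then conversion to the base of the character set) can be sketched in Python, which has arbitrary-precision integers built in. This is an illustration of the general technique, not a transcription of nanocrunch.cpp: all function names, the example radices for the position steps, and the toy run-length table are assumptions.

```python
def alphabet_from_runs(runs):
    """Expand alternating run lengths (invalid, valid, invalid, ...) from U+0000."""
    alphabet, code_point, valid = [], 0, False
    for run in runs:
        if valid:
            alphabet.extend(chr(c) for c in range(code_point, code_point + run))
        code_point += run
        valid = not valid
    return alphabet


def pack(values, radices):
    """Fold the fields into one big integer whose radix varies by position."""
    number = 0
    for value, radix in zip(values, radices):
        if not 0 <= value < radix:
            raise ValueError("field out of range")
        number = number * radix + value
    return number


def unpack(number, radices):
    """Inverse of pack: peel the fields off again, last field first."""
    values = []
    for radix in reversed(radices):
        number, value = divmod(number, radix)
        values.append(value)
    return values[::-1]


def to_text(number, alphabet, length):
    """Write the integer in base len(alphabet), most significant digit first."""
    chars = []
    for _ in range(length):
        number, digit = divmod(number, len(alphabet))
        chars.append(alphabet[digit])
    if number:
        raise ValueError("number does not fit in the given length")
    return "".join(reversed(chars))


def from_text(text, alphabet):
    """Inverse of to_text."""
    index = {char: i for i, char in enumerate(alphabet)}
    number = 0
    for char in text:
        number = number * len(alphabet) + index[char]
    return number


# Toy table: skip U+0000..U+4DFF, take 100 characters, skip 50, take 20.
alphabet = alphabet_from_runs([0x4E00, 100, 50, 20])
# One block: x step, y step, red, green, blue, orientation (radices are examples).
radices = [7, 5, 4, 4, 4, 16]
values = [6, 0, 3, 1, 2, 15]
text = to_text(pack(values, radices), alphabet, length=3)
print(text, unpack(from_text(text, alphabet), radices))
```

Packing all fields into a single integer avoids rounding each field up to a whole number of bits, which matters when only a couple of thousand bits are available.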

Anyway, here are some images and times (as measured on my old 3.0GHz P4), and compressed to 140 characters in the full assigned unicode set described above. Overall, I'm fairly pleased with how they all turned out. If I had more time to work on this, I'd probably try to reduce the blockiness of the decompressed images. Still, I think the results are pretty good for the extreme compression ratio. The decompressed images are a bit impressionistic, but I find it relatively easy to see how bits correspond to the original.

Stack Overflow Logo (8.6s to encode, 7.9s to decode, 485 bytes):

Lena (32.8s to encode, 13.0s to decode, 477 bytes):

Mona Lisa (43.2s to encode, 14.5s to decode, 490 bytes):

Edit: CJK Unified Characters

Sam asked in the comments about using this with CJK. Here's a version of the Mona Lisa compressed to 139 characters from the CJK Unified character set:

咏璘驞凄脒鵚据蛥鸂拗朐朖辿韩瀦魷歪痫栘璯緍脲蕜抱揎頻蓼債鑡嗞靊寞柮嚛嚵籥聚隤慛絖銓馿渫櫰矍昀鰛掾撄粂敽牙稉擎蔍螎葙峬覧絀蹔抆惫冧笻哜搀澐芯譶辍澮垝黟偞媄童竽梀韠镰猳閺狌而羶喙伆杇婣唆鐤諽鷍鴞駫搶毤埙誖萜愿旖鞰萗勹鈱哳垬濅鬒秀瞛洆认気狋異闥籴珵仾氙熜謋繴茴晋髭杍嚖熥勳縿餅珝爸擸萿

The tuning parameters at the top of the program that I used for this were: 19, 19, 4, 4, 3, 10, 11, 1000, 1000. I also commented out the first definition of number_assigned and codes, and uncommented out the last definitions of them to select the CJK Unified character set.

Boojum 2009-05-30 08:41:36

Wow! Nice job. I was skeptical of fractal image compression for images this small, but it actually does produce pretty decent results. It was also pretty easy to compile and run.

Brian Campbell 2009-05-30 15:33:05

+1 -- This is brilliant. Do you have a link to CJK results, too? It seems to require special source code tuning.

Sam Hocevar 2009-05-30 16:09:09

Thanks guys! Sam, do you mean results with just 140 CJK characters? If so, then yes, you'll need to tune the numbers at the top. The final size in bits is around log2(steps_in_x*steps_in_y*steps_in_red*steps_in_green*steps_in_blue)*blocks_in_x*blocks_in_y+log2(maximum_width*maximum_height).

Boojum 2009-05-30 18:04:12

Edit: There's a * 16 in the first log2() that I left out. That's for the possible orientations.

Boojum 2009-05-30 18:13:35
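
The size estimate from the two comments above, including the factor of 16 for orientations, can be written as a small Python function. The parameter names follow the formula in the comment; the function name and the example values are hypothetical and are not the tuning values used for any of the posted images.

```python
import math


def estimated_bits(steps_in_x, steps_in_y, steps_in_red, steps_in_green,
                   steps_in_blue, blocks_in_x, blocks_in_y,
                   maximum_width, maximum_height, orientations=16):
    """Approximate encoded size in bits for the block table plus image size."""
    per_block = math.log2(steps_in_x * steps_in_y * steps_in_red
                          * steps_in_green * steps_in_blue * orientations)
    size_header = math.log2(maximum_width * maximum_height)
    return per_block * blocks_in_x * blocks_in_y + size_header


# Hypothetical settings chosen only so the arithmetic is easy to follow:
# log2(2*2*2*2*2*16) = 9 bits per block, one block, 20 bits for the size.
print(estimated_bits(2, 2, 2, 2, 2, 1, 1, 1024, 1024))
```

Comparing this estimate against 140 times log2 of the character set size shows whether a given set of tuning numbers will fit in one message.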

Has anyone twitter'd an image using this yet?

dbr 2009-05-31 16:16:54

I come back to this entry once in a while just to see how awesome it is.

Beska 2009-12-17 21:39:31

Answer 16

+11 A:

Rob 2009-05-30 14:02:25

Great, thanks for the entry! Grayscale actually works fairly well for most of these, though Lena is a bit hard to make out. I was looking for your source but got a 404; could you make sure it's up there?

Brian Campbell 2009-05-30 18:22:25

Double check it now, I was updating the site so you might have caught me between updates.

Rob 2009-05-30 18:35:33

Yep, I can download it now. Now of course I need to figure out if I can get Mono to compile it.

Brian Campbell 2009-05-30 19:46:03

Yep! Works under Mono, I compiled with "gmcs -r System.Drawing TwitterImage.cs Program.cs" and run with "mono TwitterImage.exe encode lena.png lena.txt"

Brian Campbell 2009-05-30 20:17:15

Cool! I did double check to make sure the libraries I was using were listed for Mono, but I haven't actually worked with Mono yet so I wasn't sure if it would or not.

Rob 2009-05-30 21:23:23

Sample images are not visible

Jakub Narębski 2009-06-18 07:13:16

Answer 17

A:

Is it possible to use text compression on the resulting unicode string?

2009-05-30 20:46:43

That's extremely unlikely to help: the resulting Unicode string is likely to have very low compressibility ("high entropy").

ShreevatsaR 2009-05-31 16:15:05

Answer 18

+3 A:

Stupid idea, but sha1(my_image) would result in a "perfect" representation of any image (ignoring collisions). The obvious problem is the decoding process requires inordinate amounts of brute-forcing..

1-bit monochrome would be a bit easier.. Each pixel becomes a 1 or 0, so you would have 10000 bits of data for a 100*100 pixel image. Since the SHA1 hash is 40 hex characters, we can fit three into one message, and only have to brute force 2 sets of 3333 bits and one set of 3334 (although even that is probably still inordinate)

It's not exactly practical. Even with the fixed-length 1-bit 100*100px image there is.., assuming I'm not miscalculating, 49995000 combinations, or 16661667 when split into three.

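def fact(maxu):
    # Iterative factorial of maxu.
    ttl = 1
    for i in range(1, maxu + 1):
        ttl = ttl * i
    return ttl

def combi(setsize, length):
    # Number of ways to choose setsize items out of length.
    # Integer division keeps the result exact for large values.
    return fact(length) // (fact(setsize) * fact(length - setsize))

print((combi(2, 3333) * 2) + combi(2, 3334))
# 16661667
print(combi(2, 10000))
# 49995000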
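
Note that `combi(2, n)` counts unordered pairs of positions rather than distinct bit patterns. A hedged sketch of the other count, assuming every one of the n pixels can independently be 0 or 1 (so there are 2^n candidate images), looks like this; `bit_patterns` is an illustrative name:

```python
def bit_patterns(bits):
    """Number of distinct images when each of `bits` pixels is 0 or 1."""
    return 2 ** bits


whole = bit_patterns(10000)
# Worst case when the three chunks are brute-forced one after another.
split = 2 * bit_patterns(3333) + bit_patterns(3334)

print(len(str(whole)), "decimal digits for the whole image")
print(len(str(split)), "decimal digits when split into three")
```

Either way the search space is far beyond anything that can be enumerated, which agrees with the conclusion that the idea is not practical.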

dbr 2009-05-30 20:47:15

The issue with sha1(my_image) is that if you spent your time brute forcing it, you'd probably find many, many collisions before you found the real image; and of course brute forcing sha1 is pretty much computationally infeasible.

Brian Campbell 2009-05-30 23:26:31

Even better than SHA1 compression: my "flickr" compression algorithm! Step 1: upload image to flickr. Step 2: post a link to it on twitter. Tadda! Only 15 bytes used!

niXar 2009-06-19 16:29:36

niXar: Nope, rule 3.4: "The decoding process may have no access to any other output of the encoding process other than the output specified above; that is, you can't upload the image somewhere and output the URL for the decoding process to download, or anything silly like that."

dbr 2009-06-19 22:41:17

I know, I was being sarcastic.

niXar 2009-06-26 16:21:37

Answer 19

+4 A:

This compression is good:

www.intuac.com/userport/john/apt/

http://img86.imageshack.us/img86/4169/imagey.jpg

I used the following batch file:

capt mona-lisa-large.pnm out.cc 20
dapt out.cc image.pnm
Pause

The resulting filesize is 559 bytes.

2009-07-19 18:27:59

Answer 20

+4 A:

Regarding the encoding/decoding part of this challenge: base16b.org is my attempt to specify a standard method for safely and efficiently encoding binary data in the higher Unicode planes.

Some features :

Uses only Unicode's Private Use Areas
Encodes up to 17 bits per character; nearly three times more efficient than Base64
A reference Javascript implementation of encode/decode is provided
Some sample encodings are included, including Twitter and Wordpress

Sorry, this answer comes way too late for the original competition. I started the project independently of this post, which I discovered half-way into it.

2009-08-07 01:39:09

Answer 21

+6 A:

Okay, I'm late to the game, but nevertheless I made my project.

It's a toy genetic algorithm that uses translucent colorful circles to recreate the initial image.

Features:

pure Lua. Runs anywhere where a Lua interpreter runs.
uses netpbm P3 format
comes with a comprehensive suite of unit tests
preserves original image size

Mis-features:

slow
under these space constraints it preserves only the basic color scheme of the initial image and a general outline of a few of its features.

Here's an example twit that represents Lena: 犭楊谷杌蒝螦界匘玏扝匮俄归晃客猘摈硰划刀萕码摃斢嘁蜁嚎耂澹簜僨砠偑婊內團揕忈義倨襠凁梡岂掂戇耔攋斘眐奡萛狂昸箆亲嬎廙栃兡塅受橯恰应戞优猫僘瑩吱賾卣朸杈腠綍蝘猕屐稱悡詬來噩压罍尕熚帤厥虤嫐虲兙罨縨炘排叁抠堃從弅慌螎熰標宑簫柢橙拃丨蜊缩昔儻舭勵癳冂囤璟彔榕兠摈侑蒖孂埮槃姠璐哠眛嫡琠枀訜苄暬厇廩焛瀻严啘刱垫仔

[Images: original Lena; encoded Lena]

The code is in a Mercurial repository at bitbucket.org. Check out http://bitbucket.org/tkadlubo/circles.lua

Tadeusz A. Kadłubowski 2010-08-22 11:17:40

Awesome! Creates neat, artistic looking images. I'm glad people are still working on this; it's been loads of fun to see all of the different approaches.

Brian Campbell 2010-08-22 19:21:52


Twitter image encoding challenge

Rules

Guidelines

Scoring rubric

Reference images

Prize

Note on deadline

Unicode notes

Tips & Links

Edit log

related questions