
I wish to put some text on a page and hide some data in that text. Does anybody know of any methods / patterns that have been used in the past to solve this problem?

Example: I have the following text: "The cat sat on the dog and was happy."

I also have the number 123. I want to hide this number in that sentence such that the sentence can be placed on a web page and only someone in the know would be able to find the data.

A: 

Well, you could try something like this, though I'm not sure it's exactly what you're looking for.

Gunny
A: 

There may be an algorithm that can turn that sentence into 123, but I think in general you're going to need to accept some modifications to the text if you need to store any possible numerical value!

dicroce
+2  A: 

I think at a high level what you are talking about is steganography. http://en.wikipedia.org/wiki/Steganography

The section on modern techniques should get you started: http://en.wikipedia.org/wiki/Steganography#Modern_steganographic_techniques

torial
+1  A: 

I think what you're looking for is something called Steganography. Corinna John has an excellent collection of articles on the subject up on CodeProject.

http://www.codeproject.com/script/Articles/MemberArticles.aspx?amid=475133

Jeremy Wiebe
To add: if you follow the links at CodeProject, you'll get to her homepage, which seems focused on do-it-yourself steganography: http://www.binary-universe.net/
torial
+7  A: 

Of course this can be done.

What you are describing is, broadly speaking, called steganography.

For instance, you might encode a number by counting the words until you reach one that starts with the letter B, in which case 123 could be encoded as:

You belong to the beautiful group of people being elite.

The thing is, the person wanting to decode your message must know your algorithm.

Edit: I notice that my numbers are off by one. Start counting at 0 and you'll see the number 123.
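A minimal Python sketch of the word-counting scheme, assuming each digit is the number of words that precede the next word starting with "b" (case-insensitive), i.e. the zero-based counting from the edit. The filler and B-word lists in the encoder are hypothetical and produce stilted text; in practice you would choose natural words by hand.

```python
from itertools import cycle

# Hypothetical word lists: fillers must NOT start with "b", B_WORDS must.
FILLERS = ["you", "to", "the", "group", "of", "people"]
B_WORDS = ["belong", "beautiful", "being"]


def decode_b_words(text: str) -> str:
    """Each digit = number of words before the next word starting with 'b'."""
    digits, count = [], 0
    for word in text.split():
        if word.lower().startswith("b"):
            digits.append(str(count))
            count = 0
        else:
            count += 1
    return "".join(digits)  # trailing words after the last B-word are ignored


def encode_b_words(number: str) -> str:
    """Emit `digit` filler words, then one B-word, for each decimal digit."""
    fill, bees = cycle(FILLERS), cycle(B_WORDS)
    words = []
    for ch in number:
        words.extend(next(fill) for _ in range(int(ch)))
        words.append(next(bees))
    return " ".join(words)
```

For example, `decode_b_words("You belong to the beautiful group of people being elite.")` returns `"123"`, and `encode_b_words("123")` produces `"you belong to the beautiful group of people being"`.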

Lasse V. Karlsen
A: 

If the 'text' was actually an image, then you could hide data in that using steganography - the data is hidden in the binary image file without affecting the way the image looks.
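To illustrate the image case, here is a minimal least-significant-bit (LSB) sketch in Python. It assumes the pixels are already available as raw 8-bit samples (decoding an actual PNG or BMP file is out of scope) and stores each payload bit in the lowest bit of one sample, so no sample changes by more than 1. Lossy re-compression such as JPEG would likely destroy the hidden bits.

```python
def embed_lsb(pixels: bytes, payload: bytes) -> bytearray:
    """Hide payload bits (most significant bit first) in the low bit of each sample."""
    bits = [(byte >> i) & 1 for byte in payload for i in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("cover image is too small for this payload")
    out = bytearray(pixels)
    for idx, bit in enumerate(bits):
        out[idx] = (out[idx] & 0xFE) | bit  # clear the low bit, then set it to the payload bit
    return out


def extract_lsb(pixels: bytes, length: int) -> bytes:
    """Read `length` bytes back out of the low bits of the first length*8 samples."""
    result = bytearray()
    for n in range(length):
        byte = 0
        for bit_idx in range(8):
            byte = (byte << 1) | (pixels[n * 8 + bit_idx] & 1)
        result.append(byte)
    return bytes(result)
```

The receiver has to know the payload length (or you embed a length header first); that part is left out to keep the sketch short.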

Brabster
Hiding data in images is just one branch of steganography.
Bill the Lizard
A: 

According to this thread:

Prof. Mikhail Atallah et al. here at Purdue did a lot of research on watermarking text.

The approach uses TMRs (Text Meaning Representations) of phrases to encode bits by performing minor transformations that position the TMR at a certain distance from a defined canonical form.

(another method to watermark text is presented here)

It may be another way to hide text within text, alongside the steganographic methods described in the other answers.

VonC
+3  A: 

HTML makes it quite easy to do this, actually. No need for really cunning amounts of steganography, etc. Let's see:

This sentence embeds 123 and then stops embedding.

This sentence embeds 0102 and then stops embedding.

(We'll have to see whether it actually works in markdown, but I suspect so.) Admittedly it's pretty obvious if you know that there's something to look for, but I think you'll agree it's not obvious to casual observers.

I've left it as a little puzzle to work out the scheme, but add a comment if you want it to be explicitly explained.
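Since the scheme above is left as a puzzle, the following is only one hypothetical way to do something similar with HTML, not necessarily the one used in the answer. It relies on browsers collapsing runs of ordinary whitespace into a single space when rendering normal HTML text, and encodes each digit d as d + 2 spaces between two words, so that a single space means "no data here" and embedding can simply stop.

```python
import re


def embed_digits(sentence: str, digits: str) -> str:
    """Hypothetical scheme: the gap after word i holds digit d as (d + 2) spaces."""
    words = sentence.split()
    if len(digits) > len(words) - 1:
        raise ValueError("not enough gaps between words")
    gaps = [" " * (int(d) + 2) for d in digits]
    gaps += [" "] * (len(words) - 1 - len(digits))  # plain single spaces carry no data
    out = words[0]
    for gap, word in zip(gaps, words[1:]):
        out += gap + word
    return out


def extract_digits(html_text: str) -> str:
    """Every run of two or more spaces encodes one digit: run length minus 2."""
    return "".join(str(len(run) - 2) for run in re.findall(r" {2,}", html_text))
```

Served inside ordinary HTML, the output renders exactly like the original sentence; only viewing the page source reveals the runs of spaces.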

Jon Skeet
Be sure to enable compression on your HTTP server if you do this!
Ryan Fox
Yes, if you're transmitting significant amounts of data it could get somewhat unwieldy.
Jon Skeet
+1  A: 

There are very complicated approaches to this problem; however, you can probably go with a very simple one. For example, define an adjective for every digit:

0. beautiful
1. harmless
2. evil
3. colorful
4. weird

and so on. Now select sentences of your choice and put placeholders into the sentences where adjectives belong.

"The {adj} cat sat on the {adj} dog and the {adj} cat was happy."

Your number is 123, so your sentence is

"The harmless cat sat on the evil dog and the colorful cat was happy."

A parser can easily take the sentence, split it up into words, look up each word in the table above, and convert the adjectives it finds back to digits.

The -> ?
harmless -> 1
cat -> ?
sat -> ?
on -> ?
the -> ?
evil -> 2
...

At the end you have 123 again.
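A minimal Python sketch of the adjective table. The answer lists only digits 0 to 4; the words for 5 to 9 below are hypothetical fillers so that any decimal number can be encoded. The decoder ignores every word that is not in the table (the "?" entries above), so the cover sentence must not use table words anywhere except in the placeholders.

```python
# Digits 0-4 come from the answer; 5-9 are hypothetical fillers.
DIGIT_TO_ADJ = {0: "beautiful", 1: "harmless", 2: "evil", 3: "colorful", 4: "weird",
                5: "quiet", 6: "lazy", 7: "brave", 8: "tiny", 9: "grumpy"}
ADJ_TO_DIGIT = {adj: d for d, adj in DIGIT_TO_ADJ.items()}
PLACEHOLDER = "{adj}"


def encode_adjectives(template: str, number: str) -> str:
    """Fill the placeholders left to right, one adjective per digit."""
    if template.count(PLACEHOLDER) != len(number):
        raise ValueError("need exactly one placeholder per digit")
    for ch in number:
        template = template.replace(PLACEHOLDER, DIGIT_TO_ADJ[int(ch)], 1)
    return template


def decode_adjectives(sentence: str) -> str:
    """Split into words, strip punctuation, and keep only words found in the table."""
    digits = []
    for word in sentence.split():
        key = word.strip(".,;:!?\"'").lower()
        if key in ADJ_TO_DIGIT:
            digits.append(str(ADJ_TO_DIGIT[key]))
    return "".join(digits)
```

With the template from the answer, `encode_adjectives("The {adj} cat sat on the {adj} dog and the {adj} cat was happy.", "123")` gives the sentence about the harmless cat and the evil dog, and `decode_adjectives` turns it back into `"123"`.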

As soon as people know that there is information hidden in the sentence, the algorithm is easily broken. You can make it harder to break if you add variation by defining multiple adjectives per number. Instead of

1. harmless

you can define

1. harmless/stupid/blue/fashionable

When you need to encode 1, randomly pick any of the words above. As these all map to the number 1, the reverse parser won't care which of them is printed there; the result will always be 1. This randomization makes the algorithm harder to reverse engineer.
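The randomized variant only changes the lookup table: each digit maps to a list of words, the encoder picks one at random, and the decoder maps every word back to its digit. This minimal Python sketch uses just the words from the answer (so it covers digits 0 to 4 only); the optional `rng` parameter is an addition of mine so that the output can be reproduced.

```python
import random

# Words from the answer; only digit 1 has several synonyms here.
WORDS_FOR_DIGIT = {
    0: ["beautiful"],
    1: ["harmless", "stupid", "blue", "fashionable"],
    2: ["evil"],
    3: ["colorful"],
    4: ["weird"],
}
DIGIT_FOR_WORD = {w: d for d, words in WORDS_FOR_DIGIT.items() for w in words}
PLACEHOLDER = "{adj}"


def encode_randomized(template: str, number: str, rng=random) -> str:
    """Replace each placeholder with a randomly chosen word for the next digit."""
    if template.count(PLACEHOLDER) != len(number):
        raise ValueError("need exactly one placeholder per digit")
    for ch in number:
        template = template.replace(PLACEHOLDER, rng.choice(WORDS_FOR_DIGIT[int(ch)]), 1)
    return template


def decode_randomized(sentence: str) -> str:
    """Any synonym decodes to the same digit, so the random choice does not matter."""
    return "".join(
        str(DIGIT_FOR_WORD[w])
        for w in (word.strip(".,;:!?\"'").lower() for word in sentence.split())
        if w in DIGIT_FOR_WORD
    )
```

Encoding "111" several times yields different sentences, but all of them decode to "111".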

Mecki
A: 

The approach Jon Skeet mentioned is very similar to Matthew Kwan's "SNOW" approach. Both of them hide small amounts of arbitrary information in text without adding, deleting, or changing any of the words in the source text.
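For comparison, here is a simplified Python sketch of the trailing-whitespace idea behind SNOW: bits are appended to the end of each line as spaces (0) and tabs (1), which most viewers do not display. This is not SNOW's actual encoding; the one-byte-per-line layout is an assumption made to keep the example short.

```python
import re


def snow_like_embed(cover: str, payload: bytes, bits_per_line: int = 8) -> str:
    """Append payload bits to line ends: space = 0, tab = 1."""
    bits = "".join(f"{b:08b}" for b in payload)
    lines = [line.rstrip(" \t") for line in cover.split("\n")]  # drop existing trailing whitespace
    chunks = [bits[i:i + bits_per_line] for i in range(0, len(bits), bits_per_line)]
    if len(chunks) > len(lines):
        raise ValueError("cover text has too few lines")
    for i, chunk in enumerate(chunks):
        lines[i] += chunk.replace("0", " ").replace("1", "\t")
    return "\n".join(lines)


def snow_like_extract(stego: str) -> bytes:
    """Collect trailing whitespace from every line and regroup it into bytes."""
    bits = ""
    for line in stego.split("\n"):
        trailing = re.search(r"[ \t]*$", line).group()
        bits += trailing.replace(" ", "0").replace("\t", "1")
    usable = len(bits) - len(bits) % 8
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))
```

Like the HTML trick, none of the visible words change; the data lives entirely in whitespace that a casual reader never sees.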

David Cary