Hi,

I need to create a string that stores key/value pairs, for example:

key1::value1||key2::value2||key3::value3

When deserializing it, I may hit an error if a key or value happens to contain || or ::.

What are common techniques for dealing with this situation? Thanks.

+2  A: 

A common way to deal with this is to use an escape character or a qualifier. Consider this comma-separated data:

Name,City,State
John Doe, Jr.,Anytown,CA

Because the name field contains a comma, the second line is split into four fields instead of three.

If you enclose each data value by qualifiers, the parser knows when to ignore the delimiter, as in this example:

Name,City,State
"John Doe, Jr.",Anytown,CA

Qualifiers can be optional, used only on data fields that need it. Many implementations will use qualifiers on every field, needed or not.

You may want to implement something similar for your data encoding.
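As a rough illustration of the qualifier approach, here is a minimal Python sketch using the standard `csv` module. The choice of Python and the variable names are assumptions made for this example; they are not part of the original question. With the module's default quoting policy, only the fields that need a qualifier receive one.

```python
import csv
import io

rows = [["Name", "City", "State"], ["John Doe, Jr.", "Anytown", "CA"]]

# Write the rows; the default policy (QUOTE_MINIMAL) adds qualifiers
# only around fields that contain the delimiter or a quote character.
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(rows)
text = buffer.getvalue()
print(text)

# Reading the text back recovers the original fields, comma included.
parsed = list(csv.reader(io.StringIO(text)))
print(parsed[1])
```

Passing `quoting=csv.QUOTE_ALL` to the writer would instead put qualifiers on every field, needed or not.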

JYelton
A: 

The common technique is escaping reserved characters, for example:

  • In URLs you escape some characters using a %HEX representation: http://xxx.com?aa=a%20b

  • In programming languages you escape some characters with a backslash prefix: "\"hello\""
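The URL-style escaping can be applied directly to the key::value||key::value format from the question. The sketch below is one possible way to do it in Python with `urllib.parse.quote` and `unquote`; the function names `serialize` and `deserialize` are made up for this example.

```python
from urllib.parse import quote, unquote

def serialize(pairs):
    # safe="" percent-encodes every reserved character, so ':' , '|'
    # and '%' can never appear literally inside an encoded key or value.
    return "||".join(
        f"{quote(k, safe='')}::{quote(v, safe='')}" for k, v in pairs.items()
    )

def deserialize(text):
    result = {}
    if not text:
        return result
    for item in text.split("||"):
        key, value = item.split("::")
        result[unquote(key)] = unquote(value)
    return result

data = {"a::b": "x||y", "100%": "ok"}
encoded = serialize(data)
print(encoded)
print(deserialize(encoded) == data)
```

Because the encoded fields contain no literal delimiter characters, a plain `split` is enough on the decoding side.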

gustavogb
A: 

A simple solution is to escape a separator (with a backslash, for instance) any time it occurs in data:

Name,City,State
John Doe\, Jr.,Anytown,CA

Of course, the escape character itself will need to be escaped when it occurs in data as well; in this case, a backslash would become \\.

titaniumdecoy
A: 

When you store the keys and values, put a prefix (say "a") before each special character (say "b") that appears in them. This is called escaping.

Then decode the keys and values by simply replacing any "ab" sequence with "b". Bear in mind that the prefix is itself a special character and must be escaped too. An example:

Prefix: \

Special characters: :, |, \

Encoded:

title:Slashdot\: News for Nerds. Stuff that Matters.|shortTitle:\\.

Decoded:

title=Slashdot: News for Nerds. Stuff that Matters.

shortTitle=\.
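The encoding and decoding steps above could be sketched in Python as follows. This is only one way to write it; all function and constant names are invented for the example, and it assumes the single-character delimiters `:` and `|` with `\` as the prefix, as in the example above.

```python
ESCAPE = "\\"
PAIR_SEP = "|"
KV_SEP = ":"
SPECIAL = {ESCAPE, PAIR_SEP, KV_SEP}

def escape(text):
    # Put the prefix in front of every special character.
    return "".join(ESCAPE + ch if ch in SPECIAL else ch for ch in text)

def unescape(text):
    out, i = [], 0
    while i < len(text):
        if text[i] == ESCAPE and i + 1 < len(text):
            i += 1  # drop the prefix, keep the next character literally
        out.append(text[i])
        i += 1
    return "".join(out)

def split_unescaped(text, sep):
    # Split on sep, but never on a character that follows the prefix.
    parts, current, i = [], [], 0
    while i < len(text):
        if text[i] == ESCAPE and i + 1 < len(text):
            current.append(text[i:i + 2])  # keep the escaped pair intact
            i += 2
        elif text[i] == sep:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(text[i])
            i += 1
    parts.append("".join(current))
    return parts

def encode(pairs):
    return PAIR_SEP.join(escape(k) + KV_SEP + escape(v) for k, v in pairs.items())

def decode(text):
    result = {}
    if not text:
        return result
    for item in split_unescaped(text, PAIR_SEP):
        key, value = split_unescaped(item, KV_SEP)
        result[unescape(key)] = unescape(value)
    return result

data = {"title": "Slashdot: News for Nerds. Stuff that Matters.", "shortTitle": "\\."}
encoded = encode(data)
print(encoded)
print(decode(encoded) == data)
```

Note that splitting has to happen before unescaping; otherwise an escaped delimiter would be indistinguishable from a real one.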

hgpc
+1  A: 

Escape || when serializing, and unescape it when deserializing. A common C-like way to escape is to prepend \. For example:

{ "a:b:c": "foo||bar", "asdf": "\\|||x||||:" }
serialize => "a\:b\:c:foo\|\|bar||asdf:\\\\\|\|\|x\|\|\|\|\:"

Note that \ needs to be escaped (and double escaped due to being placed in a C-style string).

strager
+1  A: 

If we assume that you have total control over the input string, then the common way of dealing with this problem is to use an escape character.

Typically, the backslash (\) is used as an escape character meaning "treat the next character literally", so that character is not interpreted as a delimiter. The parser would see || and :: as delimiters, but would read \|\| as two literal pipe characters in either the key or the value.

The next problem is that the backslash is now overloaded: how do you represent a literal backslash? This is solved by escaping the backslash as well, so a literal \ is written as \\, and the parser reads \\ as \.

Note that if you use escape characters, you can use a single character for the delimiters, which might make things simpler.

Alternatively, you could restrict the input: declare that || and :: are banned, and either fail or strip them when the string is encoded.
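A minimal Python sketch of the "restrict the input" option might look like this. The names are hypothetical, and the check is deliberately stricter than banning only || and ::. It rejects any single `:` or `|`, because a key ending in one `:` would otherwise run into the `::` delimiter that follows it.

```python
FORBIDDEN = (":", "|")

def check_field(field):
    # Fail loudly instead of silently producing an ambiguous string.
    for ch in FORBIDDEN:
        if ch in field:
            raise ValueError(f"field {field!r} contains reserved character {ch!r}")
    return field

def serialize_strict(pairs):
    return "||".join(
        f"{check_field(k)}::{check_field(v)}" for k, v in pairs.items()
    )

print(serialize_strict({"key1": "value1", "key2": "value2"}))
```

Whether to raise an error or strip the offending characters is a design choice; raising keeps the data intact and makes the problem visible to the caller.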

monkeysplayingpingpong
A: 

You can use a character that is unlikely to appear in your data as the separator, such as a non-printable control character (e.g. vertical tab :-) ).

You can also escape the separator characters in your data during serialization. For example, if you use a single character for each separator (key1:value1|key2:value2|...) and your data is:

this:is:key1   this|is|data1
this:is:key2   this|is|data2

you double every colon and pipe character in your data when you serialize it, so you get:

this::is::key1:this||is||data1|this::is::key2:this||is||data2|...

During deserialization, whenever you come across two colons or two pipe characters in a row, you know that this is not a separator but part of your data, and that you have to change it back to one character. On the other hand, every single colon or pipe character is a separator.

danadam