I have run into this requirement a dozen times, but I have never found a satisfying solution. For instance, I have a collection of strings that I want to serialize (to disk or over the network) through a channel that only allows plain strings. I almost always end up using "split" and "join" with a ridiculous separator such as ":::==--==:::", like this:

public static string encode(System.Collections.Generic.List<string> data)
{
    return string.Join(" :::==--==::: ", data.ToArray());
}
public static string[] decode(string encoded)
{
    return encoded.Split(new string[] { " :::==--==::: " }, StringSplitOptions.None);
}

But this simple solution has obvious flaws. The strings cannot contain the separator string and, consequently, an encoded string cannot safely be encoded again.

AFAIK, a comprehensive solution should escape the separator on encoding and unescape it on decoding. While the problem sounds simple, I believe a complete solution could take a significant amount of code. Is there any trick that would let me build the encoder and decoder in very few lines of code?

+5  A: 

You could call .ToArray() on the List<> and then serialize the array. The result can be dumped to disk or sent over the network, and reconstituted by deserializing it on the other end.

Not too much code, and you get to use the serialization techniques already coded and tested in the .NET Framework.

Mike
Unfortunately, the channel only allows strings, and I always want the output to be readable.
Sake
.Net serialization will go to strings.
Joel Coehoorn
+4  A: 

You might like to look at the way CSV files are formatted.

  • escape all instances of a delimiter, e.g. " in the string
  • wrap each item in the list in quotes: "item"
  • join using a simple separator like ,

I don't believe there is a silver bullet solution to this problem.
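
As a rough illustration of the CSV idea, here is a minimal sketch in Python (the asker mentions the question is language-agnostic). It leans on the standard `csv` module for quoting and escaping; the function names `encode` and `decode` are just placeholders mirroring the question, and the sketch assumes the whole list is stored as a single CSV record.

```python
import csv
import io


def encode(items):
    """Write the strings as one CSV record; the csv module adds quotes where needed."""
    buffer = io.StringIO()
    csv.writer(buffer).writerow(items)
    # Drop the "\r\n" record terminator that the default dialect appends.
    return buffer.getvalue()[:-2]


def decode(encoded):
    """Parse one CSV record back into a list of strings."""
    # newline="" lets the reader handle line breaks embedded in quoted fields.
    return next(csv.reader(io.StringIO(encoded, newline="")), [])


items = ["abc", "de,fg", 'say "hi"', " :::==--==::: "]
assert decode(encode(items)) == items
```

Only fields that contain a comma or a quote get wrapped in quotes, so simple input stays readable.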

Adam Pope
A: 

Why not use XStream to serialise it, rather than inventing your own serialisation format?

It's pretty simple:

new XStream().toXML(yourobject)
time4tea
Sorry: C# question, Java answer...
time4tea
That's fine. The question is language agnostic. I'd personally prefer a Python answer. My sample is in C# to attract a wider audience.
Sake
+2  A: 

You could use an XmlDocument to handle the serialization. That will handle the encoding for you.

public static string encode(System.Collections.Generic.List<string> data)
{
    var xml = new XmlDocument();
    xml.AppendChild(xml.CreateElement("data"));
    foreach (var item in data)
    {
        var xmlItem = (XmlElement)xml.DocumentElement.AppendChild(xml.CreateElement("item"));
        xmlItem.InnerText = item;
    }
    return xml.OuterXml;
}

public static string[] decode(string encoded)
{
    var items = new System.Collections.Generic.List<string>();
    var xml = new XmlDocument();
    xml.LoadXml(encoded);
    foreach (XmlElement xmlItem in xml.SelectNodes("/data/item"))
        items.Add(xmlItem.InnerText);
    return items.ToArray();
}
David
I sometimes do exactly what you suggest. But it's overkill in most situations.
Sake
+3  A: 

Here's an old-school technique that might be suitable -

Serialise by storing the length of each string as a fixed-width prefix in front of that string.

So

 string[0]="abc"
 string[1]="defg"
 string[2]=" :::==--==::: "

becomes

0003abc0004defg0014 :::==--==:::

...where the size of the prefix is large enough to cater for the maximum string length
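
A minimal sketch of this fixed-width prefix scheme, written in Python since the asker says the question is language-agnostic. The 4-digit width (`PREFIX_WIDTH`) and the function names are assumptions for illustration; the width has to be chosen to fit the longest string you expect.

```python
PREFIX_WIDTH = 4  # assumed width; limits each string to 9999 characters


def encode(items):
    parts = []
    for item in items:
        if len(item) >= 10 ** PREFIX_WIDTH:
            raise ValueError("string too long for the length prefix")
        # Zero-padded length followed by the raw, unescaped string.
        parts.append(f"{len(item):0{PREFIX_WIDTH}d}{item}")
    return "".join(parts)


def decode(encoded):
    items = []
    pos = 0
    while pos < len(encoded):
        # Read the fixed-width length, then exactly that many characters.
        length = int(encoded[pos:pos + PREFIX_WIDTH])
        pos += PREFIX_WIDTH
        items.append(encoded[pos:pos + length])
        pos += length
    return items


items = ["abc", "defg", " :::==--==::: "]
assert decode(encode(items)) == items
```

Because the decoder never searches for a separator, the strings need no escaping at all, and an encoded string can itself be encoded again.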

Ed Guiness
Why couldn't I come up with this solution before! Nice solution!
Sake
I like this one. You could modify it slightly to be the size plus a separator, so you can handle a prefix of any size, for example: 3|abc4|defg6|--==--
Jacob Stanley
You could add another separator after the string as well even though it's not required for parsing, just to make it easier to read. Something like 3`abc`4`defg`6`--==-- or 3:abc:4:defg:6:--==-- . That last one is starting to look like the php serialize() function :) a:1:{s:5:"Hello";s:5:"World";}
Jacob Stanley
A: 

Add a using directive for System.Linq to your file and change your functions to this:

    public static string encode(System.Collections.Generic.List<string> data, out string delimiter)
    {
        delimiter = ":";
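        // Grow the delimiter until no string in the list contains it (Any requires System.Linq).
        while (data.Any(s => s.Contains(delimiter))) delimiter += ":";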
        return string.Join(delimiter, data.ToArray());
    }
    public static string[] decode(string encoded, string delimiter)
    {
        return encoded.Split(new string[] { delimiter }, StringSplitOptions.None);
    }
So the delimiter will have to be embedded along with the message ?
Sake
Yes, you include it with the message so it can be decoded. That solves both of the problems you mentioned.
What if the end of one of the strings contains the delimiter, for example: "abc:def" or even just "abc:"?
Jacob Stanley
The while statement checks if the string contains the delimiter and modifies it until it is not found in the string.
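
To make the edge case raised in the comments concrete, here is a small Python sketch of the same growing-delimiter idea (Python because the asker says the question is language-agnostic; the function names are placeholders). It assumes the check is a substring test on every string, which is what the answer describes.

```python
def encode(items):
    delimiter = ":"
    # Lengthen the delimiter until it is not a substring of any item.
    while any(delimiter in item for item in items):
        delimiter += ":"
    return delimiter.join(items), delimiter


def decode(encoded, delimiter):
    return encoded.split(delimiter)


# A delimiter character in the middle of a string is handled correctly.
encoded, delimiter = encode(["a:b", "c"])
assert decode(encoded, delimiter) == ["a:b", "c"]

# A string that ends with the delimiter character is not: "abc:" does not
# contain "::", but after joining, the text reads "abc:::def" and splits early.
encoded, delimiter = encode(["abc:", "def"])
assert decode(encoded, delimiter) == ["abc", ":def"]
```

So the substring check alone does not guarantee a round trip; the strings at the boundaries can combine with the delimiter to form a false match.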
+3  A: 

Add a reference to System.Web and a using directive for it, and then:

public static string Encode(IEnumerable<string> strings)
{
    return string.Join("&", strings.Select(s => HttpUtility.UrlEncode(s)).ToArray());
}

public static IEnumerable<string> Decode(string list)
{
    return list.Split('&').Select(s => HttpUtility.UrlDecode(s));
}

Most languages have a pair of utility functions that do URL "percent" encoding, and they are ideal for reuse in this kind of situation.
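
For example, a rough Python equivalent (Python because the asker says the question is language-agnostic) could use `quote` and `unquote` from the standard `urllib.parse` module. The function names are placeholders, and passing `safe=""` is a choice made here so that every reserved character, including "&", gets escaped.

```python
from urllib.parse import quote, unquote


def encode(items):
    # safe="" makes quote() escape every reserved character, including "&" and "/".
    return "&".join(quote(item, safe="") for item in items)


def decode(encoded):
    return [unquote(part) for part in encoded.split("&")]


items = ["abc", "a&b", "100%", " :::==--==::: "]
assert decode(encode(items)) == items
```

Note that `quote` percent-encodes non-ASCII characters as UTF-8 bytes, so text outside ASCII round-trips correctly but is not readable in its encoded form.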

Daniel Earwicker
LOL at the downvoter!
Daniel Earwicker
+1, This would be my preferred option.
LukeH
Mmmm, I could forgive someone for reinventing the wheel maybe, but reinventing string escaping is beyond the pale!
Daniel Earwicker
FWIW, I'm not the downvoter.
Sake
So the question is, why aren't you an upvoter? :)
Daniel Earwicker
Voted. BTW, UrlEncode is definitely not my favorite solution. It would encode my native characters as %NN.
Sake
Thanks. It sure would, but it depends what you want. If you want a human readable data stream for some specific set of characters, you have to write your own encode/decode. If you want a quick solution and don't care how the stream looks, use URL encoding.
Daniel Earwicker
plus one - simpler than the xml variant I did
ShuggyCoUk
+1  A: 

You shouldn't need to do this manually. As the other answers have pointed out, there are plenty of ways, built-in or otherwise, to serialize/deserialize.

However, if you did decide to do the work yourself, it doesn't require that much code:

public static string CreateDelimitedString(IEnumerable<string> items)
{
    StringBuilder sb = new StringBuilder();

    foreach (string item in items)
    {
        sb.Append(item.Replace("\\", "\\\\").Replace(",", "\\,"));
        sb.Append(",");
    }

    return (sb.Length > 0) ? sb.ToString(0, sb.Length - 1) : string.Empty;
}

This will delimit the items with a comma (,). Any existing commas will be escaped with a backslash (\) and any existing backslashes will also be escaped.

public static IEnumerable<string> GetItemsFromDelimitedString(string s)
{
    bool escaped = false;
    StringBuilder sb = new StringBuilder();

    foreach (char c in s)
    {
        if ((c == '\\') && !escaped)
        {
            escaped = true;
        }
        else if ((c == ',') && !escaped)
        {
            yield return sb.ToString();
            sb.Length = 0;
        }
        else
        {
            sb.Append(c);
            escaped = false;
        }
    }

    yield return sb.ToString();
}
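
For comparison, the same escape-and-scan approach can be sketched in Python (the asker says the question is language-agnostic). This is a loose port of the two methods above under the same assumptions: comma as the delimiter and backslash as the escape character. The names `encode` and `decode` are placeholders.

```python
def encode(items):
    # Escape backslashes first, then commas, so the escapes stay unambiguous.
    return ",".join(
        item.replace("\\", "\\\\").replace(",", "\\,") for item in items
    )


def decode(encoded):
    items = []
    current = []
    escaped = False
    for ch in encoded:
        if ch == "\\" and not escaped:
            escaped = True  # the next character is taken literally
        elif ch == "," and not escaped:
            items.append("".join(current))  # an unescaped comma ends the item
            current = []
        else:
            current.append(ch)
            escaped = False
    items.append("".join(current))
    return items


items = ["a,b", "c\\d", "", "e\\,f"]
assert decode(encode(items)) == items
```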
LukeH
Wow ! It's amazing that you can code this instantly.
Sake
@Sake, I didn't knock it up instantly, although it didn't take too long. (I actually wrote it a few days ago as a response to a different SO question.)
LukeH
@Luke, but you'd prefer Earwicker's solution? Why?
Sake
@Sake, As I said at the top of my answer, you shouldn't need to serialize/deserialize manually these days. URL encoding is tried-and-tested and (mostly) human-readable. And in most modern languages/frameworks it only takes a line or two of code.
LukeH
A: 

There are loads of textual markup languages out there, and any of them would work.

Many would work trivially given the simplicity of your input. It all depends on:

  1. how human-readable you want the encoding to be
  2. how resilient to API changes it should be
  3. how easy it is to parse
  4. how easy it is to write or get a parser for it

If the last one is the most important, then just use the existing XML libraries that Microsoft supplies:

class TrivialStringEncoder
{
    private readonly XmlSerializer ser = new XmlSerializer(typeof(string[]));

    public string Encode(IEnumerable<string> input)
    {
        using (var s = new StringWriter())
        {
            ser.Serialize(s, input.ToArray());
            return s.ToString();
        }    
    }

    public IEnumerable<string> Decode(string input)
    {
        using (var s = new StringReader(input))
        {
            return (string[])ser.Deserialize(s);      
        }    
    }

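    public static void Main(string[] args)
    {
        // Encode and Decode are instance methods, so create an encoder first.
        var encoder = new TrivialStringEncoder();
        var encoded = encoder.Encode(args);
        Console.WriteLine(encoded);
        var decoded = encoder.Decode(encoded);
        foreach (var x in decoded)
            Console.WriteLine(x);
    }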
}

running on the inputs "A", "<", ">" you get (edited for formatting):

<?xml version="1.0" encoding="utf-16"?>
<ArrayOfString 
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <string>A</string>
  <string>&lt;</string>
  <string>&gt;</string>
</ArrayOfString>
A
<
>

Verbose and slow, but extremely simple, and it requires no additional libraries.

ShuggyCoUk
+1  A: 

I would just prefix every string with its length and a terminator indicating the end of the length.

abc
defg
hijk
xyz
546
4.X

becomes

3: abc 4: defg 4: hijk 3: xyz 3: 546 3: 4.X

No restrictions or limitations at all, and quite simple.
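
A minimal Python sketch of this length-plus-terminator format (Python because the asker says the question is language-agnostic). It assumes a bare ":" as the terminator and no extra spaces between items, which differs slightly from the spaced-out example above; the function names are placeholders.

```python
def encode(items):
    # Each item becomes "<length>:<item>", with nothing between items.
    return "".join(f"{len(item)}:{item}" for item in items)


def decode(encoded):
    items = []
    pos = 0
    while pos < len(encoded):
        # The first ":" after pos ends the length; the item follows it.
        colon = encoded.index(":", pos)
        length = int(encoded[pos:colon])
        start = colon + 1
        items.append(encoded[start:start + length])
        pos = start + length
    return items


items = ["abc", "3:abc", "", ":"]
assert decode(encode(items)) == items
```

Since the decoder counts characters rather than searching the item text, colons and digits inside the strings cause no trouble, and the length prefix can grow to any number of digits.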

Daniel Brückner
+2  A: 

Json.NET is a very easy way to serialize just about any object you can imagine. JSON keeps things compact and can be faster than XML.

List<string> foo = new List<string>() { "1", "2" };
string output = JsonConvert.SerializeObject(foo);
List<string> fooToo = (List<string>)JsonConvert.DeserializeObject(output, typeof(List<string>));
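
The same idea carries over to other languages. As a sketch in Python (the asker says the question is language-agnostic), the standard `json` module does the job; the function names are placeholders, and `ensure_ascii=False` is a choice made here to keep non-ASCII text readable.

```python
import json


def encode(items):
    # ensure_ascii=False keeps non-ASCII characters as-is instead of \uXXXX escapes.
    return json.dumps(list(items), ensure_ascii=False)


def decode(encoded):
    return json.loads(encoded)


items = ["1", "2", 'say "hi"', " :::==--==::: "]
assert decode(encode(items)) == items
```

Unlike a plain join, this format also distinguishes an empty list (`[]`) from a list holding one empty string (`[""]`).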
Kurt