tags:

views:

110

answers:

5

What would be the fastest way to store List of type Point, in a string that produces minimum string length and with fastest parse back algorithm?

I found that framework has Convert.ToBase64String, Convert.FromBase64String methods. Open to any ideas using these or even better self designed algorithms ;)

Thanks in advance

C#, vs2005 (.net 2.0)

-Edit-

I will use the code in an activeX component and i cant add another library just for this purpose.

+1  A: 

How about to use JSON?

abatishchev
Yeah, JSON is simple to parse and serialize. It doesnt add much of extra characters to the string either. Codeplex has some good serializers/deserializers.
jgauffin
+1  A: 

Just use ProtoBuf.Net.

Oliver
+1  A: 

In all honesty, it is near impossible to answer this authoritatively. Much depends on how large the list of points are, whether it really needs to be a string, and what aspect you are trying to optimize for. For example, if you need raw speed you may trade RAM for processing time, but if you need throughput you need to have an algorithm that doesn't consume too many resources.

The most compact and quickest way to store a list of anything and restore the list later is to use binary serialization. Of course this leaves you with the risk that a change in the CLR can render the file unuseable. For anyone who is interested, xml serialization is neither efficient from speed nor from space conciderations but the format can be read by other CLRs with no change.

Base64 algorithms are fairly efficient, and use a code-table lookup algorithm that is very quick. Base64 encoding the binary format may yield very good results. But if you don't need to store it as a string, why go through the extra trouble?

CORBA also defines a binary to string algorithm that has to be efficient for the types of things it is trying to do. If I recall correctly it uses a 128 symbol code table (i.e. 128 bit encoding) so it is more compact than base 64.

In the end, you are going to have to run some tests. Before you start your tests, you have to know when the algorithm is good enough. Just how small does the string size have to be before it is acceptable? Just how fast does the parsing algorithm have to be before it is acceptable. And just how many of these do you need to parse simultaneously? Only you can determine that.

Berin Loritsch
+2  A: 

To String:

MYSTRING = string.Join("", list.Select( 
     point => point.x.toString("X").PadLeft(4, '0') +
              point.y.toString("X").PadLeft(4, '0')).toArray() )

From String:

new Regex("(.{8})").Split(MYSTRING).Where(x => !string.IsNullOrEmpty(x)).
     Select(x=> new Point(x.Take(4), x.Skip(4).Take(4)))
Draco Ater
Don't forget to add culture-specific digit separator support
abatishchev
Thanks for the answer but linq not supported in .net 2.0, and i may have 1 to 5 digit lengths which will lead to waste of space coused by padding.
honibis
Then you should use some separators between Points and x and y inside point. Just test, which produces lesser string in your case.
Draco Ater
And it's and algorithm, you can rewrite it in .Net2.0 syntax easily.
Draco Ater
+2  A: 

Use hex representation of ints, it reduces size of the string:

Serialize:

List<Point> list = new List<Point>(new Point[] {new Point(1, 2), new Point(10, 20), new Point (100, 200), new Point(1000, 2000), new Point(10000, 20000)});

// 1. To.
StringBuilder sb = new StringBuilder();
foreach (Point point in list)
{
    sb.Append(Convert.ToString(point.X, 16));sb.Append('.');
    sb.Append(Convert.ToString(point.Y, 16));sb.Append(':');
}

string serialized = sb.ToString(); 

Here is the string in form: "x.y:1.2:a.14:64.c8:3e8.7d0:2710.4e20:"

Deserialize, splitting ('serialized' is the string contains chain of numbers):

string[] groups = serialized.Split(new char[] {':'}, StringSplitOptions.RemoveEmptyEntries);
foreach (string group in groups)
{
    string[] coords = group.Split('.');
    restored.Add(new Point(Convert.ToInt32(coords[0], 16), Convert.ToInt32(coords[1], 16)));
}

Or you could you regexp to parse groups("[0-9a-fA-F].[0-9a-fA-F]"), it's up to you. I am not sure which is quicker.

Or a simple state machine (just for fun):

List<Point> restored = new List<Point>();
string value = default(string);
int left = 0;
int x = 0, y = 0;
for (int i = 0; i < serialized.Length; i++)
{
    if (serialized[i] == '.')
    {
        value = serialized.Substring(left, i - left);
        left = i + 1;
        x = Convert.ToInt32(value, 16); 
    }
    else if (serialized[i] == ':')
    {
        value = serialized.Substring(left, i - left);
        left = i + 1;
        y = Convert.ToInt32(value, 16);
        restored.Add(new Point(x, y));
    }
}

IMHO.

EDITED: Or even better to pack integers to groups of hex: 1212 to 'CC' like it is used in old financial systems; it makes length of string two times less.

Dmitry Karpezo
+1 I was actually looking for some ideas like this. Using string more efficiently with hex is good point.
honibis