What's the fastest way to parse strings in C#?

Currently I'm just using string indexing (string[index]) and the code runs reasonably fast, but I can't help thinking that the continuous range checking the index accessor does must add some overhead.
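
For reference, a simplified, made-up example of the kind of indexer-based scan I mean (the method name and separator are hypothetical, not from my actual code):

    static int CountSeparators(string input)
    {
        int count = 0;
        for (int i = 0; i < input.Length; i++)
        {
            // every input[i] access goes through the range-checked indexer
            if (input[i] == ';')
                count++;
        }
        return count;
    }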

So, I'm wondering what techniques I should consider to give it a boost. These are my initial thoughts/questions:

  1. Use methods like string.IndexOf() and IndexOfAny() to find characters of interest. Are these faster than manually scanning the string via string[index]? (There's a rough sketch of what I mean after this list.)
  2. Use regexes. Personally, I don't like regexes as I find them difficult to maintain, but are they likely to be faster than manually scanning the string?
  3. Use unsafe code and pointers (also sketched after the list). This would eliminate the index range checking, but I've read that unsafe code won't run in untrusted environments. What exactly are the implications of this? Does this mean the whole assembly won't load/run, or will only the code marked unsafe refuse to run? The library could potentially be used in a number of environments, so being able to fall back to a slower but more compatible mode would be nice.
  4. What else might I consider?
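
To make points 1 and 3 concrete, here are rough, hypothetical sketches of what I have in mind (the delimiter characters and method names are made up):

    // Point 1: let IndexOfAny do the scanning instead of a manual loop.
    static int CountDelimiters(string input)
    {
        char[] delimiters = { ';', ',' };
        int count = 0, pos = 0;
        while ((pos = input.IndexOfAny(delimiters, pos)) != -1)
        {
            count++;
            pos++; // resume the search just past the match
        }
        return count;
    }

    // Point 3: pin the string and walk it with a pointer (needs /unsafe).
    static unsafe int CountDelimitersUnsafe(string input)
    {
        int count = 0;
        fixed (char* start = input)
        {
            char* end = start + input.Length;
            for (char* p = start; p < end; p++)
            {
                if (*p == ';' || *p == ',') count++;
            }
        }
        return count;
    }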

NB: I should say, the strings I'm parsing could be reasonably large (say 30k) and in a custom format for which there is no standard .NET parser. Also, performance of this code is not super critical, so this is partly just a theoretical question out of curiosity.

+1  A: 

30k is not what I would consider to be large. Before getting excited, I would profile. The indexer should be fine; it gives the best balance of flexibility and safety.

For example, creating a 128k string (and a separate array of the same size), filling it with junk (including the time to handle Random) and summing all the character code-points via the string indexer takes... 3ms:

        var watch = Stopwatch.StartNew();
        char[] chars = new char[128 * 1024];
        Random rand = new Random(); // fill with junk
        for (int i = 0; i < chars.Length; i++) chars[i] =
             (char) ((int) 'a' + rand.Next(26));

        int sum = 0;
        string s = new string(chars);
        int len = s.Length;
        for (int i = 0; i < len; i++)
        {
            sum += (int) s[i]; // read each char through the string indexer
        }
        watch.Stop();
        Console.WriteLine(sum);
        Console.WriteLine(watch.ElapsedMilliseconds + "ms");
        Console.ReadLine();

For files that are actually large, a reader approach should be used - StreamReader etc.
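
A rough sketch of that kind of approach (the file name and buffer size here are arbitrary, just to show the shape):

    using (var reader = new StreamReader("input.dat"))
    {
        char[] buffer = new char[8 * 1024];
        int read;
        while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
        {
            // hand buffer[0..read) to the parser here
        }
    }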

Marc Gravell
Or if you move the Stopwatch.StartNew to after new string(chars), it's 0 ms
simendsjo
Thanks Marc. Right, 30k isn't large, but I meant it's not like a one-line string or, say, converting a string to an integer. Certainly small enough to fit in memory. 3ms sounds good, but I guess I'm just going to have to profile and compare.
cantabilesoftware
+1  A: 

"Parsing" is quite an inexact term. Since you talks of 30k, it seems that you might be dealing with some sort of structured string which can be covered by creating a parser using a parser generator tool.

A nice tool to create, maintain and understand the whole process is the GOLD Parsing System by Devin Cook: http://www.devincook.com/goldparser/

This can help you create code which is efficient and correct for many textual parsing needs.

As for your points:

  1. is usually not useful for parsing which goes further than splitting a string.

  2. is better suited when there is no recursion and the rules aren't too complex.

  3. is basically a no-go if you haven't really identified this as a serious problem. The JIT can take care of doing the range checks only when needed, and indeed for simple loops (the typical for loop) this is handled pretty well.
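
As a small illustration of the loop shape meant here (just a sketch; the input string is a stand-in):

    string s = "whatever text is being scanned"; // stand-in input
    int letters = 0;
    // Comparing i against s.Length directly in the loop condition is the
    // pattern the JIT handles well for simple scans like this.
    for (int i = 0; i < s.Length; i++)
    {
        if (s[i] >= 'a' && s[i] <= 'z') letters++;
    }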

Lucero
Thanks Lucero. GOLD looks pretty good, but it's not suitable for what I'm doing (very whitespace-dependent and not strictly defined).
cantabilesoftware
cantabilesoftware, without better knowledge of what you're actually trying to do, it is quite hard to make meaningful suggestions.
Lucero