I'm using the .NET Micro Framework, which is a heavily cut-down version of .NET. For instance, System.String has barely any of the goodies that we've enjoyed over the years.

I need to split a text document into lines, which means splitting by \r\n. However, String.Split only provides a split by char, not by string.

How can I split a document into lines in an efficient manner (e.g. not looping madly across each char in the doc)?

P.S. System.String is also missing a Replace method, so that won't work.
P.P.S. Regex is not part of the MicroFramework either.

A: 

What about:

string path = "yourfile.txt";
string[] lines = File.ReadAllLines(path);

Or

string content = File.ReadAllText(path);
string[] lines = content.Split(
    Environment.NewLine.ToCharArray(),
    StringSplitOptions.RemoveEmptyEntries);

From what I've read about .NET Micro Framework 3.0, this code should work:

string line = String.Empty;
StreamReader reader = new StreamReader(path);
while ((line = reader.ReadLine()) != null)
{
    // do stuff
}
Rubens Farias
It splits by both items in the CharArray separately. Thus, you'd get a bunch of empty results in addition to legit results.
AngryHacker
@AngryHacker: RemoveEmptyEntries should deal with them.
Rubens Farias
File.ReadAllLines is not supported either.
AngryHacker
See the link provided in the question for documentation on what's supported.
AngryHacker
+5  A: 

I would loop across each char in the document, because that's clearly required. How do you think String.Split works? I would try to do so only hitting each character once, however.

Keep a list of strings found so far. Use IndexOf repeatedly, passing in the current offset into the string (i.e. the previous match + 2).
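
The IndexOf approach above can be sketched roughly as follows. This is only an illustration, not tested on a device: the names LineSplitter and SplitLines are made up for the example, and it assumes the Micro Framework provides String.IndexOf(string, int), String.Substring and System.Collections.ArrayList (no generics are used for that reason).

```csharp
using System;
using System.Collections;

public static class LineSplitter
{
    // Splits text on a multi-character delimiter such as "\r\n".
    // Hypothetical helper; assumes IndexOf(string, int) and ArrayList are available.
    public static string[] SplitLines(string text, string delimiter)
    {
        if (delimiter == null || delimiter.Length == 0)
            throw new ArgumentException("delimiter must not be empty");

        ArrayList parts = new ArrayList();
        int start = 0; // offset just past the previous match

        while (true)
        {
            // Search only from the current offset, so no region is rescanned.
            int match = text.IndexOf(delimiter, start);
            if (match < 0)
                break;

            parts.Add(text.Substring(start, match - start));
            start = match + delimiter.Length; // "previous match + 2" for "\r\n"
        }

        // Whatever follows the last delimiter (possibly an empty string).
        parts.Add(text.Substring(start));

        return (string[])parts.ToArray(typeof(string));
    }
}
```

Calling LineSplitter.SplitLines(doc, "\r\n") keeps empty lines as empty strings; drop them afterwards if they are not wanted.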

Jon Skeet
True, but the full .NET implementation of Split(string[]) uses pointers and unsafe code to achieve the performance that it does. Otherwise, I would simply copy the code. I am operating on a really low end chip and was hoping for something exceedingly clever.
AngryHacker
If you're only splitting by a single delimiter string, and the delimiter is longer than 3 characters, you can get better performance using one of the Boyer-Moore search algorithm variants. Not that it helps with the poster's problem in this case. http://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string_search_algorithm
LBushkin
+2  A: 

How can I split a document into lines in an efficient manner (e.g. not looping madly across each char in the doc)?

How do you think the built-in Split works?

Just reimplement it yourself as an extension method.

Anon.
+4  A: 

You can do

string[] lines = doc.Split('\n');
for (int i = 0; i < lines.Length; i++)
   lines[i] = lines[i].Trim();   // strips the leftover '\r' (and any other surrounding whitespace)

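This assumes that the µF supports Trim() at all. Trim() removes all leading and trailing whitespace, which might be useful. Also consider TrimEnd('\r'), which strips only the trailing carriage return and leaves indentation intact.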
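
For comparison, here is a minimal sketch of the TrimEnd('\r') variant. The names NewlineSplit and SplitLines are invented for the example, and it assumes the Micro Framework's String supports Split(char) and TrimEnd(char).

```csharp
using System;

public static class NewlineSplit
{
    // Splits on '\n', then drops a trailing '\r' from each line.
    // Unlike Trim(), this keeps leading indentation and other trailing spaces.
    public static string[] SplitLines(string doc)
    {
        string[] lines = doc.Split('\n');
        for (int i = 0; i < lines.Length; i++)
            lines[i] = lines[i].TrimEnd('\r');
        return lines;
    }
}
```

This also copes with input that mixes "\r\n" and bare "\n" line endings, since the '\r' is removed only where it is present.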

Henk Holterman
Yep, this does the trick and keeps most of the performance characteristics. Thanks. Pretty clever.
AngryHacker
I was just about to suggest something like this - beat me to the punch. You may need to recombine strings (theoretically) if '\n' is not always preceded by '\r' in the input.
LBushkin