I have a program that generates a plain text file. The structure (layout) is always the same. Example:


Text File:

LinkLabel
"Hello, this text will appear in a LinkLabel once it has been
added to the form. This text may not always cover more than one line. But will always be surrounded by quotation marks."
240, 780

So, to explain what is going on in that file:

Control
Text
Location


When a button on the form is clicked and the user opens one of these files from the OpenFileDialog, I need to read each line. Starting from the top, I want to check what control it is. Starting on the second line, I need to get all the text inside the quotation marks (regardless of whether it is one line of text or more). On the next line (after the closing quotation mark), I need to extract the location (240, 780). I have thought of a few ways of going about this, but when I write them down and put them into practice, they don't make much sense and I end up finding cases where they won't work.

Has anybody ever done this before? Would anybody be able to provide any help, suggestions or advice on how I'd go about doing this?

I have looked up CSV files but that seems too complicated for something that seems so simple.

Thanks jase

+2  A: 

I'll try and write down the algorithm, the way I solve these problems (in comments):

// while not at end of file
  // read control
  // read line of text
  // while last char in line is not "
    // read line of text
  // read location

Try and write code that does what each comment says and you should be able to figure it out.
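For reference, here is one way the commented steps above could be turned into C#. It is only a sketch: the class name ControlFileReader and the method ReadAll are made up for illustration, the location is left as a raw string, and it assumes the text always starts with a quotation mark.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper; the names are illustrative, not from the question.
public static class ControlFileReader
{
    public static List<(string Control, string Text, string Location)> ReadAll(TextReader reader)
    {
        var records = new List<(string Control, string Text, string Location)>();
        string control;

        // while not at end of file: read control
        while ((control = reader.ReadLine()) != null)
        {
            if (control.Trim().Length == 0) continue; // skip blank lines between records

            // read line of text
            string text = reader.ReadLine() ?? throw new FormatException("Missing text.");

            // while last char in line is not ": read line of text
            // (the length check stops a lone opening quote from counting as the closing one)
            while (text.Length < 2 || !text.EndsWith("\""))
            {
                string next = reader.ReadLine() ?? throw new FormatException("Unterminated text.");
                text += "\n" + next; // ReadLine drops the line break, so put it back
            }

            // read location
            string location = reader.ReadLine() ?? throw new FormatException("Missing location.");

            // Assumes the text starts with a quotation mark; strip both quotes.
            records.Add((control, text.Substring(1, text.Length - 2), location));
        }
        return records;
    }
}
```

To try it without a file, pass a StringReader; for a real file, pass File.OpenText(fileName).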

HTH.

Jonathan van de Veen
You also need to include logic for dealing with cases such as a missing terminating ". All doable, but tedious code to write.
djna
In this case, looking at the example, I can't see how you would detect that a terminating " is missing in any reliable way. The only way to actually say something is wrong is that you reach the end of the file while still reading the Text part. Who knows how many controls you've interpreted as text by then. The only way to improve on that is to improve on the file format.
Jonathan van de Veen
+2  A: 

You could use a regular expression to get the lines from the text:

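// Requires: using System; using System.IO; using System.Text.RegularExpressions;
// Groups: 1 = control type, 2 = quoted text (may span lines), 3 = x, 4 = y.
// \r?\n accepts Windows and Unix line endings; the newline after the last record is optional.
MatchCollection records = Regex.Matches(File.ReadAllText(fileName),
   @"(.+?)\r?\n""([^""]+)""\r?\n(\d+), (\d+)(?:\r?\n|$)");
foreach (Match match in records) {
   string control = match.Groups[1].Value;
   string text = match.Groups[2].Value;
   int x = Int32.Parse(match.Groups[3].Value);
   int y = Int32.Parse(match.Groups[4].Value);
   Console.WriteLine("{0}, \"{1}\", {2}, {3}", control, text, x, y);
}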
Guffa
Thank you very much Guffa, you've helped me tremendously! much appreciated!!!!! :D :D
baeltazor
+1  A: 

This kind of stuff gets irritating: it's conceptually simple, but you can end up with gnarly code. You've got a comparatively simple case: one record per file. It gets much harder if you have lots of records and you want to deal nicely with badly formed records (consider writing a parser for a language such as C#).

For large scale problems one might use a grammar driven parser such as this: link text

Much of your complexity comes from the lack of regularity in the file. The first field is terminated by a newline, the second is delimited by quotes, the third is terminated by a comma ...

My first recommendation would be to adjust the format of the file so that it's really easy to parse. You write the file, so you're in control. For example, just don't have new lines in the text, and put each item on its own line. Then you can just read four lines, job done.
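As a rough sketch of what reading such a simplified file could look like: the layout here (control, text, x, and y, each on its own line, with no quotation marks and no line breaks inside the text) is an assumption made for illustration, and the names SimpleFormatReader and ReadAll are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical reader for an assumed four-lines-per-record layout:
//   line 1: control type, line 2: text, line 3: x, line 4: y
public static class SimpleFormatReader
{
    public static List<(string Control, string Text, int X, int Y)> ReadAll(TextReader reader)
    {
        var records = new List<(string Control, string Text, int X, int Y)>();
        string control;
        while ((control = reader.ReadLine()) != null)
        {
            string text = reader.ReadLine();
            string x = reader.ReadLine();
            string y = reader.ReadLine();
            if (text == null || x == null || y == null)
                throw new FormatException("Incomplete record at end of file.");

            // int.Parse throws FormatException if a coordinate is not a number.
            records.Add((control, text, int.Parse(x), int.Parse(y)));
        }
        return records;
    }
}
```

Because every field ends at a newline, there is no quote handling at all, and a malformed record shows up as soon as a coordinate fails to parse.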

djna
+2  A: 

You are trying to implement a parser and the best strategy for that is to divide the problem into smaller pieces. And you need a TextReader class that enables you to read lines.

You should separate your ReadControl method into three methods: ReadControlType, ReadText and ReadLocation. Each method is responsible for reading only the item it should read, and for leaving the TextReader in a position where the next method can pick up. Something like this:

public Control ReadControl(TextReader reader)
{
    string controlType = ReadControlType(reader);
    string text = ReadText(reader);
    Point location = ReadLocation(reader);
    // ... create the control from controlType, text and location, and return it ...
}

Of course, ReadText is the most interesting one, since it spans multiple lines. In fact it's a loop that calls TextReader.ReadLine until the line ends with a quotation mark:

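private string ReadText(TextReader reader)
{
    string line = reader.ReadLine();
    if (line == null || !line.StartsWith("\""))
        throw new FormatException("Expected text starting with a quotation mark.");
    string text = line.Substring(1); // Strip first quotation mark.
    while (!text.EndsWith("\"")) {
        line = reader.ReadLine();
        if (line == null) // End of file reached before the closing quotation mark.
            throw new FormatException("Text has no closing quotation mark.");
        text += Environment.NewLine + line; // Restore the line break that ReadLine removed.
    }
    return text.Substring(0, text.Length - 1); // Strip last quotation mark.
}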
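The other two methods are much simpler. A possible sketch, with some assumptions: the methods are shown as public static members of a made-up ControlParts class so the snippet stands alone, and ReadLocation returns an (X, Y) tuple rather than a Point; wrapping the result in new Point(x, y) would match the ReadControl signature above.

```csharp
using System;
using System.Globalization;
using System.IO;

// Hypothetical container class, used only to keep the sketch self-contained.
public static class ControlParts
{
    // Reads the control type, which occupies a single line (e.g. "LinkLabel").
    public static string ReadControlType(TextReader reader)
    {
        string line = reader.ReadLine();
        if (line == null)
            throw new FormatException("Expected a control type.");
        return line.Trim();
    }

    // Reads a location line such as "240, 780" and returns its two coordinates.
    public static (int X, int Y) ReadLocation(TextReader reader)
    {
        string line = reader.ReadLine();
        if (line == null)
            throw new FormatException("Expected a location.");

        string[] parts = line.Split(',');
        if (parts.Length != 2)
            throw new FormatException("Expected a location of the form 'x, y'.");

        int x = int.Parse(parts[0].Trim(), CultureInfo.InvariantCulture);
        int y = int.Parse(parts[1].Trim(), CultureInfo.InvariantCulture);
        return (x, y);
    }
}
```

Each method consumes exactly one line, so the reader is left at the start of the next item, as described above.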
Ronald Wildenberg
Thank you very much, rwwilden. I really appreciate your answer, thanks :) Although I didn't use your code, it got me thinking more clearly, and I'm about to edit my question to include the solution I came up with.
baeltazor