I need to extract data from non-delimited text files using C#. Basically, I need to remove all unwanted characters, then mark the end of each line and add a line break. Once the data has been separated into individual lines, I need to loop through each line in turn and extract values using regular expressions. I have been doing this with Perl but now need to do it in C#. The raw file contains numerous line break characters scattered throughout, not just at the end of a line as you would expect. I will be able to extract values using Regex objects, but I am having trouble getting the file into a format that has each record on a line of its own.

A: 

You did not give much detail, but this code will read the file into a List of lines.

Note that ReadLine returns the characters up to the next line feed ("\n"), carriage return ("\r"), or carriage return immediately followed by a line feed ("\r\n"). The terminator itself is not included in the returned string.
I am not sure if this is the behaviour you expect.

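    using System;
    using System.Collections.Generic;
    using System.IO;

    string fileName = "Text.txt";
    List<string> lines = new List<string>();

    // Read the file one line at a time until ReadLine signals end of file with null.
    using (StreamReader r = new StreamReader(fileName))
    {
        string line;
        while ((line = r.ReadLine()) != null)
        {
            lines.Add(line);
        }
    }

    foreach (string s in lines)
    {
        Console.WriteLine(s);
        // Apply your Regex to each line here.
    }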
Andrzej Nosal
The OP says that a line doesn't necessarily end at a line break, but ReadLine reads until the next line break or EOF. I suspect the OP means \n when talking about a line break, in which case I'm afraid your code will fail.
Rune FS
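To illustrate the concern above, here is a minimal sketch of how ReadLine fragments a record that contains a stray line break. It uses a StringReader over an in-memory string instead of a file. The class name `ReadLineDemo`, the method name `ReadAllLines`, and the sample record are made up for illustration, since the real data was not posted.

```csharp
using System.Collections.Generic;
using System.IO;

public static class ReadLineDemo
{
    // Collects every line that ReadLine() reports for the given text.
    // ReadLine() stops at "\n", "\r" or "\r\n", wherever they occur.
    public static List<string> ReadAllLines(string text)
    {
        var lines = new List<string>();
        using (var reader = new StringReader(text))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                lines.Add(line);
            }
        }
        return lines;
    }
}
```

Calling `ReadLineDemo.ReadAllLines("ID=1;NAME=Jo\nhn;END")` yields two fragments, "ID=1;NAME=Jo" and "hn;END", even though the text is meant to be a single record.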
Thanks for the quick response. The main problem is that the file contains lots of line breaks scattered throughout the text. I need to remove all of them, then find a Regex match that marks the end of each line. Once I have the data in a series of individual lines, I can use the ReadLine method to step through the file line by line, applying Regex matches. The real difficulty is getting the data separated into individual lines. I cannot supply the data as it is confidential material. Thanks again.
Freefall Steve
Maybe try loading your whole file into a string (StreamReader.ReadToEnd()), then use string.Replace(x, "") to delete unwanted line breaks and characters (string.Remove takes a start index, not a character), then string.Split(x) to split the string into an array (where x is the delimiter).
Andrzej Nosal
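The suggestion above could be sketched roughly as follows. This is only an outline under assumptions: the class name `RecordSplitter`, both method names, and the choice of a single-character delimiter are hypothetical, and the real delimiter depends on the file. It also assumes the file is small enough to hold in memory.

```csharp
using System;
using System.IO;

public static class RecordSplitter
{
    // Removes every carriage return and line feed, then splits the
    // remaining text on the given delimiter, dropping empty entries.
    public static string[] SplitRecords(string raw, char delimiter)
    {
        string flattened = raw.Replace("\r", "").Replace("\n", "");
        return flattened.Split(new[] { delimiter }, StringSplitOptions.RemoveEmptyEntries);
    }

    // Reads the whole file into one string before splitting it.
    public static string[] SplitRecordsFromFile(string path, char delimiter)
    {
        using (var reader = new StreamReader(path))
        {
            return SplitRecords(reader.ReadToEnd(), delimiter);
        }
    }
}
```

For example, `RecordSplitter.SplitRecords("A=1\r\n;B=\n2;C=3;", ';')` gives the three records "A=1", "B=2" and "C=3".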
Many thanks for the advice. I have managed to read the entire file into a string using the StreamReader's ReadToEnd() method, and I can remove the line breaks with String.Replace(). Could you please clarify how (and on what character) I could split the file into an array? Thanks again.
Freefall Steve
You must decide which character to split on. This prints the character codes of the string, separated by spaces. Analyze your file (string) and see where you want to separate the lines: var chArr = s.ToCharArray(); for (int i = 0; i < chArr.Length; i++) { Console.Write((int)chArr[i] + " "); }
Andrzej Nosal
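When the file is large, a full dump of character codes is hard to scan. One possible variation, sketched here as an assumption rather than something from the thread, is to list only the control characters together with their positions. The names `CharInspector` and `FindControlChars` are hypothetical.

```csharp
using System.Collections.Generic;

public static class CharInspector
{
    // Returns one "index:code" entry for every control character
    // (line feed, carriage return, tab, and so on) found in the text.
    public static List<string> FindControlChars(string s)
    {
        var hits = new List<string>();
        for (int i = 0; i < s.Length; i++)
        {
            if (char.IsControl(s[i]))
            {
                hits.Add(i + ":" + (int)s[i]);
            }
        }
        return hits;
    }
}
```

For the input "AB\r\nC" this reports "2:13" and "3:10", that is, a carriage return at index 2 and a line feed at index 3.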
Thanks again for the advice, I think I've cracked it now. I found the string that marked the end of a line, then I split into an array with the Regex.Split() method. I can now use a foreach loop to read through the array and use Regex Matches to extract the values I need. I must say I was very impressed by the quality of advice offered as well as the response time.
Freefall Steve
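For later readers, the approach described in the last comment might look something like the sketch below. The actual end-of-record string and field patterns were never posted, so the marker `<EOR>`, the `ID=(\d+)` pattern, and the names `RecordExtractor` and `ExtractIds` are all placeholders invented for this example.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RecordExtractor
{
    // Hypothetical end-of-record marker; substitute the real one.
    private const string EndOfRecordPattern = "<EOR>";

    // Hypothetical field pattern: captures the digits that follow "ID=".
    private static readonly Regex IdPattern = new Regex(@"ID=(\d+)");

    public static List<string> ExtractIds(string raw)
    {
        // Remove stray line breaks so that each record is contiguous.
        string flattened = Regex.Replace(raw, @"[\r\n]+", "");

        var ids = new List<string>();
        // Split on the marker, then apply the field pattern to each record.
        foreach (string record in Regex.Split(flattened, EndOfRecordPattern))
        {
            Match m = IdPattern.Match(record);
            if (m.Success)
            {
                ids.Add(m.Groups[1].Value);
            }
        }
        return ids;
    }
}
```

With the input "ID=1\n2;NAME=A<EOR>ID=7;NA\r\nME=B<EOR>", the line breaks are dropped first, so the two records become "ID=12;NAME=A" and "ID=7;NAME=B", and the method returns "12" and "7". If the real marker contains characters that are special in regular expressions, passing it through Regex.Escape before splitting avoids surprises.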