I need to extract data from non-delimited text files using C#. Basically, I need to remove all unwanted characters, then mark the end of each line and add a line break. Once the data has been separated into individual lines, I need to loop through each line in turn and extract values using regular expressions. I have been doing this with Perl but now need to do it in C#. The raw file contains numerous line break characters scattered throughout, not just at the end of a line as you would expect. I will be able to extract the values using Regex objects, but I am having trouble getting the file into a format that has each record on a line of its own.
A:
You haven't given much information, but this code will build a List of lines for you.
Note that ReadLine returns the characters up to, but not including, a line feed ("\n"), a carriage return ("\r"), or a carriage return immediately followed by a line feed ("\r\n").
I am not sure if this is the behaviour you expect.
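using System;
using System.Collections.Generic;
using System.IO;

string fileName = "Text.txt";
List<string> lines = new List<string>();
// Read the file one line at a time; ReadLine returns null at end of file.
using (StreamReader r = new StreamReader(fileName))
{
    string line;
    while ((line = r.ReadLine()) != null)
    {
        lines.Add(line);
    }
}
foreach (string s in lines)
{
    Console.WriteLine(s);
    // apply your Regex to s here
}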
Andrzej Nosal
2010-09-24 10:37:39
OP says that a line doesn't necessarily end at a line break, but ReadLine reads until the next line break or EOF. I suspect that OP means \n when talking about a line break, in which case I'm afraid your code will fail.
Rune FS
2010-09-24 10:48:16
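To illustrate the concern above, here is a minimal sketch of how ReadLine treats a stray line break inside a record. It uses StringReader on an in-memory string instead of a file so it is self-contained. The sample data and the names ReadLineDemo and SplitWithReadLine are made up for illustration; the real file was never posted.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ReadLineDemo
{
    // Splits text with ReadLine, which breaks on "\n", "\r" and "\r\n" alike.
    public static List<string> SplitWithReadLine(string text)
    {
        var lines = new List<string>();
        using (var reader = new StringReader(text))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                lines.Add(line);
            }
        }
        return lines;
    }

    public static void Main()
    {
        // Hypothetical data: two records, the first with a stray "\n" in the middle.
        string raw = "ID=1;NAME=Al\nice;END\r\nID=2;NAME=Bob;END";
        foreach (string line in SplitWithReadLine(raw))
        {
            Console.WriteLine("[" + line + "]");
        }
    }
}
```

With this input the first record comes back as two separate lines, so ReadLine alone cannot recover the records when line breaks also occur mid-record.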
Thanks for the quick response. The main problem is that the file contains lots of line breaks scattered throughout the text. I need to remove all of them, then find a Regex match that marks the end of each line. Once I have the data as a series of individual lines, I can use the ReadLine method to step through the file line by line, applying Regex matches. The real difficulty is getting the data separated into individual lines. I cannot supply the data as it is confidential material. Thanks again.
Freefall Steve
2010-09-24 11:16:26
Maybe try loading the whole file into a string (StreamReader.ReadToEnd()), then use string.Replace(x, "") to delete unwanted line breaks/characters (string.Remove only deletes by index), then string.Split(x) will split the string into an array (where x is the delimiter).
Andrzej Nosal
2010-09-24 12:25:48
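The suggestion above could be sketched roughly as follows. This is only an illustration under assumptions: the '|' delimiter, the class name RecordSplitter and the method name SplitRecords are hypothetical, and "Text.txt" is just the file name used in the answer. The real delimiter depends on the actual data.

```csharp
using System;
using System.IO;

public static class RecordSplitter
{
    // Removes every "\r" and "\n", then splits on a single delimiter character.
    public static string[] SplitRecords(string text, char delimiter)
    {
        string flattened = text.Replace("\r", "").Replace("\n", "");
        return flattened.Split(new[] { delimiter }, StringSplitOptions.RemoveEmptyEntries);
    }

    public static void Main(string[] args)
    {
        // Assumed file name; pass a different path as the first argument.
        string path = args.Length > 0 ? args[0] : "Text.txt";
        string content;
        using (var reader = new StreamReader(path))
        {
            content = reader.ReadToEnd();
        }
        // '|' is a hypothetical record delimiter.
        foreach (string record in SplitRecords(content, '|'))
        {
            Console.WriteLine(record);
        }
    }
}
```

Replace is used here rather than Remove because String.Remove takes a start index (and optional count), not the text to delete.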
Many thanks for the advice. I have managed to read the entire file into a string using the StreamReader's ReadToEnd() method. I could remove the line breaks with the String.Remove() method. Could you please clarify how (and on what character) I could split the file into an array? Thanks again.
Freefall Steve
2010-09-24 12:54:00
You must decide which character to split on. This prints the numeric code of each character in the string. Analyze your file (string) and see where you want to separate the lines:
var chArr = s.ToCharArray();
for (int i = 0; i < chArr.Length; i++)
{
    Console.Write((int)chArr[i] + " "); // the space keeps adjacent codes apart
}
Andrzej Nosal
2010-09-25 05:12:13
Thanks again for the advice, I think I've cracked it now. I found the string that marked the end of a line, then I split into an array with the Regex.Split() method. I can now use a foreach loop to read through the array and use Regex Matches to extract the values I need. I must say I was very impressed by the quality of advice offered as well as the response time.
Freefall Steve
2010-09-26 17:48:10
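For anyone finding this later, the approach described in the last comment might look something like the sketch below. The end-of-record marker was never posted, so "<END>" and the "ID=<digits>" field are purely hypothetical stand-ins, as are the names RegexRecordParser and ExtractIds.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RegexRecordParser
{
    // Hypothetical end-of-record marker; substitute the real pattern.
    private const string EndOfRecordPattern = @"<END>";

    // Hypothetical field to extract: digits following "ID=".
    private static readonly Regex IdPattern = new Regex(@"ID=(\d+)");

    public static List<string> ExtractIds(string raw)
    {
        // Drop every line break first, since they can occur anywhere in a record.
        string flattened = Regex.Replace(raw, @"[\r\n]+", "");
        var ids = new List<string>();
        // Regex.Split cuts the text at each end-of-record marker.
        foreach (string record in Regex.Split(flattened, EndOfRecordPattern))
        {
            Match m = IdPattern.Match(record);
            if (m.Success)
            {
                ids.Add(m.Groups[1].Value);
            }
        }
        return ids;
    }

    public static void Main()
    {
        // Made-up input with line breaks inside and between records.
        string raw = "ID=1\n7;NAME=Alice<END>\r\nID=42;NAME=Bob<END>";
        foreach (string id in ExtractIds(raw))
        {
            Console.WriteLine(id); // prints 17, then 42
        }
    }
}
```

Regex.Split returns a trailing empty string when the text ends with the marker; the Success check skips it here.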