
I have a text file that contains data like this:

ID        Name          Path                                    IsTrue        Period
1         "1 yr"        "C:\\Program Files\\My File.xyz"        -1            2"
1         "1 yr"        "C:\\Program Files\\My File.xyz"        -1            2"

Now I have the following code to split each line:

string[] ArrSeperators = { " " };
ArrSplitStrs = CurrStr.Split(ArrSeperators,
                             StringSplitOptions.RemoveEmptyEntries);

CurrStr holds the current line of the text file.

The problem is that this splits the Name and Path values into multiple strings, but each of them must be treated as a single string. I cannot make any changes to the file, because it is a standard file shared across different products.

I can't figure out what to do.

A: 

If the fields are separated by tabs, you could split on '\t' instead.

pika81
No, they are separated by spaces, and the number of spaces between two fields varies from section to section.
Mohit
To capture the fields inside single or double quotes, you could use: Regex regexp = new Regex("([\"\'])(?:\\\\\\1|.)*?\\1");
pika81
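The regex in the comment above matches only the quoted fields, so ID, IsTrue and Period would be lost. A minimal sketch of the same idea that also keeps unquoted fields, assuming double quotes are the only quote character and fields never contain escaped quotes: each match is either a "..." run (captured without its quotes) or a run of non-space characters. RegexFieldSplitter and SplitFields are hypothetical names for illustration.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class RegexFieldSplitter
{
    // Either a double-quoted value (the "quoted" group excludes the quotes)
    // or a run of non-space characters (the "bare" group).
    static readonly Regex FieldPattern =
        new Regex("\"(?<quoted>[^\"]*)\"|(?<bare>\\S+)");

    public static List<string> SplitFields(string line)
    {
        var fields = new List<string>();
        foreach (Match m in FieldPattern.Matches(line))
        {
            // Keep whichever alternative matched; spaces between fields are skipped.
            Group quoted = m.Groups["quoted"];
            fields.Add(quoted.Success ? quoted.Value : m.Groups["bare"].Value);
        }
        return fields;
    }
}
```

On the sample line this yields five fields: 1, 1 yr, the path, -1 and 2". Note that the unmatched quote after the Period value stays attached, because that token is picked up by the non-space alternative.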
+1  A: 

Use an algorithm like this:

Process each character of each line one at a time.

Count every " character that you find.

If the number of quotes seen so far is odd, you are inside a quoted field, so keep reading the current field until you hit another ".

If the number is even, you are outside quotes, so as soon as you hit a space you're on to the next field.

Something like (this may have errors - I've just written it off the top of my head):

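// Needs: using System.Collections.Generic; using System.Text;
static IEnumerable<string> GetFields(string line)
{
    StringBuilder field = new StringBuilder();
    int quoteCount = 0;

    foreach (char c in line)
    {
        // Quotes only toggle the state; they are not part of the field.
        if (c == '"')
        {
            quoteCount++;
            continue;
        }

        // Even count: outside quotes, so a space ends the current field.
        if (quoteCount % 2 == 0)
        {
            if (c == ' ')
            {
                yield return field.ToString();
                field.Length = 0;
            }
            else
            {
                field.Append(c);
            }
        }
        // Odd count: inside quotes, so keep every character, spaces included.
        else
        {
            field.Append(c);
        }
    }

    // Caveat: this rough version yields an empty field for every extra space
    // and never yields the last field on the line.
}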

EDIT:

Here's a hacky example that works for your sample - the GetFields method needs some refactoring and it's far from the quality of anything I'd put in my code, but the basic principle is there.

using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

class Program
{
    static void Main(string[] args)
    {
        var records = ReadFile(@"D:\x.txt");

        foreach (var record in records)
        {
            foreach (var field in record)
            {
                Console.Write(field + " | ");
            }

            Console.WriteLine();
        }

        Console.ReadKey();
    }

    static IEnumerable<IEnumerable<String>> ReadFile(String file)
    {
        using (var reader = new StreamReader(file))
        {
            // Ignore column titles line.
            reader.ReadLine();

            while (!reader.EndOfStream)
            {
                yield return GetFields(reader.ReadLine());
            }
        }
    }

    static IEnumerable<String> GetFields(String line)
    {
        Int32 quoteCount = 0;
        StringBuilder field = new StringBuilder();

        foreach (var c in line)
        {
            if (c == '"')
            {
                quoteCount++;
                continue;
            }

            if (quoteCount % 2 == 0)
            {
                if (c == ' ')
                {
                    if (field.Length > 0)
                    {
                        yield return field.ToString();
                        field.Length = 0;
                    }
                }
                else
                {
                    field.Append(c);
                }
            }
            else
            {
                field.Append(c);
            }
        }

        yield return field.ToString();
    }
}
Alex Humphrey
Thanks, it works great. I have made GetFields an extension method so that I can use it anywhere. Thanks again! But will there be any performance issues, since it works on each character?
Mohit
@Mohit: Measure the performance on a file of worst-case size and see. Just remember to call ToList() or ToArray() on the result of ReadFile; otherwise you'll just be testing the performance of creating an object that will produce your results at some point in the future, rather than the actual reading of the results!
Alex Humphrey
I have tested it with a really long file, and its performance is acceptable. Thanks for the ToArray suggestion; I had already done that. Thanks again.
Mohit
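A small sketch of why the ToList()/ToArray() advice matters: an iterator method built with yield return does no work until something enumerates it, so timing only the call measures almost nothing. ParseAll below is a hypothetical stand-in for ReadFile/GetFields, and its LinesParsed counter exists only to make the deferral visible. One caveat for the code in the answer: each record returned by ReadFile is itself a lazy GetFields sequence, so ToList() on the outer result reads every line but does not split any of them; to include the splitting in a timing, materialize the inner sequences too, for example ReadFile(path).Select(r => r.ToArray()).ToList().

```csharp
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

static class DeferredParsingDemo
{
    // Counts how many lines the iterator below has actually processed.
    public static int LinesParsed;

    // Hypothetical stand-in for ReadFile/GetFields: nothing in this body runs
    // until a caller starts enumerating the returned sequence.
    public static IEnumerable<string[]> ParseAll(IEnumerable<string> lines)
    {
        foreach (var line in lines)
        {
            LinesParsed++;
            yield return line.Split(' ');
        }
    }

    // ToList() forces every line through ParseAll, so the elapsed time
    // covers the real work rather than just creating the iterator.
    public static long TimeFullParse(IEnumerable<string> lines, out int recordCount)
    {
        var stopwatch = Stopwatch.StartNew();
        List<string[]> records = ParseAll(lines).ToList();
        stopwatch.Stop();
        recordCount = records.Count;
        return stopwatch.ElapsedMilliseconds;
    }
}
```

Calling ParseAll on its own leaves LinesParsed at 0; only after TimeFullParse (or any other full enumeration) does the counter reflect the number of lines.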
A: 

Try the following code. I tested it with the sample provided in the question.

string CurrStr = "1         \"1 yr\"        \"C:\\Program Files\\My File.xyz\"        -1            2\"";
string[] ArrSplitStrs = CurrStr.Split('"');
// After splitting on quotes, the text inside each pair of quotes lands at the odd indexes.
int HighestCount = ArrSplitStrs.Length % 2 == 0 ? ArrSplitStrs.Length : ArrSplitStrs.Length - 1;
for (int Counter = 1; Counter < HighestCount; Counter += 2)
{
    Console.WriteLine(ArrSplitStrs[Counter]);
}
Trinity
The output is
ArrSplitStrs[0] = "1 "
ArrSplitStrs[1] = "1 yr"
ArrSplitStrs[2] = " "
ArrSplitStrs[3] = "C:\\Program Files\\My File.xyz"
ArrSplitStrs[4] = " -1 2"
ArrSplitStrs[5] = ""
instead of
ArrSplitStrs[0] = "1"
ArrSplitStrs[1] = "1 yr"
ArrSplitStrs[2] = "C:\Program Files\My File.xyz"
ArrSplitStrs[3] = "-1"
ArrSplitStrs[4] = "2"
I want the fields in an array because I have to process them further.
Mohit
Create a new array (or list) and add the items that are being printed to it.
Trinity
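Following that suggestion, a sketch that collects the fields into a list instead of printing them, and also splits the text outside the quotes on spaces so that ID, IsTrue and Period are kept. It assumes quotes only appear around whole fields. An odd number of quotes (as with the Period value 2") leaves an unclosed trailing piece; this sketch treats that piece as ordinary unquoted text, so the stray quote is simply dropped. QuoteAwareSplitter and SplitLine are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

static class QuoteAwareSplitter
{
    // Split on '"' first: pieces at odd indexes were inside quotes and are kept
    // whole; pieces at even indexes were outside quotes and are split on spaces.
    public static string[] SplitLine(string line)
    {
        string[] pieces = line.Split('"');
        // An even number of pieces means an odd number of quotes, i.e. the last
        // quote was never closed; treat that trailing piece as unquoted text.
        bool unbalanced = pieces.Length % 2 == 0;
        var fields = new List<string>();

        for (int i = 0; i < pieces.Length; i++)
        {
            bool quoted = i % 2 == 1 && !(unbalanced && i == pieces.Length - 1);
            if (quoted)
            {
                fields.Add(pieces[i]);
            }
            else
            {
                fields.AddRange(pieces[i].Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries));
            }
        }

        return fields.ToArray();
    }
}
```

For the CurrStr sample above this should return the five values 1, 1 yr, C:\Program Files\My File.xyz, -1 and 2, which matches the array asked for. An empty quoted value ("") is preserved as an empty string rather than dropped.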