
Hi,

What would be the best approach to parse a delimited file when the columns are unknown before parsing the file?

The file format is Rightmove v3 (.blm), the structure looks like this:

#HEADER#
Version : 3
EOF : '^'
EOR : '~'
#DEFINITION#
AGENT_REF^ADDRESS_1^POSTCODE1^MEDIA_IMAGE_00~ // can be any number of columns
#DATA#
agent1^the address^the postcode^an image~
agent2^the address^the postcode^^~      // the records have to have the same number of columns as specified in the definition, however they can be empty
etc
#END#
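Rather than hard-coding `^` and `~`, the delimiters can be read from the `#HEADER#` section, because the file declares them there. This is a minimal sketch. It assumes each header line has the form `Key : Value` and that the `EOF`/`EOR` values are single characters wrapped in single quotes, as in the sample above. `BlmHeader`, `Read` and `Delimiter` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper for the #HEADER# section of a .blm file.
// Assumes "Key : Value" lines and quoted single-character delimiters, e.g. EOF : '^'
public static class BlmHeader
{
    // Reads header lines until the #DEFINITION# marker. The marker line is consumed,
    // so the reader is left positioned on the definition record.
    public static Dictionary<string, string> Read(TextReader reader)
    {
        var header = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            line = line.Trim();
            if (line == "#DEFINITION#")
                break;
            int colon = line.IndexOf(':');
            if (colon < 0)
                continue;               // skips "#HEADER#" and blank lines
            // Split on the first colon only, so values may themselves contain ':'
            header[line.Substring(0, colon).Trim()] = line.Substring(colon + 1).Trim();
        }
        return header;
    }

    // Turns a quoted value such as '^' into the character '^'.
    public static char Delimiter(Dictionary<string, string> header, string key)
    {
        string value = header[key].Trim('\'');
        if (value.Length != 1)
            throw new FormatException("Expected a single-character delimiter for " + key);
        return value[0];
    }
}
```

After `Read` returns, the next `ReadLine()` on the same reader yields the column definition record. `Delimiter(header, "EOF")` and `Delimiter(header, "EOR")` then supply the characters to split on.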

The files can potentially be very large: the example file I have is 40 MB, but they could be several hundred megabytes. Below is the code I had started on before I realised the columns were dynamic. I'm opening a FileStream because I read that was the best way to handle large files. I'm not sure my idea of putting every record in a list and then processing it is any good, though; I don't know if that will work with such large files.

List<string> recordList = new List<string>();
List<Property> propertyList = new List<Property>();

try
{
    // Both the stream and the reader are disposed when the block ends
    using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read))
    using (StreamReader file = new StreamReader(fs))
    {
        string line;
        while ((line = file.ReadLine()) != null)
        {
            // Each record ends with the EOR character '~'
            string[] records = line.Split('~');

            foreach (string item in records)
            {
                if (item != String.Empty)
                {
                    recordList.Add(item);
                }
            }
        }
    }
}
catch (FileNotFoundException ex)
{
    Console.WriteLine(ex.Message);
}

foreach (string r in recordList)
{
    Property property = new Property();

    // Fields are separated by the EOF character '^'
    string[] fields = r.Split('^');

    // can't do this as I don't know which field is the post code
    property.PostCode = fields[2];
    // etc

    propertyList.Add(property);
}

Any ideas of how to do this better? It's C# 3.0 and .NET 3.5, if that helps.

Thanks,

Annelie
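The comment in the question's second loop ("can't do this as I don't know which field is the post code") is the crux. The `#DEFINITION#` record names the columns in order, so it can be turned into a lookup from column name to field index, and each data record can then be read by name instead of by position. This is a minimal sketch. `BlmColumns` and `Get` are hypothetical names, and it assumes the definition record has already been read as a string that still includes its EOR terminator.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: maps #DEFINITION# column names to field positions.
public class BlmColumns
{
    private readonly Dictionary<string, int> indexByName =
        new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);

    // definition: the #DEFINITION# record, e.g. "AGENT_REF^ADDRESS_1^POSTCODE1^MEDIA_IMAGE_00~"
    public BlmColumns(string definition, char eof, char eor)
    {
        string[] names = definition.Trim().TrimEnd(eor).Split(eof);
        for (int i = 0; i < names.Length; i++)
        {
            if (names[i].Length > 0)
                indexByName[names[i]] = i;
        }
        Count = names.Length;
    }

    // Number of columns declared in the definition.
    public int Count { get; private set; }

    // Returns the named field from a record that has already been split on EOF.
    // A field missing from the end of a short record is treated as empty.
    public string Get(string[] fields, string column)
    {
        int index;
        if (!indexByName.TryGetValue(column, out index))
            throw new KeyNotFoundException("Column not in #DEFINITION#: " + column);
        return index < fields.Length ? fields[index] : string.Empty;
    }
}
```

In the question's loop, `property.PostCode = columns.Get(fields, "POSTCODE1");` would replace the hard-coded `fields[2]`. That line keeps working wherever `POSTCODE1` appears in the definition.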

A: 

You could do this a few ways.

  1. If the properties on your objects have the same name as the columns in your data file, you could use reflection to determine which columns should be matched to which properties.
  2. If the properties on your objects have different names, then you could write a custom mapping schema that would say "for column X, assign to property Y".
  3. You could create custom attributes for your object properties that indicate which column name they map to, and use reflection to read those attributes.

All of these steps presuppose that the column names in your data files will be the same for the data they represent (i.e., ADDRESS_1 will always be the column name for "address line one" data).
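Options 1 and 3 can share a single reflection pass: use the custom attribute when it is present, and fall back to the property's own name otherwise. This is a minimal sketch that assumes only writable `string` properties are mapped. `BlmColumnAttribute`, `ReflectionMapper` and the example `Listing` class are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Option 3: names the .blm column a property is filled from.
[AttributeUsage(AttributeTargets.Property)]
public class BlmColumnAttribute : Attribute
{
    public BlmColumnAttribute(string name) { Name = name; }
    public string Name { get; private set; }
}

public static class ReflectionMapper
{
    // columns: names in #DEFINITION# order; fields: one record split on EOF.
    public static T Map<T>(string[] columns, string[] fields) where T : class, new()
    {
        // Column name -> writable string property. The attribute wins;
        // otherwise the property's own name is used (option 1).
        var byColumn = new Dictionary<string, PropertyInfo>(StringComparer.OrdinalIgnoreCase);
        foreach (PropertyInfo p in typeof(T).GetProperties())
        {
            if (!p.CanWrite || p.PropertyType != typeof(string))
                continue;
            object[] attrs = p.GetCustomAttributes(typeof(BlmColumnAttribute), false);
            string column = attrs.Length > 0 ? ((BlmColumnAttribute)attrs[0]).Name : p.Name;
            byColumn[column] = p;
        }

        T result = new T();
        for (int i = 0; i < columns.Length && i < fields.Length; i++)
        {
            PropertyInfo target;
            if (byColumn.TryGetValue(columns[i], out target))
                target.SetValue(result, fields[i], null);
        }
        return result;
    }
}

// Example target type (hypothetical).
public class Listing
{
    [BlmColumn("POSTCODE1")]
    public string PostCode { get; set; }

    // No attribute: matched by property name (option 1).
    public string AGENT_REF { get; set; }
}
```

This sketch reflects over `T` on every call to keep it short. For files with many records, you'd build the column-to-property dictionary once and reuse it for each record.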

Nicholas Cloud
They'd have different names, so option 2 would be the way to go. I'll have a look at how to do that. Thanks!
annelie
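For option 2, the mapping schema can be as simple as a dictionary from column name to a setter delegate, which avoids reflection entirely. This is a minimal sketch. The `Property` class here is a stand-in for the one in the question, and `PropertyMapping` and the `AgentRef`/`Address` members are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Stand-in for the question's Property class (only the members used here).
public class Property
{
    public string AgentRef { get; set; }
    public string Address { get; set; }
    public string PostCode { get; set; }
}

public static class PropertyMapping
{
    // "For column X, assign to property Y"
    private static readonly Dictionary<string, Action<Property, string>> Setters =
        new Dictionary<string, Action<Property, string>>(StringComparer.OrdinalIgnoreCase)
        {
            { "AGENT_REF", (p, v) => p.AgentRef = v },
            { "ADDRESS_1", (p, v) => p.Address = v },
            { "POSTCODE1", (p, v) => p.PostCode = v },
        };

    // columns: names in #DEFINITION# order; fields: one record split on EOF.
    // Columns with no entry in Setters are ignored.
    public static Property Map(string[] columns, string[] fields)
    {
        var property = new Property();
        for (int i = 0; i < columns.Length && i < fields.Length; i++)
        {
            Action<Property, string> set;
            if (Setters.TryGetValue(columns[i], out set))
                set(property, fields[i]);
        }
        return property;
    }
}
```

Supporting a new column takes one more line in the dictionary. Columns that the file has but the mapping doesn't (such as `MEDIA_IMAGE_00` in the sample) are simply skipped.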
A: 

If you can strip out the lines at the start (the header content and the #xxx# lines), then what's left is essentially a CSV file with ^ as the field delimiter and ~ at the end of each record, so any CSV reader class that lets you choose the delimiter will do the trick.
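The stripping doesn't have to produce a new file. It can happen while reading: skip everything up to `#DATA#`, stop at `#END#`, and split each remaining line on `^` after removing the trailing `~`. This is a minimal sketch that does the splitting itself instead of using a CSV library, and it assumes one record per line. `BlmDataRows` is a hypothetical name. Because it yields one row at a time rather than filling a list, its memory use stays roughly independent of file size.

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: yields the fields of each data line, skipping the
// #HEADER# and #DEFINITION# sections and stopping at #END#.
public static class BlmDataRows
{
    public static IEnumerable<string[]> Read(TextReader reader, char eof, char eor)
    {
        bool inData = false;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            string record = line.Trim();
            if (!inData)
            {
                inData = record == "#DATA#";   // everything before #DATA# is skipped
                continue;
            }
            if (record == "#END#")
                yield break;
            if (record.Length == 0)
                continue;

            // Drop the record terminator; empty fields such as "^^" are kept.
            if (record[record.Length - 1] == eor)
                record = record.Substring(0, record.Length - 1);
            yield return record.Split(eof);
        }
    }
}
```

Typical use is `foreach (string[] fields in BlmDataRows.Read(file, '^', '~'))` inside a `using (StreamReader file = ...)` block, handling each record before moving on to the next.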

Richie Cotton
I'm sure I can strip those out. I'll have a look at the link, thanks!
annelie
A: 

Hi Annelie,

I have to use that type of Rightmove file. Can you tell me how you send a request for it? I mean, is there a SOAP message, or do you simply have to make a URL connection?

Thanks.

Niketa
It's just a file format, so your question doesn't make much sense I'm afraid (and if you're still having trouble, please post it as a separate question rather than as an answer to my question).
annelie