Hi all,

I'm having a few problems parsing a CSV file in the following format with the FileHelpers library. It's confusing me slightly because the field delimiter appears to be a space, but the fields themselves are sometimes quoted with quotation marks and at other times with square brackets. I'm trying to produce a RecordClass capable of parsing this.

Here's a sample from the CSV:

xxx.xxx.xxx.xxx - - [14/Jun/2008:18:04:17 +0000] "GET http://www.some_url.com HTTP/1.1" 200 73662339 "-" "iTunes/7.6.2 (Macintosh; N; Intel)"

It's an extract from an HTTP log we receive from one of our bandwidth providers.

Any help would be greatly appreciated.

Thanks,

Richard.

A: 

In what way is that CSV? It looks like it's just a particular log file format which should be fairly easily parsed, but not by a CSV parser. In particular, you may well find that a regex works perfectly well. (You'd need to check what would happen to quotes in the user agent etc.)
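For illustration, a minimal sketch of the regex approach, assuming every line has exactly the shape of the sample in the question. The class name LogLineParser and the group names are made up for this example. As noted above, a double quote inside the user agent (or the request) would make this pattern fail to match, so check your real data first.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LogLineParser
{
    // One named group per field of the sample line. Quoted fields are
    // assumed NOT to contain embedded double quotes.
    private static readonly Regex LinePattern = new Regex(
        @"^(?<ip>\S+) (?<ident>\S+) (?<user>\S+) \[(?<timestamp>[^\]]+)\] " +
        @"""(?<request>[^""]*)"" (?<status>\d{3}) (?<bytes>\d+|-) " +
        @"""(?<referer>[^""]*)"" ""(?<agent>[^""]*)""$");

    // Returns the Match; callers should check Success before reading groups.
    public static Match Parse(string line) => LinePattern.Match(line);

    public static void Main()
    {
        string sample =
            "xxx.xxx.xxx.xxx - - [14/Jun/2008:18:04:17 +0000] " +
            "\"GET http://www.some_url.com HTTP/1.1\" 200 73662339 " +
            "\"-\" \"iTunes/7.6.2 (Macintosh; N; Intel)\"";

        Match m = Parse(sample);
        if (!m.Success)
        {
            Console.WriteLine("Line did not match the expected format.");
            return;
        }

        Console.WriteLine(m.Groups["timestamp"].Value);
        Console.WriteLine(m.Groups["request"].Value);
        Console.WriteLine(long.Parse(m.Groups["bytes"].Value));
        Console.WriteLine(m.Groups["agent"].Value);
    }
}
```

Lines that don't match simply come back with Success set to false, so malformed entries can be logged and skipped rather than throwing.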

Jon Skeet
My mistake, stuck in CSV mode today as that's what I've been dealing with all morning. FileHelpers says that it reads "data from fixed length or delimited records in files"; I presumed this is delimited (by spaces), but that it has different field quotes. I'll look into a regex, thanks.
Richard
+1  A: 

The obvious statement is "then it isn't CSV"...

I'd be tempted to use a quick regex to munge the date into the same escaping as everything else... on a line-by-line basis, something like:

string t = Regex.Replace(s, @"\[([^\]]*)\]", @"""$1""");

Then you should be able to use a standard parser using space as a delimiter (respecting quotes).
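As a rough sketch of the whole idea, here is that substitution followed by a hand-rolled, quote-respecting split on spaces. The names BracketNormalizer, Normalize and SplitQuoted are hypothetical, and the splitter is only a stand-in for whatever standard parser you end up using; it assumes quotes are never escaped inside a field.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class BracketNormalizer
{
    // Rewrite [ ... ] as " ... " so the timestamp is quoted like the other fields.
    public static string Normalize(string line) =>
        Regex.Replace(line, @"\[([^\]]*)\]", @"""$1""");

    // Split on single spaces, treating everything between double quotes
    // as part of one field. The quote characters themselves are dropped.
    public static List<string> SplitQuoted(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        foreach (char c in line)
        {
            if (c == '"')
            {
                inQuotes = !inQuotes;
            }
            else if (c == ' ' && !inQuotes)
            {
                fields.Add(current.ToString());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }
        fields.Add(current.ToString());
        return fields;
    }

    public static void Main()
    {
        string sample =
            "xxx.xxx.xxx.xxx - - [14/Jun/2008:18:04:17 +0000] " +
            "\"GET http://www.some_url.com HTTP/1.1\" 200 73662339 " +
            "\"-\" \"iTunes/7.6.2 (Macintosh; N; Intel)\"";

        foreach (string field in SplitQuoted(Normalize(sample)))
        {
            Console.WriteLine(field);
        }
    }
}
```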

Marc Gravell
A: 

Hi,

While I thank Marc Gravell and Jon Skeet for their input, my question was how to parse a file containing lines in the format described using the FileHelpers library (admittedly, I worded it badly to begin with, describing it as 'CSV' when in fact it isn't).

I have now found a way to do just this. It's not the most elegant method, but it gets the job done. In an ideal world, I wouldn't be using FileHelpers in this particular implementation ;)

For those who are interested, the solution is to create a FileRecord class as follows:

using System;
using FileHelpers;

[DelimitedRecord(" ")]
public sealed class HTTPRecord
{
    public String IP;

    // Fields with prefix 'x' are useless to me... we omit those in processing later
    public String x1;

    [FieldDelimiter("[")]
    public String x2;

    [FieldDelimiter("]")]
    public String Timestamp;

    [FieldDelimiter("\"")]
    public String x3;

    public String Method;
    public String URL;

    [FieldDelimiter("\"")]
    public String Type;

    [FieldIgnored()]
    public String x4;

    [FieldDelimiter(" ")]
    public String x5;

    public int HTTPStatusCode;

    public long Bytes;

    [FieldQuoted()]
    public String Referer;

    [FieldQuoted()]
    public String UserAgent;
}

Cheers

Richard.

Richard