I've got a text file that arrives at my application as many lines of the following form:

<row amount="192.00" store="10" transaction_date="2009-10-22T12:08:49.640"
 comp_name="blah                                            " 
 comp_ref="C65551253E7A4589A54D7CCD468D8AFA" 
 name="Accrington                                                  "/>

and I'd like to turn this 'row' into a series of name/value pairs in a given TStringList (there could be dozens of these <row>s in the file, so eventually I will want to iterate through the file breaking each row into name/value pairs in turn).

The problem I've got is that the data isn't obviously delimited (technically, I suppose it's space delimited). If it weren't for the fact that some of the values contain leading or trailing spaces, I could probably make a few reasonable assumptions and code something to break a row up based on spaces. But as the values themselves may or may not contain spaces, I don't see an obvious way to do this. Delphi's TStringList.CommaText doesn't help, and I've tried playing around with Delimiter, but I get caught out by the spaces inside the values each time.

Does anyone have a clever Delphi technique for turning the sample above into something resembling this?

amount="192.00"
store="10"
transaction_date="2009-10-22T12:08:49.640"
comp_name="blah                                            " 
comp_ref="C65551253E7A4589A54D7CCD468D8AFA" 
name="Accrington                                                  "

Unfortunately, as is usually the case with this kind of thing, I don't have any control over the format of the data to begin with - I can't go back and 'make' it comma delimited at source, for instance. Although I guess I could probably write some code to turn it into comma delimited - would rather find a nice way to work with what I have though.

This would be in Delphi 2007, if it makes any difference.

+3  A: 
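procedure RowToStrings(const row: string; list: TStrings);
var
  i       : integer;
  iDelim  : integer;   // index of the most recent delimiting space (0 = none yet)
  inQuotes: boolean;   // true while scanning between a pair of double quotes
begin
  iDelim := 0;
  inQuotes := false;
  for i := 1 to Length(row) do begin
    // A space outside quotes ends the current name="value" pair.
    if (row[i] = ' ') and (not inQuotes) then begin
      list.Add(Copy(row, iDelim+1, i-iDelim-1));
      iDelim := i;
    end
    // Each double quote toggles the in-quotes state.
    else if row[i] = '"' then
      inQuotes := not inQuotes;
  end;
  // Add the final pair, which has no trailing space.
  list.Add(Copy(row, iDelim+1, Length(row)-iDelim));
end;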

procedure TForm37.Test;
var
  row: string;
begin
  row := 'amount="192.00" store="10" transaction_date="2009-10-22T12:08:49.640" ' +
         'comp_name="blah                                            " '          +
         'comp_ref="C65551253E7A4589A54D7CCD468D8AFA" '                           +
         'name="Accrington                                                  "';
  RowToStrings(row, ListBox1.Items);
end;
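
The test above passes only the attribute text, whereas the lines in the question still carry the leading `<row ` and trailing `/>`. A minimal sketch of stripping that wrapper first, assuming each row arrives on a single line; `StripRowTag` is a hypothetical helper name, not part of the original answer:

```delphi
uses
  SysUtils;

// Hypothetical helper: removes a leading '<row ' and a trailing '/>' so that
// only the name="value" pairs remain. Assumes the whole row is on one line.
function StripRowTag(const line: string): string;
const
  Prefix = '<row ';
  Suffix = '/>';
begin
  Result := Trim(line);
  // Drop the opening tag name if present.
  if Copy(Result, 1, Length(Prefix)) = Prefix then
    Delete(Result, 1, Length(Prefix));
  // Drop the self-closing marker if present.
  if (Length(Result) >= Length(Suffix)) and
     (Copy(Result, Length(Result) - Length(Suffix) + 1, Length(Suffix)) = Suffix) then
    SetLength(Result, Length(Result) - Length(Suffix));
  Result := Trim(Result);
end;
```

It would then be called as `RowToStrings(StripRowTag(line), list)` for each line read from the file.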
gabr
Wow - that was embarrassingly straightforward... thanks! For some reason I assumed that writing a state-machine (kind of thing) to parse the row character by character would have more edge cases than that. Thanks Gabr!
robsoft
+12  A: 

You say it's not "obviously delimited," but to me, it's very obviously delimited because it's very obviously XML. So use an XML parser. You could start with Delphi's TXmlDocument. You could pass each "row" string to the parser separately, but my suspicion is that all those rows are enclosed by some other angle-bracket tag. Feed that entire file to the parser, and it can help you get a list of objects representing rows, and then you can ask for the values of their attributes by name.

If you try to parse XML without regard to the nuances of XML parsing, sooner or later you're going to get burned.
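
As a rough illustration of the parser route, here is a minimal sketch using `LoadXMLData` from the `XMLDoc` unit. It assumes the rows are not already wrapped in a root element, so it adds a made-up `<rows>` wrapper; if the real file has its own enclosing tag, pass the text through unchanged. `RowNodeToStrings` and `ParseRows` are hypothetical names. With the default MSXML vendor, COM must be initialized first (a VCL forms application normally takes care of that; a console program would need `CoInitialize`).

```delphi
uses
  Classes, XMLIntf, XMLDoc;

// Hypothetical helper: appends every attribute of one <row> node to list
// in name=value form.
procedure RowNodeToStrings(const rowNode: IXMLNode; list: TStrings);
var
  i   : Integer;
  attr: IXMLNode;
begin
  for i := 0 to rowNode.AttributeNodes.Count - 1 do begin
    attr := rowNode.AttributeNodes[i];
    list.Add(attr.NodeName + '=' + attr.Text);
  end;
end;

// Hypothetical helper: parses the whole text and processes each <row> in turn.
// The '<rows>' wrapper is an assumption for input that has no root element.
procedure ParseRows(const xmlText: string; list: TStrings);
var
  doc : IXMLDocument;
  root: IXMLNode;
  i   : Integer;
begin
  doc := LoadXMLData('<rows>' + xmlText + '</rows>');
  root := doc.DocumentElement;
  for i := 0 to root.ChildNodes.Count - 1 do
    if root.ChildNodes[i].NodeName = 'row' then
      RowNodeToStrings(root.ChildNodes[i], list);
end;
```

Unlike the character-by-character approach, the parser decodes entities such as `&quot;` and `&amp;` inside attribute values and does not care how the attributes are spread across lines.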

Rob Kennedy
+1 Thanks Rob - you're right of course, it's a snippet of some XML output from a SQL Server somewhere. I didn't want to go to the trouble of turning it into a dataset/XML document but your final sentence has the ring of truth about it, and sure as eggs sooner or later they'll send me something I won't parse properly. I've accepted Gabr's answer because he did actually help me but +1 and kudos to you for making a good point. I'll have a look at what TXmlDocument makes of it. :-)
robsoft