Hi,

I'm trying to parse some data returned by a third-party app (a TSV file). I have all the data neatly parsed into individual fields (see http://stackoverflow.com/questions/2410788/parse-a-tsv-file), but I don't know how to decode some of the fields.
Sometimes the data in a field is wrapped like this:

=T("[FIELD_DATA]")

(That's some sort of Excel formatting I believe.)
When that happens, specific characters are escaped as CHAR(ASCII_NUM), and the rest of the string is wrapped as in the example above, but without the =, which only appears at the beginning of the field.

So, does anyone have an idea how I could parse fields that look like this:

=T("- Merge User Interface of Global Xtra Alert and EMT Alert")&CHAR(10)&T("- Toaster ?!")&CHAR(10)&T("")&CHAR(10)&T("")&CHAR(10)&T("None")&CHAR(10)&T("")&CHAR(10)&T("None")

(any number of CHAR/T() groups).

I have been thinking of using a regex or looping over the string, but I doubt either approach is sound. Help, anyone?

A: 
using System;
using System.Text.RegularExpressions;

class Program
{
    public static void Main(string[] args)
    {
        var input = @"=T(""- Merge User Interface of Global Xtra Alert and EMT Alert"")&CHAR(10)&T(""- Toaster ?!"")&CHAR(10)&T("""")&CHAR(10)&T("""")&CHAR(10)&T(""None"")&CHAR(10)&T("""")&CHAR(10)&T(""None"")";
        // Capture the text between T(" and "); the CHAR(10) separators are not matched.
        var matches = Regex.Matches(input, @"T\(\""([^\""]*)\""\)");
        foreach (Match match in matches)
        {
            // Groups[1] is the quoted payload of one T("...") call.
            Console.WriteLine(match.Groups[1].Value);
        }
    }
}
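
The loop above prints only the T("...") payloads and leaves out the CHAR(n) separators. If the goal is to rebuild the original text of the field, one possible approach is to match both kinds of token in a single pass and append each decoded piece. The sketch below is only an illustration: FieldDecoder and Decode are hypothetical names, and it assumes the quoted text never contains a double quote and that fields without the =T(" prefix should be returned unchanged.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class FieldDecoder
{
    // Matches either T("text") or CHAR(n).
    // Assumption: the quoted text never contains a double quote.
    private static readonly Regex Token =
        new Regex(@"T\(""(?<text>[^""]*)""\)|CHAR\((?<code>\d+)\)");

    public static string Decode(string field)
    {
        // Assumption: fields without the =T(" wrapper are plain text.
        if (!field.StartsWith("=T(\"", StringComparison.Ordinal))
            return field;

        var result = new StringBuilder();
        foreach (Match m in Token.Matches(field))
        {
            if (m.Groups["code"].Success)
            {
                // CHAR(10) becomes a line feed, CHAR(9) a tab, and so on.
                result.Append((char)int.Parse(m.Groups["code"].Value));
            }
            else
            {
                result.Append(m.Groups["text"].Value);
            }
        }
        return result.ToString();
    }
}
```

Called on a field such as =T("a")&CHAR(10)&T("b"), Decode would return the two letters separated by a line feed. Named groups are used so that the code does not depend on group positions.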
Darin Dimitrov
+1  A: 

I would take an approach similar to Darin's, but his regex wasn't working for me. I would use this one:

(=T|&CHAR|&T)(\("*([A-Za-z?!0-9 -]*)"*\))+

You'll find that Groups[3] (remember the indexes are zero-based, and Groups[0] is the whole match) holds the data inside the () and, if they are present, inside the "". Groups[1] is the prefix, and Groups[2] is the parenthesised part including the parentheses and quotes. For example, this will find:

- Merge User Interface of Global Xtra Alert and EMT Alert

in:

=T("- Merge User Interface of Global Xtra Alert and EMT Alert")

and:

10

in:

&CHAR(10)

If you have:

&T("")

it will produce an empty string in Groups[3].
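
Since .NET numbers unnamed groups by the position of their opening parenthesis, a quick way to check what each group captures is to print them all for a short input. The following is a small sketch for that purpose only; GroupDump and its Pattern constant are hypothetical names wrapping the regex quoted above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical diagnostic program, not part of the original answer.
public static class GroupDump
{
    // The pattern from this answer, written as a C# verbatim string.
    public const string Pattern = @"(=T|&CHAR|&T)(\(""*([A-Za-z?!0-9 -]*)""*\))+";

    public static void Main()
    {
        var input = @"=T(""- Toaster ?!"")&CHAR(10)&T("""")";
        foreach (Match m in Regex.Matches(input, Pattern))
        {
            // Groups[0] is always the whole match; the rest follow in
            // order of their opening parenthesis in the pattern.
            for (int i = 0; i < m.Groups.Count; i++)
            {
                Console.WriteLine("Groups[{0}] = '{1}'", i, m.Groups[i].Value);
            }
            Console.WriteLine();
        }
    }
}
```

Note that the character class only allows letters, digits, spaces and the characters ?, ! and -, so text containing other punctuation would need a wider class.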

Hope this helps.

Tim C
Antoine