How can I parse the following content?

data (DIR1:input bit;
      DGG2:input bit;
      OENEG1:input bit;
      OE_NEG2:input bit;
      A1:inputoutput bit_vector(1 to 9);
      A2,H5,J7:inputoutput bit_vector(1 to 9);
      B1,E4,Y7:inputoutput bit_vector(1 to 9);
      B2:inputoutput bit_vector(1 to 9);
      TGY:output bit;
      THHH, Tff, TsD:input bit);

I want the output in a dictionary, declared as shown below:

 Dictionary<string,string> l_dictData = new Dictionary<string,string>();

After parsing, l_dictData should be filled with the following result:

 l_dictData["inputbit"] = "DIR1,DGG2,OENEG1,OE_NEG2,THHH,Tff,TsD";

 l_dictData["inputoutputbit"] = "A1(1),A1(2),....,A1(9),A2(1),A2(2)....A2(9),H5(1)....H5(9),J7(1),...J7(9),B1(1),....B1(9),E4(1),....E4(9),Y7(1),...Y7(9),B2(1),....B2(9)";

 l_dictData["outputbit"] = "TGY";

Here are my regular expressions:

    ([ \t\r\n]*)?(data|DATA)([ \t\r\n]*)?(\()?

    [ \t\r\n]*(?<PINFUNC>(inputbit|outputbit|inputoutputbit))(_vector[ \t\r\n]*\([ \t\r\n]*(?<START>([0-9]+))[ \t\r\n]*(to|downto)[ \t\r\n]*(?<END>([0-9]+))[ \t\r\n]*\))?

NOTE:

The text before the ":" (colon) is taken as the value for the dictionary.

Please let me know if you have any questions.

+4  A: 

I wouldn't use regular expressions. I'd do the following:

  1. Extract the contents of the brackets.
  2. Split the string on ; to get the individual declarations.
  3. Create a holding object, something like Dictionary<string, List<string>>.
  4. Loop through each of your name/value pairs (e.g. "DIR1:input bit") and split on :.
  5. Work out your key and value (your keys don't exactly match what comes after the ":"; for example, "input bit" becomes "inputbit").
  6. If the key is already in the dictionary, add the value to its list; if the key is not there yet, create the string list first.
  7. Finish looping; you now have a dictionary mapping each key to a list of values.
  8. Loop through your new dictionary and write the values into your final dictionary, converting each list into a single comma-separated string.
  9. Profit.

Oh, and you might need some Trim() calls in there to deal with the whitespace.
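A minimal C# sketch of the steps above, assuming well-formed input (exactly one ':' per declaration, ranges written as "(a to b)" or "(a downto b)"). `ParseData` is a hypothetical helper name. Expanding `bit_vector(1 to 9)` into A1(1) ... A1(9) is done inside step 5, since the question's expected output requires it; the `downto` handling is an assumption taken from the question's own regex.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sample input copied from the question.
string input = @"data (DIR1:input bit;
      DGG2:input bit;
      OENEG1:input bit;
      OE_NEG2:input bit;
      A1:inputoutput bit_vector(1 to 9);
      A2,H5,J7:inputoutput bit_vector(1 to 9);
      B1,E4,Y7:inputoutput bit_vector(1 to 9);
      B2:inputoutput bit_vector(1 to 9);
      TGY:output bit;
      THHH, Tff, TsD:input bit);";

Dictionary<string, string> l_dictData = ParseData(input);
foreach (KeyValuePair<string, string> pair in l_dictData)
    Console.WriteLine($"{pair.Key} = {pair.Value}");

// Hypothetical helper following the numbered steps (assumes well-formed input).
static Dictionary<string, string> ParseData(string text)
{
    // Step 1: keep only what lies between the outer brackets.
    int open = text.IndexOf('(');
    int close = text.LastIndexOf(')');
    string body = text.Substring(open + 1, close - open - 1);

    // Step 3: holding object mapping each key to a list of names.
    var holding = new Dictionary<string, List<string>>();

    // Step 2: one declaration per ';'.
    foreach (string decl in body.Split(';'))
    {
        if (string.IsNullOrWhiteSpace(decl)) continue;

        // Step 4: names before the ':', type after it; Trim() removes the whitespace.
        int colon = decl.IndexOf(':');
        IEnumerable<string> names = decl.Substring(0, colon).Split(',').Select(n => n.Trim());
        string type = decl.Substring(colon + 1).Trim();

        // Step 5: "inputoutput bit_vector(1 to 9)" -> key "inputoutputbit".
        string key = type.Split(' ')[0] + "bit";

        // A plain "bit" keeps the names; a bit_vector range expands to NAME(i).
        var values = new List<string>();
        int rangeOpen = type.IndexOf('(');
        if (rangeOpen < 0)
        {
            values.AddRange(names);
        }
        else
        {
            // "1 to 9" -> ["1", "to", "9"]
            string[] range = type.Substring(rangeOpen + 1, type.IndexOf(')') - rangeOpen - 1)
                .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
            int start = int.Parse(range[0]);
            int end = int.Parse(range[2]);
            int step = range[1] == "downto" ? -1 : 1;
            foreach (string name in names)
                for (int i = start; step > 0 ? i <= end : i >= end; i += step)
                    values.Add($"{name}({i})");
        }

        // Step 6: create the list the first time a key is seen, then append.
        if (!holding.TryGetValue(key, out List<string> list))
        {
            list = new List<string>();
            holding[key] = list;
        }
        list.AddRange(values);
    }

    // Step 8: join each list into one comma-separated string.
    return holding.ToDictionary(kv => kv.Key, kv => string.Join(",", kv.Value));
}
```

Plain string handling keeps each step easy to test on its own, which is the point of the answer; the only fiddly part is the range expansion.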

Chris
+1. At best, regex could help with a couple of the steps mentioned here. That's a lot of steps to profit, I say.
Amarghosh
This is the same angle I would attack the problem from. I would probably end up doing it using TDD, so I'm not sure what the final solution would look like, but I don't think I would start with regex.
ckramer
+1  A: 

This expression: (?:\(|\s)\s*([\w| |,]*):(\w*?) bit.*?;

yields these results:

[1] => Array
    (
        [0] => DIR1
        [1] => DGG2
        [2] => OENEG1
        [3] => OE_NEG2
        [4] => A1
        [5] => A2,H5,J7
        [6] => B1,E4,Y7
        [7] => B2
        [8] => TGY
        [9] => THHH, Tff, TsD
    )

[2] => Array
    (
        [0] => input
        [1] => input
        [2] => input
        [3] => input
        [4] => inputoutput
        [5] => inputoutput
        [6] => inputoutput
        [7] => inputoutput
        [8] => output
        [9] => input
    )

Split the first group on commas, trim the spaces, append "bit" to the second group to form the key, and you're done.

With thanks to My Regex Tester (which will also explain this if you ask it to): http://www.myregextester.com/index.php
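A minimal C# sketch of this answer using `System.Text.RegularExpressions`, with the pattern copied verbatim. Note that the pattern discards the `bit_vector(1 to 9)` range, so the "inputoutputbit" entry comes out as A1,A2,H5,... rather than the expanded A1(1) ... A1(9) the question asks for; that expansion would still be a separate step.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Sample input copied from the question.
string input = @"data (DIR1:input bit;
      DGG2:input bit;
      OENEG1:input bit;
      OE_NEG2:input bit;
      A1:inputoutput bit_vector(1 to 9);
      A2,H5,J7:inputoutput bit_vector(1 to 9);
      B1,E4,Y7:inputoutput bit_vector(1 to 9);
      B2:inputoutput bit_vector(1 to 9);
      TGY:output bit;
      THHH, Tff, TsD:input bit);";

// Group 1: the names before ':'; group 2: the direction before " bit".
var pattern = new Regex(@"(?:\(|\s)\s*([\w| |,]*):(\w*?) bit.*?;");

var holding = new Dictionary<string, List<string>>();
foreach (Match m in pattern.Matches(input))
{
    string key = m.Groups[2].Value + "bit";   // "input" -> "inputbit"
    IEnumerable<string> names = m.Groups[1].Value.Split(',').Select(n => n.Trim());

    if (!holding.TryGetValue(key, out List<string> list))
    {
        list = new List<string>();
        holding[key] = list;
    }
    list.AddRange(names);
}

// Join each list into the comma-separated form the question wants.
Dictionary<string, string> l_dictData =
    holding.ToDictionary(kv => kv.Key, kv => string.Join(",", kv.Value));

foreach (KeyValuePair<string, string> pair in l_dictData)
    Console.WriteLine($"{pair.Key} = {pair.Value}");
```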

Lunivore
Witchcraft I say! +1
Chris Marisic
`[\w| |,]` - the `|` is not a metacharacter in a character class definition.
polygenelubricants
Meh, whatever means "or" in C#'s version then.
Lunivore
It doesn't matter what regex flavor you're using, "or" is implied in character classes. It's just `[\w ,]`.
Alan Moore
Tried that, didn't work. Maybe I was doing something else wrong. Chris - I'd trust these people before I trust me!
Lunivore
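A small C# check of the character-class point raised in the comments: inside `[...]` the `|` is an ordinary member, so `[\w| |,]` also accepts a literal `|`, while `[\w ,]` does not. On the question's sample input, which contains no `|`, the two patterns should capture the same names. The `Captures` helper is hypothetical, written only for this comparison.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Inside a character class '|' is a literal, not alternation.
bool pipeInOriginal = Regex.IsMatch("|", @"^[\w| |,]$");   // '|' is a member of this class
bool pipeInSimplified = Regex.IsMatch("|", @"^[\w ,]$");   // '|' is not a word character
Console.WriteLine($"original accepts '|': {pipeInOriginal}, simplified accepts '|': {pipeInSimplified}");

// Sample input copied from the question.
string input = @"data (DIR1:input bit;
      DGG2:input bit;
      OENEG1:input bit;
      OE_NEG2:input bit;
      A1:inputoutput bit_vector(1 to 9);
      A2,H5,J7:inputoutput bit_vector(1 to 9);
      B1,E4,Y7:inputoutput bit_vector(1 to 9);
      B2:inputoutput bit_vector(1 to 9);
      TGY:output bit;
      THHH, Tff, TsD:input bit);";

// Hypothetical helper: returns the trimmed group-1 captures for a pattern.
string[] Captures(string pattern) =>
    Regex.Matches(input, pattern).Cast<Match>()
         .Select(m => m.Groups[1].Value.Trim())
         .ToArray();

string[] original = Captures(@"(?:\(|\s)\s*([\w| |,]*):(\w*?) bit.*?;");
string[] simplified = Captures(@"(?:\(|\s)\s*([\w ,]*):(\w*?) bit.*?;");
Console.WriteLine($"same captures: {original.SequenceEqual(simplified)}");
```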