I have a structured text file with a hierarchical layout that describes a Delphi GUI (a DFM file).

I need to match every "Color = xxx" line that sits inside a TMyButton object (marked below), but not the Color lines of any other object. A TMyButton object never contains a deeper nesting level.

object frmMain: TfrmMain
  Left = 311
  Top = 201
  Color = clBtnFace
  object MyFirstButton: TMyButton
    Left = 555
    Top = 301
    Color = 16645072           <<<<<<MATCH THIS
    OnClick = ButtonClick
  end
  object MyLabel: TLabel
    Left = 362
    Top = 224
    Caption = 'a Caption'
    Color = 16772831
    Font.Color = clWindowText
  end
  object Panel2: TLTPanel
    Left = 348
    Top = 58
    Width = 444
    Height = 155
    Color = clRed
    object MyOtherButton: TMyButton
      Left = 555
      Top = 301
      Color = 16645072         <<<<<<MATCH THIS
      OnClick = ButtonClick
    end
  end
end

I have spent two days on this with many different attempts. Here are some incomplete pieces of the pattern:

/^[ ]{2,}object [A-Za-z0-9]+: TmyButton\r\n/mi  <<<Matches the needed context
/^[ ]{4,}Color = [A-Za-z0-9]+\r\n/mi            <<<Matches the needed result
/^[ ]{2,}end\r\n/mi                             <<<Matches the end of the context

(I don't know why, but I had to use "\r\n" instead of "$".) I need to combine these pieces so that all other lines are ignored, except for other "object xxx: yyy" and "end" lines.

I would be glad to have some help!

+1  A: 

If I understand you correctly, you are trying to build a single regexp for this. There is no need to do so.

  1. Find a line matching the pattern object [A-Za-z0-9]+: TMyButton
  2. Check each following line against Color = [A-Za-z0-9]+ until it matches or you reach the end keyword.
  3. Repeat these steps until the end of the file.

If you need to modify a large number of source files, you could script this.
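
The steps above could be scripted roughly as follows. This is only a minimal Python sketch: the function name button_colors and its class_name parameter are made up for illustration, and it assumes every object starts with a plain "object Name: TClass" line and closes with "end", as in the sample from the question.

```python
import re

# Patterns for the three kinds of lines the scan cares about.
OBJECT_RE = re.compile(r"^\s*object\s+\w+:\s*(\w+)\s*$")
COLOR_RE = re.compile(r"^\s*Color = (\w+)\s*$")
END_RE = re.compile(r"^\s*end\s*$")


def button_colors(dfm_text, class_name="TMyButton"):
    """Return the Color values found directly inside class_name objects.

    Hypothetical helper: the name and signature are assumptions for this sketch.
    """
    colors = []
    in_button = False
    for line in dfm_text.splitlines():  # splitlines() copes with \r\n and \n
        obj = OBJECT_RE.match(line)
        if obj:
            # Step 1: a new object starts; remember whether it is the wanted class.
            in_button = obj.group(1).lower() == class_name.lower()
        elif END_RE.match(line):
            # Reached the end keyword, so the context is over.
            in_button = False
        elif in_button:
            # Step 2: inside the wanted context, test each line for Color.
            color = COLOR_RE.match(line)
            if color:
                colors.append(color.group(1))
    return colors
```

Because the state is reset on every object and end line, the Color of a surrounding panel or of a sibling label is never picked up.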

Rorick
+1  A: 

Matching a line in a complex context requires a regex feature called lookaround, if you want or have to do it with a single regex. Specifically, you'd need variable-length lookbehind which PCRE doesn't offer.

So there are two possibilities: Use a scripting approach like Rorick suggested or use a regex that matches everything from the start of your needed context until the actual match, and extract that using a capturing group. That could be done with

[ ]{2,}object \w+: TMyButton\r\n.*?^([ ]{4,}Color = \w+[ \t]*\r\n)

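(Brackets around the space are inserted for clarity.) Your match will then be in capturing group \1. The pattern needs the s (dot matches newline) and m (multiline) modifiers so that .*? can span several lines and ^ can match at the start of a line. Be aware that if a TMyButton has no Color line at all, the lazy .*? will run on past its end and pick up the next Color line of a later object.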

Nested structures generally are not well suited for regexes (better for parsers) but if you're sure of the structure of your data as you mentioned, it might work OK.
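
For illustration, here is a minimal sketch of the capturing-group idea using Python's re module. It is a variation on the pattern above, not the same pattern: instead of .*? with the s modifier it skips whole lines, and a negative lookahead (my addition) stops it from crossing an "end" or a new "object" line. The function name find_button_colors is made up for this example.

```python
import re

PATTERN = re.compile(
    r"^[ ]{2,}object \w+: TMyButton\r?\n"        # start of the needed context
    r"(?:(?![ ]*(?:end|object)\b).*\r?\n)*?"     # skip property lines, never past end/object
    r"[ ]{4,}Color = (\w+)[ \t]*\r?$",           # the wanted line; value lands in group 1
    re.MULTILINE,
)


def find_button_colors(dfm_text):
    """Return the captured Color values (hypothetical helper for this sketch)."""
    # With exactly one capturing group, findall() returns that group's text per match.
    return PATTERN.findall(dfm_text)
```

The \r? parts let the same pattern work on files with either \r\n or \n line endings.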

Tim Pietzcker
Variable-length lookbehind: that's what I tried first... I think I need to do some scripting. I will now search for the context first, using /^[ ]{2,}object \w+: TMyButton\r\n(^[ ]{4,}.+\r\n)+^[ ]{2,}end\r\n/mi, and then search within each match for /^[ ]{4,}Color = \w+\r\n/mi
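
A rough Python sketch of this two-step search, under some assumptions: the function name colors_two_step is made up, \r? is used so that both \r\n and \n files work, and the repetition is made lazy (+? instead of +). The lazy form is my change; with a greedy + the block for a nested button could run on through a following sibling object at the same indentation.

```python
import re

# Step 1: one whole TMyButton block, from its object line to its own end line.
BLOCK_RE = re.compile(
    r"^[ ]{2,}object \w+: TMyButton\r?\n"
    r"(?:[ ]{4,}.+\r?\n)+?"      # property lines, as few as possible
    r"[ ]{2,}end\r?$",
    re.MULTILINE | re.IGNORECASE,
)

# Step 2: the Color line inside such a block.
COLOR_RE = re.compile(r"^[ ]{4,}Color = (\w+)[ \t]*\r?$", re.MULTILINE)


def colors_two_step(dfm_text):
    """Collect Color values block by block (hypothetical helper for this sketch)."""
    colors = []
    for block in BLOCK_RE.finditer(dfm_text):
        # Search only inside the text of the matched block.
        colors.extend(COLOR_RE.findall(block.group(0)))
    return colors
```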
knight_killer
+1  A: 

I know this is not PCRE, but it is a good alternative for software archaeology.

If you are working from a command prompt, you could use AWK. The script would look like this:

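BEGIN          { inObj = 0 }          # not really necessary
               { sub(/\r$/, "") }     # drop the CR of \r\n line endings
/: TMyButton$/ { inObj = 1 }          # entering the wanted context
/^ *end$/      { inObj = 0 }          # leaving any object
# The opening brace must sit on the same line as its pattern.
inObj == 1 && /^ +Color = [A-Za-z0-9]+$/ {
                 # do whatever you need to do
                 print $3
               }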

AWK implementations are freely available on the internet. I would try GAWK.

Ralph Rickenbach