views:

115

answers:

2

I'm currently trying to solve a problem that's similar to http://stackoverflow.com/questions/2338716/testing-intersection-of-two-regular-languages with the exception that I know how to do the intersection, but have an additional requirement.

The intersection logic I intend to use is the Dragon Book's algorithm for converting an NFA to a DFA, but executed on two NFA's at the same time. Since all DFA's are NFA's (but with very little non-determinism), you can repeat this as needed for more intersections.

My problem is that one of my regexes has groups that can be used further on as a part of a new regex. Concretely:

bin/x86/a.out: obj/x86/.*\.o

obj/{[a-zA-Z0-9]+}/{.*}.o: src/\2.c

In the end of the first line I have a regex that matches all objects for x86 targets. In the second line I have a regex that specifies a possible build line, that should match the first group with the fixed "x86" and the second with any given string after it. In the example the first match isn't used yet, but it should be retrievable. To make sure that the matching ends (and to allow recursive rules), I want to use the information gained from the first regex in matching the second. The rule is selected by taking the second regex from the first and the first from the second line and to determine if the intersection of the two (the DFA resulting from the intersection) has an accepting state. If it does, there are sentences that both can parse and therefore some values that the group can take.

Is it possible, in general, to extract information from the first regex for use in matching the group of the second regex?

If not in general, what kinds of restrictions do I need to add?

A: 

I believe back-ticks make the language non-regular, so you won't be able to convert it to a finite-automoton.

Douglas Leeder
The back-references (I think you mean the \1 etc.) are replaced with the sub-NFA or sub-regex from the origin before being converted to an NFA themselves. That makes them regular again. The thing needed for that is the sub-NFA of the group from the left hand side, which is exactly what I am asking for in this question. Taking the full group may be too large (and prevent recursion), reducing it with the second regex is this question.
dascandy
Thinking about it a bit more, you may just have a point. I had assumed the transformation from NFA to DFA to be valid for a modified NFA that would also match the group, but it isn't. The trivial a*{a*} would have trouble being matched and can match a string of a's in multiple ways, with in particular varying what's matched inside the group. So, there's no way to match that not knowing what's behind this part of it.I'll keep the thinking cap on; maybe some other solution will pop up.
dascandy
A: 

Why do your examples look like Makefile rules, even though Makefiles don't support regular expressions?

Because that's the thing I'm trying to make (no pun intended).

Which regex library do you use?

None, as of yet. I'm considering to write my own based on the output of this question. If this isn't possible, I may make do with an existing one that supports this. If this is theoretically possible, I'll develop my own to do exactly this & make the application as I intend it.

Some support lookahead expressions, which are another way of expression intersections

The idea behind the intersection is to define rules that are generic and can contain multiple varying left-side parts (the use of % in usual makefiles, but without the requirement to do some sort of recursive make if you do have more than one variation point - such as the platform, build type or file name). If I can't take the second regex into account for the group I can't use such a rule recursively because the recursion wouldn't have any change between each step/level. That would reduce the genericity but would still be acceptable. Still, it's an interesting question to know the answer to (IE, the can it be done generically) and it'll be deciding in the requirements I'll have for a regex library.

(Not posted as original author because I lost my cookie & am waiting for the accounts to be merged).

dascandy