views:

65

answers:

2

Hi All,

I was wondering, what would be the best data structure to represent a DFA ?

I am looking at converting a regular expression to a DFA and make this particular functionality as a library in java.

The main thing is that, each entity in the regex carries a set of value rather than a single string value like "car" . In my case , each entity would carry many properties like {car,Honda,4x4,sedan ... } ( Though I am not searching for cars, this is just an example)

Any suggestions ? Doubts if any please ask. Thanks

A: 

A web search will yield some examples of DFAs in Java. However, the best representation depends on your specific application requirements; e.g. how your application is going to use the DFAs. I think you need to work this out for yourself.

Stephen C
A: 

If I understand your question correctly you want to have a matching/filtering library for an arbitrary regular language over an alphabet with dynamic types? Going with your car example, I'd imagine you'd want to be able to create an expression in order to match over a List where all Cars (have the color red, have between 2 and 6 Passengers and each Passenger is between 8 and 88 years of age) or (have 1 Passenger).

Coincidentally I've been looking for something like that myself (for document validation) and the closest I could get was Jing; A Java RELAX-NG library. Unfortunately, the alphabet in Jing consists out of XML nodes so it didn't solve my problem. At the moment I'm attempting to write a library myself which does just this (matching against regular languages over an arbitrary type of alphabet), based on the pattern matching in Jing. If you like to help with this, please let me know ;).

taitale
@taitale : I am not sure if I got your explanation correct. Actually the document contains only the word "car". But There are objects associated with it called Annotations. So car is annotated as "Vehicle"So I typically search for the annotation type "vehicle" which has the value "car". This entity is the first of the many entities to follow, which makes a regex with multiple values. By Multiple value I mean, like the car being a type Vehicle.So I might search for something like "<vehicle,car> sold". This tells the total no of cars sold in the doc. Is this what you were talking about ?
bsoundra
@bsoundra : I actually wasn't talking in terms of text, but more in terms of objects. If you are searching in text, then it is indeed something different ;). Perhaps you could elaborate more in depth on your use case?
taitale
My input would be something like "Porche is sold". This word Porche might be tagged as "car" or "Vehicle" or ... many other tags. These information is stored in some other object associated with the file. So if I search for the "<Vehicle,Porche> is sold" then it should find the match. I might also search for "<vehicle> is sold" which should list all the vehicles which are sold.
bsoundra