Generic Java pattern recognition library - like regexps for List<Object>

views:

575

answers:

+3 Q:

Generic Java pattern recognition library - like regexps for List<Object>

Is there a library that would let me write regexp-like queries for lists of objects, just like java.util.regexp matches against strings, which are conceptually like lists of characters?

I want to be able to use patterns with greedy/conservative quantifiers, identifying groups in matches, etc. Obviously I would have to provide the code for matching a query token against an object in the list.

I'm not just trying to save time not writing my own query parser. I'm aware that regexp implementations (against strings) are a well researched area, and Sun's java.util.regex has surely had the hell optimized out of it long ago. Anything I write won't be nearly as efficient, and I might have to handle quite long lists (but pretty simple queries).

Thanks!

+1 A:

An idea, if the objects you store in the list are limited, is to create a Map between the object in the list and a Unicode character unique to that object. So given your list you will generate a string of characters that uniquely represent you list.

After that you can use the existing regex packages against the generated string and match anything you wish. After that you can map back to the List.

Savvas Dalkitsis 2009-08-08 22:36:10

That's a very interesting idea, thanks! It looks like it might well solve my problem.

2009-08-08 22:46:46

keep in mind that you must find a way to map each object to one unique character only. This means creating a hash - like function for the objects. Also the using 16 bit characters that means that you can have only 65535 unique objects in the list. That might be a limiting factor.

Savvas Dalkitsis 2009-08-08 22:53:42

For my actual usecase it's not a problem. If I were implementing the general case, I could map objects to unique sequences of two or three characters if needed. I would also use object identity mapping and avoid messing with hashing and equals().Thanks again!

2009-08-08 23:56:16

+2 A:

Something like Quaere. From their site:

Quaere is an open source, extensible framework that adds a querying syntax reminiscent of SQL to Java applications. Quaere allows you to filter, enumerate and create projections over a number of collections and other queryable resources using a common, expressive syntax.

rodrigoap 2009-08-08 22:42:19

Not quite what I need because SQL-like syntax doesn't capture order in the original collection (although it can order the selected subset). SQL has no equivalent for "select a-follows-b", the regexp "ba".

2009-08-08 22:52:49

It's an interesting idea, but the power of regex comes from generalizations you can make, and you'd need to define what it'd mean to generalize a search through a collection of objects.

I suppose it wouldn't be too difficult to put something together that, say, filtered a list of object by whether they contained property "X", or a property whose value is "Y" ( I think there's a simple version of this in apache commons, probably in the beanutils or commons-collections). But then how would you generalize? One obvious generalization might be regexes on searched for strings, but you probably would want something more like "contains property X or property Y*". I can also imagine searching by class type or properties of populations of subcollections.

As an abstract idea, it's really interesting. If you're trying to solve a specific development problem, however, you may be thinking of something more limited. What kinds of searches are you thinking of making?

Steve B. 2009-08-08 22:55:56

I was actually writing a bottom-up context free parser. It's fun (if inefficient) to do this by just specifying the grammar rules and letting the parser apply them opportunistically anywhere it can. But I lex'ed the source string to a tree (nested lists) of token objects, and then I wanted to use regexps.Now thanks to Sawas' idea, which I just implemented (only ~20 LOC), I can write: mapTokens(parser, "<ParenOpen>[^<ParenOpen><ParenClose>]*<ParenClose>", ParenGroup);and presto, there's my rule for combining ParenOpen....ParenClose lexical token groups into ParenGroup tokens.

2009-08-08 23:54:08

ansaurus

tags:

views:

answers:

Generic Java pattern recognition library - like regexps for List<Object>

related questions