I have a system where I query a REST / Atom server for documents. The queries are inspired by GData and look like:

http://server/base/feeds/documents?bq=[type in {'news'}]

I have to parse the "bq" parameter to know which types of documents will be returned, without actually running the query. For example:

bq=[type = 'news']                      ->  return ["news"]
bq=[type in {'news'}]                   ->  return ["news"]
bq=[type in {'news', 'article'}]        ->  return ["news", "article"]
bq=[type = 'news']|[type = 'article']   ->  return ["news", "article"]
bq=[type = 'news']|[title = 'My Title'] ->  return ["news"]

Basically, the query language is a list of predicates that can be combined with OR ("|") or AND (no separator). Each predicate is a constraint on a field. The constraint can be =, <, >, <=, >=, in, etc. Spaces are allowed anywhere they make sense.

I'm a bit lost between regular expressions, StringTokenizer, StreamTokenizer, etc., and I am stuck with Java 1.4, so no Parser ...

Who can point me in the right direction?

Thanks!

+3  A: 

The right way would be to use a parser generator such as ANTLR or JavaCC (JFlex is a lexer generator, usually paired with a parser generator such as CUP).

A quick and dirty way would be:

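import java.util.ArrayList;
import java.util.List;

// "|" is a regex metacharacter, so it must be escaped to split on the literal OR separator.
String[] disjunctedPredicateGroups = query.split("\\|");
List normalizedPredicates = new ArrayList(); // raw List of String[]: Java 1.4 has no generics
for (int i = 0; i < disjunctedPredicateGroups.length; i++) {
    // Split on "[" or "]" to isolate the AND-ed predicates; empty strings may appear in the result.
    normalizedPredicates.add(disjunctedPredicateGroups[i].split("\\[|\\]"));
}
// process each predicate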
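
One possible way to fill in the "process each predicate" step, given that only the type values are needed, is to skip the splitting and scan for type predicates directly with java.util.regex (available since Java 1.4). This is only a rough sketch under several assumptions: the class name BqTypes and the method typesOf are made up for illustration, the bq value is already URL-decoded, only the = and in operators on the type field are handled, and values never contain a single quote or a closing bracket. It collects every type mentioned in the query, which matches the expected results listed in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class BqTypes {
    // One bracketed predicate on the "type" field: [type = ...] or [type in ...].
    private static final Pattern TYPE_PREDICATE =
        Pattern.compile("\\[\\s*type\\s*(?:=|in\\b)\\s*([^\\]]*)\\]");

    // A single-quoted literal such as 'news' (assumes no embedded quotes).
    private static final Pattern QUOTED = Pattern.compile("'([^']*)'");

    // Returns the distinct type names mentioned in the query, in order of appearance.
    public static List typesOf(String bq) {
        List types = new ArrayList(); // raw List of String, since Java 1.4 has no generics
        Matcher predicate = TYPE_PREDICATE.matcher(bq);
        while (predicate.find()) {
            // group(1) is everything between the operator and the closing bracket
            Matcher literal = QUOTED.matcher(predicate.group(1));
            while (literal.find()) {
                String type = literal.group(1);
                if (!types.contains(type)) {
                    types.add(type);
                }
            }
        }
        return types;
    }
}
```

With this sketch, BqTypes.typesOf("[type = 'news']|[title = 'My Title']") yields a list containing only "news", because the title predicate is never matched. Anything beyond this (other operators, escaped quotes, nested brackets) is where a real parser starts to pay off.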
ddimitrov
A: 

There already exist a number of query languages for querying documents, and if at all possible, it would make sense to use one of these, as you'll get the query parser "for free" and using existing standards is generally preferable to rolling your own.

One such query language is CQL. A Java library for parsing CQL queries is available here.

Don
Notice that we do use an existing standard ...
Guillaume