views:

540

answers:

3

Hello friends,

I need to make a parser to be able to extract logical structure from a text input in order to construct a query for some web service.

I tried to use regular expressions but it gets really complicated to handle imbrication logic, so I decided to ask for help, maybe I am doing it the wrong way.

ex:

( (foo1 and bar) or (foo2 and bar2) ) and ( (foo3 and bar3) or foo4 ) and "this is quoted"

the result should be something like this:

{
    {
     foo1
     AND
     bar
    }
    OR
    {
     foo2
     AND
     bar2
    }
}
AND
{
    {
     foo3
     AND
     bar3
    }
    OR
    foo4
}
AND
{
    "this is quoted"
}

Language used is actionscript 3, but I could adapt Java version.

Any help on the subject is welcome :) Thank you in advance.

A: 

You need a parser; the easiest way is to reuse an existing one. I would suggest you try JSON since it's popular and quite close to what you're looking for.

A query like (a AND (b OR c)) might be encoded like that:

{ "left": "a",
  "op": "AND",
  "right": 
  {
    "left": "b",
    "op": "OR",
    "right": "c"
  }
}

After parsing you'll get an object with three fields, called left, op, and right. You should be able to build the query easily from there.

OK, this can only work if you can choose the format for your input. If you cannot, then you might have to write a parser yourself. You can probably use something like recursive descent since the syntax in your example is simple.

redtuna
I am not familiar with JSON, I googled for "JSON logical parser" with no luck. Can you please help me start parsing a string with JSON, I am willing to try ?
Adnan Doric
A: 

There are generic ways to define a parser. Take a look at Extended Backus–Naur Form

Wahnfrieden
a little overkill i think ... but yeah, correct answer ... :) ... i once wrote an AS3 parser, that'd take an EBNFish markup and an input stream and spit out a syntax tree ... but i spent more than a week i think ... this problem is far too simple for such heavy artillery ...
back2dos
I imagine it's something that takes a lot to learn, but once you're familiar with EBNF, it's rather easy for even small tasks.
Wahnfrieden
i agree EBNF is a wonderful thing, still you need 1) a parser that can handle the EBNF and 2) and EBNF describes syntax and no semantics per se, e.g. you cannot capture operator priority in EBNF (well, in theory you can, but you really don't want to, if there are many levels of priority), etc. etc.if you have a library that can cope with EBNF, it's a good starter. if you don't, you should consider straight forward parsing. using an expressive language the parsing implementation is not much more verbous than an EBNF+conversion to semantical expression tree would be.
back2dos
+2  A: 

well, the parser is quite simple ...

first you will need quite a lot of stuff (i'll omit constructors, since i guess you can write them on your own):

expressions (output):

class Expression {}
class Operation extends Expression {
 public var operand1:Expression;
 public var operator:String;
 public var operand2:Expression;
}
class Atom extends Expression {
 public var ident:String;
}

tokens (intermediary format):

class Token {
 public var source:String;
 public var pos:uint;
}
class Identiefier extends Token {
 public var ident:String;
}
class OpenParenthesis extends Token {}
class CloseParenthesis extends Token {}
class Operator extends Token {
 public var operator:String;
}
class Eof extends Token {}

and a tokenizer, that should implement this interface

interface TokenStream {
    function read():Token;
}

i guess you'll figure out yourself how to tokenize ...

so the way is source --(tokenizer)--> tokens --(parser)--> expressions ...

and here the parsing routine, with a little helper:

function parse(t:TokenStream):Expression {
 var tk:Token = t.read();
 switch ((tk as Object).constructor) {//this is a really weird thing about AS3 ... need to cast to object, before you can access the constructor
  case OpenParanthesis:
   var e1:Expression = parse(t);
   tk = t.read();
   switch ((tk as Object).constructor) {
    case CloseParenthesis:
     return e1;
    case Operator:
     var op:String = (tk as Operator).operator;
     var e2:Expression = parse(t);
     tk = t.read();
     if (tk is CloseParenthesis)
      return new Operation(e1,op,e2);
     else
      unexpected(tk);
   }
   else
    unexpected(tk);
  break;
  case Identifier:
   return new Atom((tk as Identifier).ident);
  default:
   unexpected(tk);
 }
}
function unexpected(tk:Token) {
 throw "unexpected token "+tk.source+" at position "+tk.pos;
}

this is not a particularly good parser, but it shows the bare fundamentals of parsing routines ... well, actually, i didn't check the implementation, but it should work ... it is very primitive and unpermissive ... things like operator precedence etc. are completely missing, and so on ... but if you want that, have a go at it ...

btw. using haXe with enums, the whole code would look much shorter and much more beautiful ... you may want to have a look at it ...

good luck then ... ;)

back2dos
Thank you for your help, I'll dig it.
Adnan Doric