tags:

views:

69

answers:

2

Hi, I am a newbie of compilers, but i got a project SQL engine - for only select statement. For this i have to use only hand written parser and engine. I studied the samples of LL(k) grammar and recursive descent techniques(suggested by stackoverflow for writing parser by hand). But in any of the samples, didn't find the way to construct the parse tree from functions.
Can any one of you tell me, how to do the whole compilation process step by step, by just taking "Select columnname1,columnname2 from table" example. And one more thing - boost libraries are also not allowed. Data is in memory. I used structures to store the data.Thanks in advance.

A: 

I would say the easier way is to deal with this as a compiler would:

  • Tokenization
  • Parsing (creation of the AST)

Tokenization is about identifying "words", for your example this gives:

"Select", "columnname1", ",", "columnanme2", "from", "table"

Parsing is about interpreting this list of tokens into the Abstract Syntax Tree. Given your requirements:

  • First token should be "select" (case-insensitive)
  • It may be followed by either "*" or a comma-separated list of column names
  • "from" (case-insensitive) marks the end of the list
  • it's followed by the table name

I'm going to outline this in Python

class Select:
  def __init__(self):
    self.columnList = None
    self.tableName = None

def Tokenize(statement):
  result = []
  for word in statement.split(" "):
    sub = word.split(",")             # the comma may not be separated by space
    for i in range(len(sub)-1):
      result.append(sub[i].lower())   # case insensitive
      result.append(",")
    result.append(sub[-1])
  return result

def CreateSelect(tokens):
  if len(tokens) == 0: raise NoToken()
  if tokens[0] != "select": raise NotASelectStatement()

  select = Select()

  i = 1
  while tokens[i] != "from":
    if tokens[i] == "*":
      select.columnList == ALL
      break
    else:
      select.columnList.append(tokens[i])
      i = i + 1
      if tokens[i] != ",": raise ExpectedComma(i)
      i = i + 1

   select.tableName = tokens[i+1]

Of course, as you realize, this is a limited example:

  • It does not take multiple tables into account
  • It does not take aliasing into account
  • It does not take nested select statements into account

However it works pretty well and is quite efficient. It could, normally, be even more efficient if we combined the tokenization and parsing phases, but I have found in general that it only made the code much harder to read.

Also note that for proper error reporting it would be better:

  • not to alter the tokens (.lower()) but simply to use case-insensitive comparison
  • to adjoin proper source location to each token (line/column) so as to be able to point the user to the right place

EDIT:

What an AST would look like with a less contrived example ?

"select foo, bar from FooBar where category = 'example' and id < 500;"

Let's go:

Select
|_ Columns (foo, bar)
|_ Table (FooBar)
\_ Where
   \_ And
      |_ Equal (category, 'example')
      \_ LessThan (id, 500)

Here you have a tree-like structure, which is what you want to produce.

Matthieu M.
yup... thanks for the reply. yes i need to do code all the functions that what select statement is doing in commercial databases. From your code i am not able to find -how to build AST. If possible can you describe me the code generation step also. Thanks for the time.
jony
Well, for your simplified example the `Select` class is the only class of the AST, thus the `CreateSelect` method is the AST generation. In a less contrived example, you would also have a `Where` class, an `OrderBy` class, a `Equal` class, etc...
Matthieu M.
ohh.. ok...thanks.. but is this tree used for input for code generation step?
jony
A: 

You might also find the BNF grammars for SQL 92, 99 and 2003 online which might come handy.

These are complete grammars, but it should not be to hard to isolate only the SELECT branch.

Also, you might explore already available grammars at http://www.antlr.org/grammar/list; there are a few SQL flavours there.

Unreason
Thanks for reply.. but i already seen them.
jony