views:

112

answers:

2

What is the difference between a top down and bottom up grammar? An example would be awesome.

+4  A: 

Afaik it doesn't make any difference for the grammar itself, but it does for the parser.

Wikipedia has a quite lengthy explanation of both bottom-up and top-down parsing.

Generally the (imho) more intuitive way is top-down. You start with the start symbol and apply the transformation rules that fit, while with bottom-up you need to apply transformation rules backwards (which usually created quite a headache for me).

Joey
+5  A: 

First of all, the grammar itself isn't top-down or bottom-up, the parser is (though there are grammars that can be parsed by one but not the other).

From a practical viewpoint, the main difference is that most hand-written parsers are top-down, while a much larger percentage of machine-generated parsers are bottom-up (though, of course, the reverse is certainly possible).

A top-down parser typically uses recursive descent, which typically means a structure something like this (using typical mathematical expressions as an example):

expression() { term() [-+] expression }
term() { factor() [*/] term() }
factor() { operand() | '(' expression() ')' }

A bottom-up parser work in the reverse direction -- where a recursive descent parser starts from the full expression, and breaks it down into smaller and smaller pieces until it reaches the level of individual tokens, a bottom-up parser starts from the individual tokens, and uses tables of rules about how those tokens fit together into higher and higher levels of the expression hierarchy until it reaches the top level (what's represented as "expression" above).

Edit: To clarify, perhaps it would make sense to add a really trivial parser. In this case, I'll just do the old classic of converting a simplified version of a typical mathematical expression from infix to postfix:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

void expression(void);

void show(int ch) { 
    putchar(ch);
    putchar(' ');
}

int token() { 
    int ch;
    while (isspace(ch=getchar()))
        ;
    return ch;
}

void factor() { 
    int ch = token();
    if (ch == '(') {
        expression();
        ch = token();
        if (ch != ')') {
            fprintf(stderr, "Syntax error. Expected close paren, found: %c\n", ch);
            exit(EXIT_FAILURE);
        }
    }
    else
        show(ch);
}

void term() { 
    int ch;
    factor();
    ch = token();
    if (ch == '*' || ch == '/') {
        term();
        show(ch);
    }
    else
        ungetc(ch, stdin);
}

void expression() {
    int ch;
    term();
    ch = token();
    if (ch == '-' || ch=='+') {
        expression();
        show(ch);
    }
    else 
        ungetc(ch, stdin);
}

int main(int argc, char **argv) {
    expression();
    return 0;
}

Note that the lexing here is pretty stupid (it basically just accepts a single character as a token) and the expressions allowed are quite limited (only +-*/). OTOH, it's good enough to handle an input like:

1+2*(3+4*(5/6))

from which it does produce what I believe is correct output:

1 2 3 4 5 6 / * + * +

Jerry Coffin
+1. Nicely explained. It's been to long for me to actually do it in that detail ;-)
Joey
is `expression() { term() [-+] expression }`equivalent to: `expression -> term +|- expression`
sixtyfootersdude
@sityfootersdude: yes and no. The intent was to (in extreme shorthand) portray actual code. I.e., expression() would call term(), then look for a '+' or '-', then (probably) repeat a loop, looking for another expression.
Jerry Coffin
The example that you gave is an example of a grammar right? NOT of a parser.
sixtyfootersdude
@sityfootersdude: no, that was my point in my previous comment -- though certainly shorthand, it's of a parser, not just a grammar.
Jerry Coffin
Aww, ok I see thanks Jerry. (I didn't hit refresh before posting again).
sixtyfootersdude
Would you mind providing the same grammar from a bottom up perspective?
sixtyfootersdude
I wouldn't particularly mind, except for the minor detail that I've never actually written a bottom-up parser by hand, and machine generated parsers are nearly unreadable (there's no real intent that you every read what they generate).
Jerry Coffin
Cool, well thanks Jerry and have yourself a great night.
sixtyfootersdude
Are there really grammars that can only be parsed by a top-down or a bottom-up parser? Can you provide any examples?
Marius Gedminas
@Marius: yes and no. Technically, the restrictions I was thinking of are on a recursive-descent parser, not really top-down parsers in general. Going from memory, a normal recursive descent parser is limited to LR(0) grammars, where the normal bottom-up parser can handle LALR(1).
Jerry Coffin