ansaurus

Question

What's the best way to explain parsing to a new programmer?

Answer 1

+5 A:

Parsing is the process of analyzing text made of a sequence of tokens to determine its grammatical structure with respect to a given (more or less) formal grammar.

The parser then builds a data structure based on the tokens. This data structure can then be used by a compiler, interpreter or translator to create an executable program or library.

alt text

If I gave you an english sentence, and asked you to break down the sentence into its parts of speech (nouns, verbs, etc.), you would be parsing the sentence.

That's the simplest explanation of parsing I can think of.

That said, parsing is a non-trivial computational problem. You have to start with simple examples, and work your way up to the more complex.

Robert Harvey 2010-05-29 00:01:09

+1 also, "To split a file or other input into bits of data that can be easily stored or manipulated. " - http://en.wiktionary.org/wiki/parse

jball 2010-05-29 00:01:54

In my LC-3 class, we had to write a program that would have the user input an instruction name (ADD, LD, JMP etc) and it would return the opcode. He called this an op-code parser. He never really explained what parsing even was. This kind of makes sense though!

yuudachi 2010-05-29 00:20:37

@Robert: -1 I think your definition is incorrect. Parse is the process of defining if an input belongs to a specified language, the formal name for it is sintatical analysis.

Carlos Loth 2010-05-29 02:02:45

@Carlos: Er...What? See my edit for the *actual* definition.

Robert Harvey 2010-05-29 02:04:28

@Robert: Your first sentence is still incorrect, the process of extract meaning of a text is semantic analysis, not parsing.

Carlos Loth 2010-05-29 02:25:08

@Carlos: Have a look at the Wikipedia article I linked.

Robert Harvey 2010-05-29 02:38:49

@Robert: yes I had read it before you have updated your answer with the link. I continue thinking your first sentence is incorrect. Extract meaning from text is only the latest part of the compilation process, and its name is not parsing.

Carlos Loth 2010-05-29 02:50:26

@Carlos: I fixed the first sentence in my answer. Have a look again.

Robert Harvey 2010-05-29 04:11:09

@Robert: I've looked at it, however I think copy and paste wikipedia information doesn't add any value to your answer.

Carlos Loth 2010-05-29 04:19:30

Answer 2

+4 A:

Have them try to write a program that can evaluate arbitrary simple arithmetic expressions. This is a simple problem to understand but as you start getting deeper into it a lot of basic parsing starts to make sense.

Sijin 2010-05-29 00:01:37

Answer 3

+6 A:

I'd explain parsing as the process of turning some kind of data into another kind of data.

In practice, for me this is almost always turning a string, or binary data, into a data structure inside my Program.

For example, turning

":Nick!User@Host PRIVMSG #channel :Hello!"

into (C)

struct irc_line {
    char *nick;
    char *user;
    char *host;
    char *command;
    char **arguments;
    char *message;
} sample = { "Nick", "User", "Host", "PRIVMSG", { "#channel" }, "Hello!" }

LukeN 2010-05-29 00:04:38

+1 Interesting way of explaining a complicated term

David Relihan 2010-05-29 00:34:59

-1 You explained what a compilation process is. Parsing is a concept simpler than turning some kind of data into another.

Carlos Loth 2010-05-29 04:23:37

-1 Carlos is correct. Parsing is not turning data into anything else. Parsing is just the analysis of a sequence of characters (or tokens). Creating something from the analysis is a complete different thing.

jpbochi 2010-05-31 13:30:11

On some existential level, every program is about turning one kind of data into another kind of data (isn't that the definition of a function?). I think a clearer way of expressing it would be to say that parsing is the process of assigning *names* to bits of input. In your example, you are assigning the name `sample.message` to the characters `"Hello!"`. This is a necessary prerequisite to, but completely separate from, the task of assigning *meaning* to names -- e.g., what does `sample.message` mean, or what does it do? As Carlos points out, that becomes *semantic* analysis.

Daniel Pryden 2010-06-09 01:21:23

Answer 4

A:

Parsing is about READING data in one format so that you can use it to yours needs.

I think you need to teach them to think like this, so, this is the simplest way I can think of to explain parsing for someone new to this concept.

Generally, we try to parse data one line at a time because generally it is easier for humans to think this way, dividing and conquering, and also easier to code.

We call field to every minimun undivisible data. Name is field, Age is another field, and Surname is another field, for example.

In a line, we can have various fields. In order to distiguish them, we can delimite fields by separators or by the maximun length assign to each field.

For example: By separating fileds by comma

Paul,20,Jones

Or by space (Name can have 20 letters max, age up to 3 digits, Jones up to 20 letters)

Paul 020Jones

Any of the before set of fields is called a record.

To separate between a delimited field record we need to delimit record. A dot will be enough (though you know you can apply CR/LF).

A list could be:

Michael,39,Jordan.Shaquille,40,O'neal.Lebron,24,James.

or with CR/LF

Michael,39,Jordan
Shaquille,40,O'neal
Lebron,24,James

You can say them to list 10 nba (or nlf) players they like. Then, they should type them according to a format. Then make a program to parse it and display each record. One group, can make list in a comma-separated format and a program to parse a list in a fixed size format, and viceversa.

2010-05-29 00:42:00

-1 Converting data from one format to another is compilation.

Carlos Loth 2010-05-29 02:51:49

My answer is meant to be practical to beginners. To ease their learning. Later, they can tackle more aspects, like syntax analysis, parsing, grammars, etc...

2010-05-29 04:56:17

Parsing is not about converting anything. Parsing is just reading. Misuse of concepts will lead to more confusion later.

jpbochi 2010-05-31 13:25:38

Answer 5

+1 A:

What is parsing?

In computer science, parsing is the process of analysing text to determine if it belongs to a specific language or not (i.e. is syntactically valid for that language's grammar). It is an informal name for the syntactic analysis process.

For example, suppose the language a^n b^n (which means same number of characters A followed by the same number of characters B). A parser for that language would accept AABB input and reject the AAAB input. That is what a parser does.

In addition, during this process a data structure could be created for further processing. In my previous example, it could, for instance, to store the AA and BB in two separate stacks.

Anything that happens after it, like giving meaning to AA or BB, or transform it in something else, is not parsing. Giving meaning to parts of an input sequence of tokens is called semantic analysis.

What isn't parsing?

Parsing is not transform one thing into another. Transform A into B, is, in essence, what a compiler does. Compiling takes several steps, parsing is only one of them.
Parsing is not extract meaning from a text. Extract meaning from a text, is semantic analysis which is a step of the compiling process.

What is the simplest way to understand it?

I think the best way for understanding the parsing concept is to begin with the simpler concepts. The simplest one in language processing subject is the finite automaton. It is a formalism to parsing regular languages, such as regular expressions.

It is very simple, you have an input, a set of states and a set of transitions. Consider the following language built over the alphabet { A, B }, L = { w | w starts with 'AA' or 'BB' as substring }. The automaton below represents a possible parser for that language whose all valid words starts with 'AA' or 'BB'.

    A-->(q1)--A-->(qf)
   /  
 (q0)    
   \          
    B-->(q2)--B-->(qf)

It is a very simple parser for that language. You start at (q0), the initial state, then you read a symbol from the input, if it is A then you move to (q1) state, otherwise (it is a B, remember the remember the alphabet is only A and B) you move to (q2) state and so on. If you reach (qf) state, then the input was accepted.

As it is visual, you only need a pencil and a piece of paper to explain what a parser is to anyone, including a child. I think the simplicity is what makes the automata the most suitable way to teaching language processing concepts, such as parsing.

Finally, being a Computer Science student, you will study such concepts in-deep at theoretical computer science classes such as Formal Languages and Theory of Computation.

Carlos Loth 2010-05-29 01:58:45

Sorry, Carlos, but you got it wrong. Basically what you are saying is that parsing determines whether source code is C#, Visual Basic, or Java. That makes no sense, and it's not the purpose of parsing.

Robert Harvey 2010-05-29 03:02:13

I didn't say that. What I said is that a parsing is the process that determines if a program is sintatically valid (belongs to a language).

Carlos Loth 2010-05-29 03:12:03

You could at least spell "syntactically" correctly. The Wikipedia article captures the essence of what parsing is pretty well. You should read it.

Robert Harvey 2010-05-29 03:17:31

Thanks for pointing it out, I'm not a native english speaker. But you already noted it.

Carlos Loth 2010-05-29 04:02:49

@Robert: I've already read the article again (after our exchange of ideas) and I continue with the same opinion.

Carlos Loth 2010-05-29 04:11:13

@Robert: Thanks for cleaning it up.

Carlos Loth 2010-05-29 04:20:35

The answer is accurate. It might not be as simple as Daisetsu expected, though.

jpbochi 2010-05-31 13:23:59

ansaurus

tags:

views:

answers:

What's the best way to explain parsing to a new programmer?

What is parsing?

What isn't parsing?

What is the simplest way to understand it?

related questions