Are newlines and whitespace treated the same in compiler design? What if you write a macro to replace newlines with spaces: is that correct, or would it cause problems?
You have to distinguish between significant whitespace (e.g. inside quotes) and irrelevant whitespace between commands or statements.
It really depends on the syntax of the language. If the language itself is sensitive to these characters (think Python), then replacing them would cause problems. If not, I do not foresee any problems. In most cases you can strip redundant whitespace (outside of strings).
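As a rough illustration of "outside of strings", here is a minimal sketch for a hypothetical toy language whose only string syntax is double quotes with backslash escapes. The function name flatten_newlines is made up for this example, and a real language would also need to handle comments and other literal forms.

```python
def flatten_newlines(source: str) -> str:
    """Replace newlines with spaces, except inside double-quoted strings."""
    out = []
    in_string = False   # are we currently inside "..."?
    escaped = False     # was the previous character a backslash inside a string?
    for ch in source:
        if in_string:
            out.append(ch)          # keep string contents untouched
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
            out.append(ch)
        elif ch == "\n":
            out.append(" ")         # newline between statements becomes a space
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(repr(flatten_newlines('a = 1;\nb = "x\ny";\n')))
```

The newline inside the quoted string survives, while the ones between statements become spaces. This is only safe if the language does not treat newlines as terminators in the first place.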
It depends on the language's grammar. Some languages (e.g. Python) use newlines as statement terminators. And there are a few languages that are very sensitive to code layout (e.g. Haskell, although it also allows a layout-free syntax with explicit braces and semicolons).
The importance of whitespace is entirely syntax-dependent. See the following programming language: Whitespace
It is a very silly language, but it makes a great point.
Depends on the language. There are, and have been, all sorts of ways whitespace is treated. (I'm not talking about whitespace in quoted strings or anything like that, only making up statements.)
In C and C++, preprocessor directives (and // line comments) end with a newline, but other than that whitespace is whitespace.
In old-fashioned FORTRAN, a statement would have to be in certain columns (7-72), and end-of-card would end a statement unless there was a continuation character in column 6 of the next card, but whitespace in columns 7-72 was completely optional. This made parsing difficult, since DO 10 I = 1, 10 was a loop statement, while DO 10 I = 1. 10 was an assignment of the value 1.1 to the variable DO10I.
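To see why the scanner cannot decide what "DO" means until it has read past the "=", here is a simplified sketch. It is not how any real FORTRAN compiler is written: the classify function and its two regular expressions are assumptions for illustration, and the check would misclassify an assignment whose right-hand side contains a comma inside parentheses.

```python
import re

# Patterns applied AFTER all blanks have been removed, as old FORTRAN allowed.
DO_LOOP = re.compile(r"^DO(\d+)([A-Z][A-Z0-9]*)=([^,]+),(.+)$")
ASSIGN = re.compile(r"^([A-Z][A-Z0-9]*)=(.+)$")


def classify(statement: str):
    """Classify a statement as a DO loop or an assignment (toy version)."""
    compact = statement.replace(" ", "").upper()  # blanks are insignificant
    m = DO_LOOP.match(compact)
    if m:
        label, var, start, stop = m.groups()
        return ("do_loop", label, var, start, stop)
    m = ASSIGN.match(compact)
    if m:
        return ("assignment", m.group(1), m.group(2))
    return ("unknown", compact)


if __name__ == "__main__":
    print(classify("DO 10 I = 1, 10"))
    print(classify("DO 10 I = 1. 10"))
```

The only difference between the two inputs is a comma versus a period near the end of the line, yet it changes how every earlier character has to be interpreted.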
Similarly, in the BASIC I used on my first home computers, a newline was the only significant whitespace, and it required a new line number and statement.
In Python, newlines end statements, and indentation defines statement grouping.
In many languages, the kind of whitespace is insignificant, but it is necessary to have whitespace between tokens that would otherwise run together (such as two identifiers), and not within tokens.
So, the answer is "it depends", and there is no fundamental principle of compiler theory that settles it (except that requiring the use of whitespace greatly simplifies lexical analysis).
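A tiny tokenizer makes the "between tokens, not within them" point concrete. This is a sketch for a made-up token set (integers, identifiers, single-character operators), not any particular language's lexer.

```python
import re

# Integers, identifiers, or any single non-whitespace character.
TOKEN = re.compile(r"\d+|[A-Za-z_]\w*|\S")


def tokenize(text: str):
    """Split text into tokens; whitespace only separates, it is never a token."""
    return TOKEN.findall(text)


if __name__ == "__main__":
    print(tokenize("int x = 1"))       # spaces
    print(tokenize("int\tx\n=\n1"))    # tabs and newlines: same tokens
    print(tokenize("intx=1"))          # missing separator: different tokens
```

The first two calls produce the same token list, so the kind of whitespace does not matter here. The third call merges "int" and "x" into one identifier, while "=" needs no whitespace around it because it cannot be confused with its neighbours.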
Some languages don't care about newlines and keep reading until they hit, usually, a ';', while others sharply end statements at a newline and typically have a continuation character, usually '\'.
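The two styles can be sketched as two statement splitters. Both functions below are hypothetical helpers for illustration, and both ignore string literals and comments, which a real lexer would have to handle.

```python
def split_on_semicolons(source: str):
    """Terminator style: newlines are ordinary whitespace, ';' ends a statement."""
    return [" ".join(part.split()) for part in source.split(";") if part.strip()]


def split_on_newlines(source: str, continuation: str = "\\"):
    """Newline style: each line is a statement unless it ends with a continuation."""
    statements = []
    pending = ""
    for line in source.splitlines():
        stripped = line.rstrip()
        if stripped.endswith(continuation):
            pending += stripped[:-1] + " "   # drop the marker, keep reading
            continue
        pending += stripped
        if pending.strip():
            statements.append(" ".join(pending.split()))
        pending = ""
    if pending.strip():                      # source ended on a continuation line
        statements.append(" ".join(pending.split()))
    return statements


if __name__ == "__main__":
    print(split_on_semicolons("a = 1 +\n 2; b = 3;"))
    print(split_on_newlines("a = 1 + \\\n 2\nb = 3\n"))
```

Both calls yield the same two statements. Replacing newlines with spaces would be harmless for the first style and would glue everything into one statement for the second.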
There are a few that are in-between, most notably Ruby. In Ruby, a newline usually ends the statement, but the parser can usually figure out if it needs to read more lines. Lines ending in binary operators, open parens, and other things like that do not terminate statements.
And we should probably also mention Python, which has the extremely cool property of expressing block delimiters by level of indent.
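Python's own standard library tokenizer shows how layout turns into tokens. The sketch below uses the tokenize module; the helper name layout_tokens is made up for this example.

```python
import io
import tokenize


def layout_tokens(source: str):
    """Return the names of the layout-related tokens the tokenizer emits."""
    wanted = {tokenize.NEWLINE, tokenize.INDENT, tokenize.DEDENT}
    return [
        tokenize.tok_name[tok.type]
        for tok in tokenize.generate_tokens(io.StringIO(source).readline)
        if tok.type in wanted
    ]


if __name__ == "__main__":
    # An indented block produces INDENT and DEDENT around its statements.
    print(layout_tokens("if x:\n    y = 1\nz = 2\n"))
    # A newline inside parentheses does not end the statement.
    print(layout_tokens("x = (1 +\n     2)\n"))
```

Statement-ending newlines and changes in indentation reach the parser as real tokens, whereas a line break inside parentheses is reported as a non-significant NL token and filtered out here.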
Spaces and newlines are certainly different things to the lexical analyzer of your compiler.
Whether or not it ignores them or converts them into tokens depends on the syntax of your language, of course.
I think this has been pretty well answered by now. However, I want to add that there is value in treating newlines separately just so that you have the means of tracking line numbers, which people generally expect in error output. Some lexer generators might do this for you, but some don't. Other than that, it really depends on the language in question as to whether there needs to be any particular distinction.
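A hand-written lexer can do the line tracking with a single counter. This is a minimal sketch with a made-up token set; the name tokens_with_lines is an assumption for illustration.

```python
import re

# Newlines get their own alternative so they can be counted separately
# from other whitespace, even though neither produces a token.
TOKEN = re.compile(
    r"(?P<newline>\n)|(?P<space>[ \t\r]+)|(?P<word>\w+)|(?P<other>.)"
)


def tokens_with_lines(source: str):
    """Return (token_text, line_number) pairs, counting lines from 1."""
    line = 1
    result = []
    for match in TOKEN.finditer(source):
        kind = match.lastgroup
        if kind == "newline":
            line += 1                 # skipped as a token, but counted
        elif kind != "space":
            result.append((match.group(), line))
    return result


if __name__ == "__main__":
    for text, line in tokens_with_lines("a = 1\nb = @\n"):
        print(f"line {line}: {text!r}")
```

With the line number attached to each token, an error about the unexpected "@" can point at line 2 even though the grammar itself never sees the newline.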