I don't know if anyone can help, but I'll ask anyway. I'm writing a JavaScript version of PHP's token_get_all. The function should tokenize a given piece of PHP code, but I'm having trouble with whitespace.

When I run token_get_all in PHP, I see that only some whitespace is turned into tokens; the rest seems to be ignored.

Can someone explain how this function handles whitespace? Is there any documentation on it?

UPDATE

<?php
if ($var == 0)
{
?>
  • Between php and if: ignored
  • Between if and (: tokenized
  • Between $var and =: tokenized
  • Between = and 0: tokenized
  • Between ) and {: tokenized
  • Between { and ?>: tokenized
+1  A: 

Actually, it is never ignored. The Zend lexer always returns whitespace, for highlighting and indenting purposes.

"<?php if" (one space) is two tokens: "<?php " (note the trailing space) and "if".
"<?php  if" (two spaces) is three tokens: "<?php ", T_WHITESPACE, and "if".

Example:

$t = token_get_all("<?php echo 1;?>");
echo token_name($t[1][0]); // T_ECHO

$t = token_get_all("<?php       echo 1;?>");
echo token_name($t[1][0]); // T_WHITESPACE
stereofrog
But if you try to tokenize it, the whitespace between <?php and if is not tokenized.
mck89
See my update.
mck89
A: 

I've found the solution. Generally, whitespace is ignored after the PHP open tags <?php and <?, but not after <?=.

UPDATE @stereofrog

It took two hours, but I've understood the behaviour :). <?php and <? also take the single space or newline character that follows them (the newline may or may not be preceded by \r). Any whitespace after that first character goes into a separate token, with consecutive whitespace characters grouped together. Let me explain with your examples:

<?php echo "test"?>

Tokens: "<?php ","echo"....

<?php    echo "test"?>

Tokens: "<?php "," (remaining whitespaces)","echo"...

Another example with new lines:

<?php
echo "test"
?>

Tokens: "<?php\n","echo"....

<?php


echo "test"
?>

Tokens: "<?php\n","\n\n(remaining new lines)","echo"....

I tested this all day, so I'm sure it behaves like this.
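
For the JavaScript port, the rule above could be sketched roughly as follows. This is only a minimal illustration, not how the Zend lexer is actually written: the function name lexOpenTag and the shape of its return value are made up for the example, and it only handles the lowercase <?php form (not <? or <?=).

```javascript
// Sketch only: mimics the open-tag whitespace rule described in this answer.
// lexOpenTag and its return shape are hypothetical, not part of PHP.
function lexOpenTag(src) {
  // "<?php" plus exactly one space, tab, or newline (\r\n, \n or \r).
  const open = /^<\?php(?:[ \t]|\r\n|\n|\r)/.exec(src);
  if (!open) return null; // input does not start with an open tag

  const tokens = [{ type: "T_OPEN_TAG", text: open[0] }];
  let pos = open[0].length;

  // Whatever whitespace is left is grouped into one T_WHITESPACE token.
  const ws = /^[ \t\r\n]+/.exec(src.slice(pos));
  if (ws) {
    tokens.push({ type: "T_WHITESPACE", text: ws[0] });
    pos += ws[0].length;
  }

  // "rest" is the unconsumed source, ready for the next lexing step.
  return { tokens, rest: src.slice(pos) };
}

// With a single space, only the open tag token is produced.
console.log(lexOpenTag('<?php echo "test"?>').tokens);
// With several spaces, the extra ones end up in a T_WHITESPACE token.
console.log(lexOpenTag('<?php    echo "test"?>').tokens);
```

The rest of the tokenizer would continue from rest, so whitespace elsewhere in the code is always emitted as its own token.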

mck89
Yes, "<?php" consumes exactly one following whitespace character.
stereofrog