I am a PHP beginner and saw this PHP expression on a forum:

My PHP version is 5.2.X.

$regex = <<<'END'
/
  ( [\x00-\x7F]                 # single-byte sequences   0xxxxxxx
  | [\xC0-\xDF][\x80-\xBF]      # double-byte sequences   110xxxxx 10xxxxxx
  | [\xE0-\xEF][\x80-\xBF]{2}   # triple-byte sequences   1110xxxx 10xxxxxx * 2
  | [\xF0-\xF7][\x80-\xBF]{3}   # quadruple-byte sequence 11110xxx 10xxxxxx * 3 
  )
| ( [\x80-\xBF] )               # invalid byte in range 10000000 - 10111111
| ( [\xC0-\xFF] )               # invalid byte in range 11000000 - 11111111
/x
END;

Is this code correct? What do these constructions (strange to me), such as <<<, 'END', /, /x and END;, mean?

My PHP version does not support nowdoc. How should I replace this expression? Without the quotes around END, $regex becomes NULL.

I receive this error:

Parse error: syntax error, unexpected T_SL in /home/vhosts/mysite.com/public_html/mypage.php on line X

Thanks

+2  A: 

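It's heredoc syntax (strictly speaking nowdoc, because END is in single quotes).

The <<<'END' marks the start of a string, and everything until the closing END becomes part of the string (newlines included). In PHP 5, that closing END must begin in the first column of a line of its own, optionally followed by a semicolon.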

The / and /x are actually part of the regex.
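A minimal sketch of what the heredoc itself does, using a made-up two-line string instead of the real regex (the variable name `$text` is only for illustration). It shows where the string starts and stops, and that the `/` and `/x` are just characters in it:

```php
<?php
// The string starts on the line after "<<<END" and stops at the line that
// contains only "END;". Line breaks in between are kept; the one just
// before the closing END is not.
$text = <<<END
/first line
second line/x
END;

var_dump($text === "/first line\nsecond line/x"); // bool(true)

// The leading "/" and the trailing "/x" are ordinary characters here; they
// only mean something once the string is handed to a preg_* function.
var_dump($text[0]);          // string(1) "/"
var_dump(substr($text, -2)); // string(2) "/x"
```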

Michael Myers
+4  A: 

<<< and END are part of the heredoc syntax, a way of assigning a large block of text to a variable.

$mytext = <<<TXT

this is my text and it
can be many lines
etc
etc

TXT;

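The identifier (TXT here, END in your example) can be whatever you like, and it does not have to be three characters long. It follows the same naming rules as PHP variables and constants: it starts with a letter or underscore, followed by letters, digits or underscores.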
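A small sketch illustrating that the label is arbitrary. The two labels used here (`TXT` and `MY_LONGER_LABEL_2`) are made up, and both produce the same string because the label itself is not part of the text:

```php
<?php
// Any valid label works, as long as the same label closes the string.
$a = <<<TXT
same text
TXT;

$b = <<<MY_LONGER_LABEL_2
same text
MY_LONGER_LABEL_2;

var_dump($a === $b); // bool(true)
```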

Read more at http://uk2.php.net/manual/en/language.types.string.php#language.types.string.syntax.heredoc

adam
+2  A: 

In addition to what other users have said about it being heredoc syntax (typically used for large strings that would otherwise require a lot of escaping), the code is defining a regular expression using "/" as the delimiter.

The final "/" closes the regular expression, and the "x" after it tells the regex engine to use "free-spacing mode", in which unescaped whitespace and # comments in the pattern are ignored. Other possible modifiers would have been /i for case-insensitive matching or /m for multi-line mode.
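A short sketch of what the x modifier changes. It uses a made-up phone-number pattern, not the UTF-8 regex from the question. With x, the spaces, newlines and # comments inside the pattern are ignored. Remove the x and the same text has to match literally:

```php
<?php
// Free-spacing pattern: each piece is on its own line with a comment.
$extended = '/
    \d{3}   # three digits
    -       # a literal hyphen
    \d{4}   # four digits
/x';

var_dump(preg_match($extended, 'call 555-1234')); // int(1)

// Same text with the trailing "x" dropped: the whitespace and the
// "# ..." comments now have to appear literally in the subject.
$withoutX = substr($extended, 0, -1);
var_dump(preg_match($withoutX, 'call 555-1234')); // int(0)

// For comparison, the i modifier makes letters match case-insensitively.
var_dump(preg_match('/abc/i', 'ABC')); // int(1)
```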

You can read more about PHP's regex engine here:

Using Regular Expressions in PHP

Austin Fitzpatrick
heredoc, nowdoc... What is the difference between them? `'END'` or just `END`?
serhio
$vars within heredocs are expanded as if the string were in "double quotes". $ in nowdocs is treated like $ in 'single quotes'. (Or vice versa, I haven't looked at 5.3 recently.)
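A small sketch of the difference described in this comment. The variable `$name` and the `\x41` escape are only illustrations, and nowdoc needs PHP 5.3 or later. The escape handling is what matters for the regex in the question: in a heredoc, PHP itself turns `\x00`, `\x7F` and so on into raw bytes before PCRE ever sees the pattern.

```php
<?php
$name = 'World';

// Heredoc: behaves like a double-quoted string, so $name is interpolated
// and \x41 is turned into the character "A".
$heredoc = <<<END
Hello $name \x41
END;

// Nowdoc (PHP 5.3+): behaves like a single-quoted string, so the text is
// kept exactly as written.
$nowdoc = <<<'END'
Hello $name \x41
END;

echo $heredoc, "\n"; // Hello World A
echo $nowdoc, "\n";  // Hello $name \x41
```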
jmucchiello
My PHP version does not support nowdoc. How should I replace this expression? Without the quotes around END, $regex becomes NULL.
serhio
+6  A: 

Parse error: syntax error, unexpected T_SL in /home/vhosts/mysite.com/public_html/mypage.php on line X

This comes from the single quotes around END. That syntax is called nowdoc, and it was added in PHP 5.3. Since you're using PHP 5.2 and this regex uses \x escapes, you'll need either a single-quoted string or a plain heredoc (no quotes around END) in which every backslash is escaped as \\.
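Here is a sketch of the second option, escaping the backslashes inside a plain heredoc (which PHP 5.2 does support). It is shown on a tiny made-up pattern rather than the full regex. Doubling each backslash leaves `\x41` in the string for PCRE to interpret, instead of PHP converting it to a byte first:

```php
<?php
// In a heredoc "\\" becomes a single backslash, so PCRE receives \x41-\x43.
$regex = <<<END
/ [\\x41-\\x43]+ /x
END;

// Identical to the single-quoted version, where \x is left untouched.
var_dump($regex === '/ [\x41-\x43]+ /x'); // bool(true)

var_dump(preg_match($regex, 'xxABCxx', $m)); // int(1)
echo $m[0], "\n";                            // ABC
```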

An example of the regex as a quoted string:

$regex = '/
  ( [\x00-\x7F]                 # single-byte sequences   0xxxxxxx
  | [\xC0-\xDF][\x80-\xBF]      # double-byte sequences   110xxxxx 10xxxxxx
  | [\xE0-\xEF][\x80-\xBF]{2}   # triple-byte sequences   1110xxxx 10xxxxxx * 2
  | [\xF0-\xF7][\x80-\xBF]{3}   # quadruple-byte sequence 11110xxx 10xxxxxx * 3
  )
| ( [\x80-\xBF] )               # invalid byte in range 10000000 - 10111111
| ( [\xC0-\xFF] )               # invalid byte in range 11000000 - 11111111
/x
';

The "/" characters are the delimiters that mark the beginning and end of the pattern, and the trailing x is a modifier (PCRE_EXTENDED) whose meaning is described at: http://us.php.net/manual/en/reference.pcre.pattern.modifiers.php
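To check that the single-quoted version works, here is a sketch that applies the same alternation to a made-up byte string. The pattern is written on one line without the x modifier. Using `preg_replace()` with `'$1'` to drop the invalid bytes is an assumption about how the pattern is meant to be used, since the question does not say:

```php
<?php
// Same alternation as above, on one line (no x, so no spaces or comments).
// Group 1 = one well-formed UTF-8 sequence; groups 2 and 3 = a single
// byte that is not part of one.
$regex = '/([\x00-\x7F]|[\xC0-\xDF][\x80-\xBF]|[\xE0-\xEF][\x80-\xBF]{2}|[\xF0-\xF7][\x80-\xBF]{3})|([\x80-\xBF])|([\xC0-\xFF])/';

// Hypothetical input: "a", then "é" encoded as C3 A9, then a stray FF byte.
$input = "a\xC3\xA9\xFF";

// Replacing each match with group 1 keeps valid sequences; for the invalid
// byte, group 1 is empty, so that byte is removed.
$clean = preg_replace($regex, '$1', $input);

echo bin2hex($clean), "\n"; // 61c3a9
```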

ddrown
+1 for recognizing the difference between nowdoc and heredoc!
notJim