Hi all, I'm currently writing a "display PHP code" function (output can be seen at http://www.actwebdesigns.co.uk/web-design-mansfield/php-functions/display-code-function.php).

I'm having trouble with the color scheme, which is done with regular expressions. Two in particular are giving me problems:

strings:

$line = preg_replace("#(\s|\()(\"[^\"]*\")(\,|\))#is", "\\1<span class=\"string\">\\2</span>\\3", $line);

(and trying)

#\"((?!(?:\"\s*;)|(?:\"\s*,)).)*#is

and functions:

$line = preg_replace("#(\s*)(@?|!?[a-z]+(?:[a-z]|[0-9]|_)*)(\s*)\(([^\)]*)\)#is", "\\1<span class=\"function\">\\2\\3</span>(\\4)", $line);

If a function call is nested inside another function call, it does not change color. Any hints would be much appreciated.

A: 

Hi,

Why so complicated? Use highlight_string(), plus output buffering and ini_set() if you need to change its output.
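
A minimal sketch of that suggestion, in case it helps. The colour value and the sample snippet are assumptions chosen only for illustration, and the exact markup highlight_string() produces differs between PHP versions:

```php
<?php
// Assumption: the colour value below is arbitrary, chosen only for illustration.
ini_set('highlight.string', '#008000');

$code = '<?php echo "hi";';

// Passing true as the second argument returns the markup instead of printing it,
// so output buffering is only needed when calling it without that argument.
$html = highlight_string($code, true);

echo $html, "\n";
```

The other highlight.* ini settings (comment, keyword, default, html) can be changed the same way.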

Boldewyn
Doesn't really seem to do what I'm after... it doesn't seem to highlight functions and so on, and the output is all over the place.
Phil Jackson
A: 

About your string regex: you say it is a string if and only if it is preceded by a white space character or a ( and it is directly followed by a , or ). Needless to say, that is not correct. You'd miss strings like:

$s = "123";     // ends with a ;
$s = "ab\"cd";  // contains an escaped double quote
$t = 'efg' ;    // is surrounded by single quotes

to name just three (there are many more, and what about 'here-docs'?).

To account for the cases above, try something like this:

$line = 's = "123"; t = "ab\\\\\\"cd"; u = \'efg\' ; v = \'ef\\\'g\' ';
echo $line . "\n";
echo preg_replace('/((["\'])(?:\\\\.|(?:(?!\2).|[^\\\\"\'\r\n]))*\2)/', '<span class="string">$1</span>', $line);
/* output:
s = "123"; t = "ab\\\"cd"; u = 'efg' ; v = 'ef\'g'
s = <span class="string">"123"</span>; t = <span class="string">"ab\\\"cd"</span>; u = <span class="string">'efg'</span> ; v = <span class="string">'ef\'g'</span>
*/

A short explanation:

(                        # start group 1
  (["\'])                #   match a single- or double quote and store it in group 2
  (?:                    #     start non-capturing group 1
    \\\\.                #     match a backslash followed by any character (except line breaks)
    |                    #     OR
    (?:                  #     start non-capturing group 2
      (?!\2).            #       a character other than what is captured in group 2
      |                  #       OR
      [^\\\\"\'\r\n]     #       any character except a backslash, double quote, single quote or line breaks
    )                    #     end non-capturing group 2
  )*                     #   end non-capturing group 1 and match it zero or more times
  \2                     #   the quote captured in group 2
)                        # end group 1
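
For what it's worth, here is a rough sketch of how the same idea could be wrapped in a helper. The function name highlight_strings is made up for this example, and the pattern is a slightly simplified variant of the one explained above, not the identical regex: it only lets a backslash be consumed together with the character it escapes, so an unterminated string such as "abc\" is left alone.

```php
<?php
// Hypothetical helper, not part of the original answer.
function highlight_strings(string $line): string
{
    // (["'])            opening quote, captured in group 1
    // \\.               an escaped character (backslash plus any character)
    // (?!\1)[^\\\r\n]   any other character that is not the opening quote,
    //                   a backslash or a line break
    // \1                the matching closing quote
    $pattern = '/(["\'])(?:\\\\.|(?!\1)[^\\\\\r\n])*\1/';

    // $0 is the whole matched string literal, quotes included.
    return preg_replace($pattern, '<span class="string">$0</span>', $line);
}

echo highlight_strings('$s = "123"; $t = \'it\\\'s\';'), "\n";
```

Here-docs and strings that span several lines are still not covered by this.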

Then some comments about your second regex: you first try to match zero or more white space characters. This can safely be omitted, because if no white space exists you'd still have a match. You could use a \b (word boundary) before matching the function name instead. Also, (?:[a-z]|[0-9]|_) can be replaced by [a-z0-9_]. And this part of your regex, (@?|!?[a-z]+(?:[a-z]|[0-9]|_)*), is the same as:

(
  @?
  |
  !?
  [a-z]+
  (?:
    [a-z]
    |
    [0-9]
    |
    _
  )*
)

only better indented to see what it actually does. If you look closely, you will see that the first alternative is just @?, and since the @ is made optional by the ?, that part of your regex can match an empty string as well. Not what you'd expected, eh? After that, I must confess I stopped looking at that regex; better to throw it away.
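
A quick way to see this for yourself is to run that alternation on its own. This is only a sketch with a made-up input; inside your full pattern the engine can still backtrack into the second alternative, but the group by itself is happy with an empty match:

```php
<?php
// The alternation from the question, tested in isolation
// (without the rest of the original pattern).
$pattern = '/(@?|!?[a-z]+(?:[a-z]|[0-9]|_)*)/i';

$found = preg_match($pattern, 'trim($x)', $m);

var_dump($found);  // 1: the pattern does match
var_dump($m[1]);   // but group 1 holds the empty string, not "trim"
```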

Try something like this to match function names:

'/\b[a-z_][a-z0-9_]*(?=\s*\()/i'

Which means:

\b           # a word boundary (the position between \w and \W)
[a-z_]       # a letter or an underscore
[a-z0-9_]*   # a letter, digit or an underscore, zero or more times
(?=          # start positive look ahead
  \s*        #   zero or more white space characters
  \(         #   an opening parenthesis
)            # end positive look ahead
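
To show how the pattern above could be used, here is a small sketch. The helper name highlight_functions and the sample line are assumptions made for this example. Because the look ahead does not consume the argument list, a call nested inside another call is matched as well:

```php
<?php
// Hypothetical helper, not part of the original answer.
function highlight_functions(string $line): string
{
    // An identifier that is followed by optional white space and "(".
    $pattern = '/\b[a-z_][a-z0-9_]*(?=\s*\()/i';

    // $0 is the matched identifier; the parenthesis itself is not consumed.
    return preg_replace($pattern, '<span class="function">$0</span>', $line);
}

echo highlight_functions('echo strtoupper(trim($name));'), "\n";
// echo <span class="function">strtoupper</span>(<span class="function">trim</span>($name));
```

Note that keywords followed by a parenthesis, such as if ( or while (, match this pattern too, so you may want to filter those out.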

This last one is not tested at all, I leave that for you. Also note that I know very little PHP, so I may be over-simplifying it, in which case it would help if you provide a couple of example code snippets you want to match as functions.

Furthermore, a word of caution: parsing code using regexes can be tricky, but if you're only using them to highlight small snippets of code, you should be fine. When the source files get larger, you might see a drop in performance, and you should then make some parts of your regexes "possessive", which can reduce the runtime of your matching considerably (especially on larger source files).
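
To illustrate what "possessive" looks like in practice, here is a sketch that compares a greedy and a possessive version of the function-name pattern. The variable names and the sample line are made up for this example. A possessive quantifier (*+) never gives back characters it has matched, so when the look ahead fails the engine does not retry with shorter and shorter identifiers:

```php
<?php
// Same function-name pattern, once with a greedy * and once with a possessive *+.
$greedy     = '/\b[a-z_][a-z0-9_]*(?=\s*\()/i';
$possessive = '/\b[a-z_][a-z0-9_]*+(?=\s*\()/i';

$line    = 'echo strtoupper(trim($name));';
$replace = '<span class="function">$0</span>';

$out1 = preg_replace($greedy, $replace, $line);
$out2 = preg_replace($possessive, $replace, $line);

// Both versions highlight the same identifiers on this input;
// only the amount of backtracking differs.
var_dump($out1 === $out2);
```

Whether the difference is measurable depends on your input, so it is worth timing both on your own source files.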

Lastly, you're probably reinventing the wheel. There exist numerous (well tested) code-highlighters you can use. I suspect you already know this, but I thought it would still be worth mentioning.

FYI, I've had good experience with this one: http://shjs.sourceforge.net/doc/documentation.html

Good luck!

Bart Kiers