Hello everyone,

I have a large collection of PHP files written over the years, and I need to replace all the short open tags with proper explicit open tags.

change "<?" into "<?php"

I think this regular expression will properly select them:

<\?(\s|\n|\t|[^a-zA-Z])

which takes care of cases like

<?//
<?/*

but I am not sure how to process a whole folder tree, detect the .php file extension, apply the regular expression, and save the file if it has been changed.

I have the feeling this can be pretty straightforward if you master the right tools. (There is an interesting hack in the sed manual: 4.3 Example/Lowercase to Uppercase.)
Maybe I'm wrong.
Or maybe this could be a one-liner?
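A minimal Python sketch of the workflow described above (walk a folder tree, pick the *.php files, apply a regex, save only the files that changed). It assumes a variant of the pattern above: <?= is handled first (turned into <?php echo), "=" is excluded from the character class so <?= can never become <?php=, and a space is added after <?php when the next character is not whitespace (so <?// becomes <?php //). The function names are illustrative, and like any regex approach it will also rewrite a "<?" that appears inside a PHP string.

```python
import re
from pathlib import Path

# "<?=" is converted to "<?php echo " before anything else.
SHORT_ECHO = re.compile(rb"<\?=")
# Variant of the question's pattern: "<?" followed by any non-letter except "=".
# The negated class already covers spaces, tabs and newlines.
SHORT_OPEN = re.compile(rb"<\?([^a-zA-Z=])")


def _long_tag(match):
    ch = match.group(1)
    # Keep whitespace as is; otherwise insert a space so "<?//" becomes "<?php //".
    return b"<?php" + (ch if ch.isspace() else b" " + ch)


def convert(src: bytes) -> bytes:
    """Replace short open tags in one file's contents (hypothetical helper)."""
    src = SHORT_ECHO.sub(b"<?php echo ", src)
    return SHORT_OPEN.sub(_long_tag, src)


def convert_tree(root):
    """Rewrite every *.php file under root in place; return the changed paths."""
    changed = []
    for path in sorted(Path(root).rglob("*.php")):
        if not path.is_file():          # skip directories whose name ends in .php
            continue
        original = path.read_bytes()    # bytes: keeps encoding and line endings intact
        converted = convert(original)
        if converted != original:       # only save files that actually changed
            path.write_bytes(converted)
            changed.append(path)
    return changed
```

Usage: convert_tree("/path/to/project") returns the list of rewritten files. Since files are overwritten in place, run it on a version-controlled copy.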

Thank you for your help.

A:

My previous answer, which used sed and which I have just overwritten, won't work; sed is too weak for this sort of thing IMO.

So I've whipped up a Perl script that should do the trick; it's hopefully very user-editable.

#!/usr/bin/perl

use strict;
use warnings;

use File::Find::Rule;
use Carp;

my @files = File::Find::Rule->file()->name('*.php')->in('/tmp/foo/bar');

for my $file (@files) {
    # Keep the original under a temporary name while the new file is written
    rename $file, $file . '.orig'
      or Carp::croak("Rename error with $file: $!");
    open my $output, '>', $file or Carp::croak("Write error with $file: $!");
    open my $input, '<', $file . '.orig'
      or Carp::croak("Read error with $file.orig: $!");

    while ( my $line = <$input> ) {
        # Replace <?= with <?php echo
        $line =~ s/<\?=/<?php echo /g;

        # Replace any other <? (but not <?php or <?xml) with <?php
        $line =~ s/<\?(?!php|xml)/<?php /g;
        print $output $line;
    }

    close $input  or Carp::carp("Close error with $file.orig: $!");
    close $output or Carp::carp("Close error with $file: $!");

    unlink $file . '.orig';
}

But note: I haven't tested this on any real code, so it could go "Bang".

I would recommend you have your code under version control (wait, it's already under version control, right? ... right?) and run your test suite (don't tell me you don't have tests!) on the modified code, because you can't be certain it's doing the right thing without a fully fledged FSM parser.

Kent Fredric
Uhm, won't this mess up my site if I have inline code like <?=$printme;?>?
lock
Thank you, Kent Fredric, this gives me an idea of how to link find results to a sed command. But I'm afraid we're not there yet.
Polypheme
You could probably get away with using the `glob()` function instead of using File::Find::Rule. It should do the same thing in less space.
Chris Lutz
I could use glob, but glob can do strange things when file names contain spaces. As far as I can tell, the recommendation these days for Modern Perl is to use File::Find::Rule; it's concise and meaningful (not to mention it filters out directories ;) ).
Kent Fredric
[P.S. I tried to apply my knowledge of good practices here to make the code "good", "easy to understand" and "easy to maintain", as opposed to golfing it :) ]
Kent Fredric
I can't imagine a lot of directories ending with ".php". I don't usually name files (especially programming ones) with spaces, but I can see how that is a valid concern.
Chris Lutz
Yes, this new script seems better, although it converts <?xml to <?phpxml, so I still prefer my regular expression: $line =~ s/<\?(\s|\n|\t|[^a-zA-Z])/<?php$1/g;
Polypheme
won't work for '<?' inside strings, etc.; see my answer.
ax
Works well now! Thanks!
Polypheme
ax has a point though.
Polypheme
Well yes, I was stating that with "you can't be certain it's doing the right thing without an FSM parser". I'm fully aware of the string difficulties.
Kent Fredric
Also, my script has been adapted so it will handle XML strings inside PHP, assuming they are contiguous.
Kent Fredric
So far my tests have been positive with your script. Even in tricky cases.
Polypheme
A: 
Paulo
If going back is a real pain, you need proper version control. It shouldn't be.
derobert
That is what I did: I committed to git, and then I'd like to use a script. (Kate did the job as you said; it supports "find in files" and then regex replacements, one file at a time.)
Polypheme
@derobert - True. At that time I didn't have any. I now have a proper SVN addiction.
Paulo
A:

don't use regexps for parsing formal languages: you'll always run into haystacks you did not anticipate, like:

<?
$bla = '?> now what? <?';
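to make the haystack concrete, here is a small python check. it uses an illustrative lookahead regex (any "<?" not followed by php, xml or =), similar in spirit to the regexes proposed elsewhere in this thread; the regex cannot tell the real open tag from the "<?" inside the string literal, so the string's value gets silently changed:

```python
import re

# An illustrative regex approach: any "<?" not followed by "php", "xml" or "=".
NAIVE = re.compile(r"<\?(?!php|xml|=)")

haystack = "<?\n$bla = '?> now what? <?';\n"
result = NAIVE.sub("<?php ", haystack)

# The open tag is fixed, but the "<?" inside the single-quoted string
# is rewritten too, so $bla no longer holds the same value.
print(result)
```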

it's safer to use a processor that knows about the structure of the language. for html, that would be an xml processor; for php, the built-in tokenizer extension. it has the T_OPEN_TAG parser token, which matches <?php, <? or <%, and T_OPEN_TAG_WITH_ECHO, which matches <?= or <%=. to replace all short open tags, you find all these tokens and replace T_OPEN_TAG with <?php and T_OPEN_TAG_WITH_ECHO with <?php echo.

the implementation is left as an exercise for the reader :)
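as a rough illustration of why knowing the structure helps, here is a python sketch. it is not php's tokenizer but a small hand-written state machine (an approximation, with illustrative names) that only understands quoted strings, comments and ?>. heredoc/nowdoc, <% asp tags and other constructs are not handled, and "<?xml" in html mode is assumed to be a literal xml declaration:

```python
# Characters that open a quoted string in PHP code (single, double, backtick).
QUOTES = "'\"`"


def _skip_string(src, i):
    """Return the index just past the quoted string that starts at src[i]."""
    quote, j = src[i], i + 1
    while j < len(src):
        if src[j] == "\\":
            j += 2                      # skip the escaped character
        elif src[j] == quote:
            return j + 1
        else:
            j += 1
    return len(src)                     # unterminated string: consume the rest


def _open_tag(src, j):
    """Classify the "<?" at src[j]; return (replacement, next index, now_in_php)."""
    rest = src[j + 2:j + 6]
    if rest.startswith("="):
        return "<?php echo ", j + 3, True
    if rest[:3].lower() == "php" and (len(rest) == 3 or rest[3].isspace()):
        return src[j:j + 5], j + 5, True    # already a long tag, keep as written
    if rest[:3].lower() == "xml":
        return src[j:j + 5], j + 5, False   # assumed literal XML declaration
    sep = "" if rest[:1].isspace() else " "  # "<?php" needs whitespace after it
    return "<?php" + sep, j + 2, True


def convert_tags(src):
    out, i, n, in_php = [], 0, len(src), False
    while i < n:
        if not in_php:                  # HTML mode: look for the next "<?"
            j = src.find("<?", i)
            if j == -1:
                out.append(src[i:])
                break
            out.append(src[i:j])
            text, i, in_php = _open_tag(src, j)
            out.append(text)
        elif src[i] in QUOTES:          # PHP mode: copy strings untouched
            end = _skip_string(src, i)
            out.append(src[i:end])
            i = end
        elif src.startswith("/*", i):   # block comment
            end = src.find("*/", i + 2)
            end = n if end == -1 else end + 2
            out.append(src[i:end])
            i = end
        elif src.startswith("//", i) or src[i] == "#":
            end = i                     # line comment: ends at newline or "?>"
            while end < n and src[end] != "\n" and not src.startswith("?>", end):
                end += 1
            out.append(src[i:end])
            i = end
        elif src.startswith("?>", i):   # close tag: back to HTML mode
            out.append("?>")
            i += 2
            in_php = False
        else:
            out.append(src[i])
            i += 1
    return "".join(out)
```

with this, the haystack "<?\n$bla = '?> now what? <?';" only gets its real open tag converted; the "<?" inside the string is left alone.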

EDIT 1: ringmaster was kind enough to provide one.

EDIT 2: on systems with short_open_tag turned off in php.ini, <? and <?= won't be recognized by the tokenizer, so a tokenizer-based replacement script won't convert them. to make the script work on such systems, enable short_open_tag via a command line option:

php -d short_open_tag=On short_open_tag_replacement_script.php

p.s. the manual page for token_get_all() and googling for creative combinations of tokenizer, *token_get_all*, and the parser token names might help.

p.p.s. see also "Regex to parse define() contents, possible?" here on SO

ax
You definitely have a point here. I need to check this out. (Although I must say I can hardly imagine where I would want to echo "<?" without appending "php" anyway.)
Polypheme
Excellent suggestion.
Bob Somers
@Polypheme: <?xml version="1.0" encoding="UTF-8" ?>
Piskvor
@Piskvor: yes, you are right, but I take care of this in the regex I use. OK, my remark in brackets was just a guess. And I guess that in approximately 100% of *my* cases the string situation would not be a problem. The "tokens route" is still better/cleaner, though.
Polypheme
While this process is possibly a better alternative to using regex, note that it does not work on a system with short tags turned off, since the tokenizer obeys the php.ini setting for short tags.
ringmaster
@ringmaster: with short tags off in php.ini, you just do a `ini_set('short_open_tag', 1)` before calling the tokenizer, can't you?
ax
That doesn't work on PHP 5.3, no.
ringmaster
@ringmaster: you are right: it doesn't work with `ini_set`. thinking about it, this makes sense, as this setting is used early, in the parsing phase, before any code, including `ini_set`, has ever been executed. i thought it would work because it is documented as `PHP_INI_ALL`; this apparently is a (documentation) bug. there is a way around this, though: just set `short_open_tag` to On/1 via a command line option, like so:

php -d short_open_tag=On test.php

then it *is* applied before the parsing stage, and your tag replacement script works on systems with short tags turned off, too.
ax
A:
Dan Fego
Thank you for your clear explanations, I will definitely have a use for this on a few other problems I have!
Polypheme
A: 

It's typical for XML/XHTML pages to include the following code:

<?php echo '<?xml version="1.0" encoding="UTF-8" ?>'; ?>

Of course, that should be changed neither to:

<?phpphp echo '<?phpxml version="1.0" encoding="UTF-8" ?>'; ?>

nor:

<?php echo '<?phpxml version="1.0" encoding="UTF-8" ?>'; ?>
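Both failure modes are easy to reproduce in Python. The sketch below uses a blind string replacement and, for comparison, an illustrative negative-lookahead regex (similar to the one in Kent Fredric's Perl script); it still shares the string-literal limitation discussed in the other answers:

```python
import re

line = "<?php echo '<?xml version=\"1.0\" encoding=\"UTF-8\" ?>'; ?>"

# Blindly replacing every "<?" yields the first broken variant shown above.
blind = line.replace("<?", "<?php")

# A negative lookahead leaves "<?php", "<?xml" and "<?=" alone.
guarded = re.sub(r"<\?(?!php|xml|=)", "<?php ", line)
```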
vartec
Of course. The regexp I proposed in my question takes care of this. Kent Fredric also has a working regexp, and ax's approach should be fine with it too.
Polypheme
A:

If you're using the tokenizer option, this might be helpful:

<?php
// Convert the short open tags in $file to long tags using PHP's tokenizer
function replace_short_tags($file)
{
    $content = file_get_contents($file);
    $tokens = token_get_all($content);
    $output = '';

    foreach ($tokens as $token) {
        if (is_array($token)) {
            list($index, $code, $line) = $token;
            switch ($index) {
                case T_OPEN_TAG_WITH_ECHO:
                    $output .= '<?php echo ';
                    break;
                case T_OPEN_TAG:
                    $output .= '<?php ';
                    break;
                default:
                    $output .= $code;
                    break;
            }
        } else {
            // Single-character tokens such as ';' or '{' are plain strings
            $output .= $token;
        }
    }
    return $output;
}

Note that the tokenizer will not properly tokenize short tags if short tags aren't enabled. That is, you can't run this code on a system where short tags aren't working. You must run it elsewhere to convert the code.

ringmaster
see my last comment on http://stackoverflow.com/questions/684587/batch-script-to-replace-php-short-open-tags-with-php/684752#684752 for how to make it work on systems with `short_open_tag = Off`, too.
ax
A: 

Here's my version of the regexp:

<\?(?!(php|=|xml))(\s|\t|\n)
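A quick Python check of this pattern (Python's re module accepts it unchanged) shows what it catches and what it leaves alone. Note that it only matches "<?" followed by whitespace, so a short tag followed directly by another character, such as <?//, which the question wanted to handle, is not matched:

```python
import re

# The pattern above, unchanged; \t and \n are already covered by \s.
PATTERN = re.compile(r"<\?(?!(php|=|xml))(\s|\t|\n)")

# Sample input -> whether we expect the pattern to find a short tag in it.
samples = {
    "<? echo 1; ?>": True,
    "<?\n$x = 1;": True,
    "<?php echo 1;": False,
    "<?= $x ?>": False,
    '<?xml version="1.0"?>': False,
    "<?// comment": False,   # not followed by whitespace, so not matched
}
results = {s: bool(PATTERN.search(s)) for s in samples}
```

To catch the <?// and <?/* cases as well, the trailing group would need to accept other non-letter characters, as the question's pattern does.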
TiuTalk
A: 

Solution in Python:

http://ciupu.elektroda.eu/?p=26

Popo
A: 

This is a utility I wrote that converts PHP source that contains short open tags and replaces them with long tags.

i.e. it converts code like this:

  <?= $var1 ?>
  <? printf("%u changes\n",$changes) ?>

to this:

  <?php echo $var1 ?>
  <?php printf("%u changes\n",$changes) ?>

The --skip-echo-tags option will cause it to skip <?= tags and only replace <? tags.

It’s written as a PHP-CLI script and needs the CLI php.ini file to be set to permit short open tags. That’s the default setting for PHP 5.3.0 and earlier, but it might not always remain so. (The script simply won’t change anything if the setting isn’t enabled.)

danorton