



Hello guys, I'm not very familiar with regexes and I'm trying to find a regex for preg_match that searches a file for any of the following strings and halts if one is found. I already have the fopen, fgets and fclose calls set up; I just need a regex to use inside preg_match for the following PHP tags:

<?php <? ?>

So if preg_match returns 1, then this will skip the file and not upload it. I am using the $_FILES array to upload it via POST, so I'm hoping I can use the $_FILES['file']['tmp_name'] variable to read through the file.

Thanks for your help with this :)


if (in_array('application/x-httpd-php', $files[$filid]['mimetypes']) && ($_FILES[$value]['type'][$n] == 'application/octet-stream' || $_FILES[$value]['type'][$n] == 'application/octetstream')) {
    $file_extension = strtolower(substr(strrchr($_FILES[$value]['name'][$n], '.'), 1));

    if ($file_extension == 'php') {
        // Reading the current php file to make sure it's a PHP File.
        $fo = fopen($_FILES[$value]['tmp_name'][$n], 'rb');
        while (!feof($fo)) {
            $fo_output = fgets($fo, 16384);

            // look for a match
            if (preg_match([REG EX HERE], $fo_output) == 1) {
                $php = true;
                break; // one match is enough, stop reading
            }
        }
        fclose($fo);
    }
}

OK, I apologize, but what I actually need is a preg_match regex, because if the upload is a PHP file, I need to set the MIME type to application/x-httpd-php within the database. I'm also allowing PHP files to be uploaded in certain instances. So hopefully the code I posted above makes more sense to you all now.

Can someone please help me with a preg_match regex for this?

+1  A: 

(15 chars)

No need to test for `<?php` as `<?php` will already be matched by `<?`.
Ok, so how would I change this for only `<?` and `?>` strings then?
You would simply remove the center block, leaving you with `/(?:<\?|\?>)/`
True, that was silly
Matches `<?xml` as well..
@Matt: yes, but that's what the OP asked
@kemp - strictly speaking, OP asked to only match PHP tags
Ouch, yeah, is there a way to match PHP Tags only? So really, it should check for a space at the end of this or make sure that this is the only string in this... So we should be able to derive a solution to this, as it needs to match `<?`, `<?php`, or `?>` ONLY.
So, I suppose you'll need to match `<?php` directly then, as well as all of the other possible PHP tags. I guess that's the only way. But how?
You can discard the `xml` possibility with a negative lookahead (edited now). If you look for PHP stuff you also have to consider `<?=`
So there's no way to make a regex for a perfect match of `<?` or `?>` or `<?php` ONLY?
Well ONLY `<?` is tricky, as it depends on what follows it. `<?$var` is PHP, `<?xml` is XML.
Sorry, after testing this approach even further, I see it doesn't work after all. Thanks anyways, but I have decided to use another approach... arggg
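For reference, the negative-lookahead idea from the comments above could be sketched roughly like this. The function name `looksLikePhp` is made up for illustration and is not from the thread, and the sketch assumes that skipping anything starting with `<?xml` is acceptable. It is a heuristic, not a reliable PHP detector.

```php
<?php
// Hypothetical helper (name is an assumption, not from the thread).
// Matches "<?" only when it is NOT immediately followed by "xml"
// (case-insensitive), so it covers "<?php", "<?=" and a bare "<?".
function looksLikePhp(string $text): bool
{
    return preg_match('/<\?(?!xml)/i', $text) === 1;
}

var_dump(looksLikePhp('<?php echo 1;'));
var_dump(looksLikePhp('<?xml version="1.0" encoding="utf-8" ?>'));
```

It deliberately does not test for the closing `?>`, since an XML declaration ends with the same two characters. Anything else that begins with `<?xml` is skipped as well, which is the trade-off of this approach.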
+2  A: 

If you want to parse the file, try the following instead:

function containsPhp($file) {
    // Read the whole file; bail out if it cannot be read (or is empty).
    if(!$content = file_get_contents($file)) {
        trigger_error('Not a file');
        return false;
    }
    foreach(token_get_all($content) as $token) {
        // Array tokens carry the token ID as their first element.
        if(is_array($token) && in_array(current($token), array(T_OPEN_TAG, T_OPEN_TAG_WITH_ECHO))) {
            return true;
        }
    }
    return false;
}

... besides checking for a php extension (php, php5, phtml, inc etc).

Ok, I don't know what's going on here, but I just uploaded an XML file using this approach after changing the file extension from .xml to .php. So this doesn't work at all.
And when I upload other files, for example .zip files, I get this error about 100 times at the top of my page: `Warning: Unexpected character in input: '' (ASCII=5) state=1 in` and the ASCII changes to equal different values for each error, but all errors point to this line: `foreach(token_get_all($content) as $token) {`
This approach sucks. The fact that it got 2 up votes is incredible, and a poor call by the people who voted. No offense, but like I said, changing the filename from .xml to .php makes this return TRUE! I've never seen a single instance where using token_get_all() actually gave correct results.
My test case was based on 3 .php files: one which included `<?= 'hello' `, one with `<?php echo 'hello` and one with `hello`. They all returned correct values. If your XML file included `<?` then it's probably matched as an opening PHP tag (which you'd hit with your regexp anyway). You could probably add more xml/zip/etc. tests to blacklist.
My XML file had `<?xml` in it only.
For example: `<?xml version="1.0" encoding="utf-8" ?>` That's it, and this returned TRUE!
I really really would like to use this approach, but it's a shame it doesn't work. arggg. Thanks anyways.
It will actually match anything containing those tags so it's just for whitelisting php and not blacklist anything else. Sorry bro. See
I don't even know what whitelisting and/or blacklisting even means... lol
Whitelisting = disallow everything but X, Y and Z; blacklisting = allow everything but X, Y and Z. Think of spam filters: emails from your contacts (whitelisted) vs. emails containing bad words (blacklisted). Or file inclusion: only allowing your specified actions (whitelisting) vs. disallowing every dangerous action (blacklisting, where it is very hard to hit all the cases, so whitelisting is the better choice here).
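To make the whitelisting idea concrete for uploads, here is a minimal sketch. The function name `isAllowedExtension` and the extensions in the default list are assumptions picked for illustration; nothing in the thread specifies them.

```php
<?php
// Hypothetical whitelist check (name and default list are assumptions).
// Only extensions explicitly listed are accepted; everything else is rejected.
function isAllowedExtension(string $filename, array $allowed = ['jpg', 'png', 'zip', 'xml']): bool
{
    // pathinfo() returns the part after the last dot, or '' if there is none.
    $ext = strtolower(pathinfo($filename, PATHINFO_EXTENSION));
    return in_array($ext, $allowed, true);
}

var_dump(isAllowedExtension('photo.JPG'));
var_dump(isAllowedExtension('shell.php'));
```

A blacklist would instead have to enumerate php, php5, phtml, inc and so on, and would silently miss anything it forgot to list.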
+1  A: 