Hello guys, I'm not very familiar with regular expressions, and I'm trying to find a regex for preg_match that searches a file for any of the following strings and halts if one is found. I already have the fopen, fgets and fclose calls set up; I just need a regex to use inside preg_match for the following PHP tags:

<?php <? ?>

So if preg_match returns 1, then this file will be skipped and not uploaded. I am uploading via POST using the $_FILES array, so I'm hoping I can use the $_FILES['file']['tmp_name'] variable to read through the file.

Thanks for your help with this :)

EDIT

if (in_array('application/x-httpd-php', $files[$filid]['mimetypes']) && ($_FILES[$value]['type'][$n] == 'application/octet-stream' || $_FILES[$value]['type'][$n] == 'application/octetstream'))
{
    $file_extension = strtolower(substr(strrchr($_FILES[$value]['name'][$n], '.'), 1));

    if ($file_extension == 'php')
    {
        // Reading the current php file to make sure it's a PHP File.
        $fo = fopen($_FILES[$value]['tmp_name'][$n], 'rb');
        while (!feof($fo))
        {
            $fo_output = fgets($fo, 16384);

            // look for a match
            if (preg_match([REG EX HERE], $fo_output) == 1)
            {
                $php = true;
                break;
            }
        }
        fclose($fo);
    }
}

OK, I apologize, but what I actually need is a preg_match that detects a PHP file, because if it is a PHP file I need to set the MIME type to application/x-httpd-php in the database. I'm also allowing PHP files to be uploaded in certain instances. So hopefully the code I posted above makes more sense to you all now.

Can someone please help me with a preg_match regex for this?

+1  A: 
/(?:<\?(?!xml)|\?>)/

(15 chars)
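
As a quick check of what this pattern accepts, here is a minimal sketch that runs it over a few sample lines. The `$samples` array and most of its strings are illustrative assumptions; the XML declaration is the one quoted in the comments on the other answer.

```php
<?php
// The pattern from this answer, unchanged.
$pattern = '/(?:<\?(?!xml)|\?>)/';

// Hypothetical sample lines, used only for illustration.
$samples = array(
    'full tag'        => "<?php echo 'hello';",
    'echo tag'        => '<?= $greeting ?>',
    'plain text'      => 'hello',
    'xml declaration' => '<?xml version="1.0" encoding="utf-8" ?>',
);

foreach ($samples as $label => $line) {
    // preg_match() returns 1 on a match and 0 otherwise.
    printf("%-16s %d\n", $label, preg_match($pattern, $line));
}
```

The XML declaration still counts as a match: the lookahead rejects `<?xml`, but the `\?>` alternative matches the closing `?>`. Dropping that alternative, i.e. `/<\?(?!xml)/`, avoids this at the cost of no longer detecting a lone `?>`.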

kemp
No need to test for `<?php` as `<?php` will already be matched by `<?`.
Gumbo
OK, so how would I change this to match only the `<?` and `?>` strings then?
SoLoGHoST
You would simply remove the center block, leaving you with /(?:<\?|\?>)/
ABach
True, that was silly
kemp
Matches `<?xml` as well..
Matt
@Matt: yes, but that's what the OP asked
kemp
@kemp - strictly speaking, OP asked to only match PHP tags
Matt
Ouch, yeah, is there a way to match PHP tags only? So really, it should check for a space after the tag, or make sure the tag is the only string there... We should be able to derive a solution, as it needs to match `<?`, `<?php`, or `?>` ONLY.
SoLoGHoST
So I suppose you'll need to match `<?php` directly then, as well as all of the other possible PHP tags; I guess that's the only way. But how?
SoLoGHoST
You can discard the `xml` possibility with a negative lookahead (edited now). If you are looking for PHP code, you also have to consider `<?=`.
kemp
So there's no way to make a regex for a perfect match of `<?` or `?>` or `<?php` ONLY?
SoLoGHoST
Well ONLY `<?` is tricky, as it depends on what follows it. `<?$var` is PHP, `<?xml` is XML.
kemp
Sorry, after testing this approach even further, I see it doesn't work after all. Thanks anyways, but I have decided to use another approach... arggg
SoLoGHoST
+2  A: 

If you want to parse the file, try the following instead:

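function containsPhp($file) {
    // Compare against false explicitly: an empty file yields '', which is also falsy.
    $content = file_get_contents($file);
    if ($content === false) {
        trigger_error('Not a file');
        return false;
    }
    foreach (token_get_all($content) as $token) {
        // Array tokens carry the token ID at index 0; single-character tokens are plain strings.
        if (is_array($token) && in_array($token[0], array(T_OPEN_TAG, T_OPEN_TAG_WITH_ECHO))) {
            return true;
        }
    }
    return false;
}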

... in addition to checking for a PHP extension (php, php5, phtml, inc, etc.).
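
One reason the check above can fire on non-PHP input: when `short_open_tag` is enabled, the tokenizer treats a bare `<?` as an opening tag, so the start of `<?xml` produces `T_OPEN_TAG` as well. The sketch below is a hypothetical variant (`containsFullPhpTag` is not part of the answer) that accepts only `<?php` and `<?=`, under the assumption that short `<?` tags do not need to be detected.

```php
<?php
// Hypothetical variant: report only full "<?php" tags and "<?=" echo tags.
// Assumption: files that use the short "<?" form do not need to be detected.
function containsFullPhpTag($content)
{
    foreach (token_get_all($content) as $token) {
        if (!is_array($token)) {
            continue; // single-character tokens are plain strings
        }
        if ($token[0] === T_OPEN_TAG_WITH_ECHO) {
            return true;
        }
        // $token[1] holds the matched text, e.g. "<?php " or, with short tags on, "<?".
        if ($token[0] === T_OPEN_TAG && stripos($token[1], '<?php') === 0) {
            return true;
        }
    }
    return false;
}
```

It takes the file content as a string, so it could be called as `containsFullPhpTag(file_get_contents($path))` once the read has been checked for failure.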

chelmertz
Ok, I don't know what's going on here, but I just uploaded an XML file using this approach after changing the file extension from .xml to .php. So this doesn't work at all.
SoLoGHoST
And when I upload other files, for example .zip files, I get this error about 100 times at the top of my page: `Warning: Unexpected character in input: '' (ASCII=5) state=1 in` and the ASCII changes to equal different values for each error, but all errors point to this line: `foreach(token_get_all($content) as $token) {`
SoLoGHoST
This approach sucks. The fact that it got 2 upvotes is incredible, and a poor call by the people who voted. No offense, but like I said, changing the filename from .xml to .php makes this return TRUE! I've never seen a single instance where using token_get_all() actually returned useful results.
SoLoGHoST
My testcase was based on 3 .php-files, one which included `<?= 'hello' `, one with `<?php echo 'hello` and one with `hello`, they all returned correct values. If your xml file included `<?` then it's probably matched against an opening php-tag (which you'd hit with your regexp anyways). You could probably add more xml/zip/etc tests to blacklist.
chelmertz
My XML file had only `<?xml` in it.
SoLoGHoST
For example: `<?xml version="1.0" encoding="utf-8" ?>` That's it, and this returned TRUE!
SoLoGHoST
I really really would like to use this approach, but it's a shame it doesn't work. arggg. Thanks anyways.
SoLoGHoST
It will actually match anything containing those tags, so it only works for whitelisting PHP, not for blacklisting anything else. Sorry bro. See http://php.net/manual/en/tokens.php
chelmertz
I don't even know what whitelisting and/or blacklisting even means... lol
SoLoGHoST
Whitelisting = disallow everything but X, Y and Z; blacklisting = allow everything but X, Y and Z. Think of spam filters: emails from your contacts (whitelisted) vs. emails containing bad words (blacklisted). Or file inclusion: only allowing your specified actions (whitelisting) vs. disallowing every dangerous action (blacklisting, where it is very hard to cover all cases, so whitelisting is the better choice here).
chelmertz
+1  A: 
\?>|<\?((?=php)|(?!\w))
ZyX
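
The pattern above is written without delimiters, which preg_match() requires. A minimal sketch of plugging it into a line-by-line scan like the one in the question follows; the constant `PHP_TAG_PATTERN` and the helper `streamHasPhpTag` are hypothetical names, and the 16384 read length is copied from the question's code.

```php
<?php
// ZyX's pattern with '/' delimiters added so preg_match() accepts it.
const PHP_TAG_PATTERN = '/\?>|<\?((?=php)|(?!\w))/';

// Hypothetical helper: scans an already opened stream line by line,
// the same way the loop in the question does.
function streamHasPhpTag($handle)
{
    while (($line = fgets($handle, 16384)) !== false) {
        if (preg_match(PHP_TAG_PATTERN, $line) === 1) {
            return true;
        }
    }
    return false;
}
```

Two caveats apply to this sketch: a tag split across two reads of a very long line would be missed, and the `\?>` alternative also matches the closing `?>` of an XML declaration.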