The setup:

I have a standard .php file (index.php) that contains two includes, one for header (header.php) and one for footer (footer.php). The index.php file looks like this:

index.php

<?php
include 'header.php';
?>

<h2>Hello</h2>
<p class="editable">Lorem ipsum dolar doo dah day</p>

<?php
include 'footer.php';
?>

header.php like this:

<html>
<head>
<title>This is my page</title>
</head>
<body>
<h1 class="editable">My Website rocks</h1>

and footer.php like this:

<p>The end of my page</p>
</body>
</html>

I am writing a PHP script that allows you to edit any of the ".editable" items on a page. My problem is that these editable regions could appear in any included files as well as the main body of index.php.

My php code is grabbing the index.php file with file_get_contents(); which works well. I am also able to edit and save any ".editable" regions in index.php.

My issue:

I have been unable to find a way of "finding" the includes and parsing those for ".editable" regions as well. I am looking for suggestions on how I would work through all the includes in index.php, checking them for editable regions. Would I need to use regular expressions to find "include *.php"? I am unsure of where to even start...

For those of you who may wish to see my PHP code: I am making use of the PHP class [simple php dom parser][1], which allows me to write code like:

// load the class and file
$html = new simple_html_dom();
$html->load_file("index.php");

// find the first editable area and change its content to "edited"  
$html->find('*[class*=editable]', 0)->innertext = "Edited";

// save the file
$html->save("index.php");

[1]: http://simplehtmldom.sourceforge.net/manual_api.htm "simple php dom parser"


UPDATE

I have been playing around with regular expressions to try and match the includes. I am pretty rubbish at regex but I think I am getting close. Here is what I have so far:

$findinclude = '/(?:include|include_once|require|require_once)\s*(?:[a-z]|"|\(|\)|\'|_|\.|\s|\/)*(?=(?:[^\<\?]|[^\?\>])*\?>)/i';

This matches fairly well although it does seem to return the odd ) and ' when using preg_match. I am trying to add a bit of security into the regex to ensure it only matches between php tags - this part: (?=(?:[^\<\?]|[^\?>])*\?>) - but it only returns the first include on a page. Any tips on how to improve this regular expression? (I have been at it for about 6 hours)

+1  A: 

What type of system are you creating?

If it's going to be used by the public, you'd have serious security concerns. People could include their own PHP code or JavaScript in the supplied content.

This isn't the standard way at all to create dynamic content. For most purposes, you'd want to create a single template, and then allow users to save their changes into a database. You'd then fill in the info into the template from the database for display.

If you allow them to include HTML, use something like HTML Purifier to clean it up, and insert the data into your database with a prepared statement using PDO. I'm sure people here would be happy to answer any questions you may have about using a database.
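
As a rough illustration of the prepared-statement part, here is a minimal sketch. It assumes an in-memory SQLite database (so it runs without a server) and a made-up table called content; the table layout, the column names and the sample values are all invented for the example. The sanitizing step is only indicated in a comment, since HTML Purifier is a separate library.

```php
<?php
// Assumption: in-memory SQLite database and a hypothetical "content" table.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE content (
    page TEXT, section TEXT, html TEXT,
    PRIMARY KEY (page, section)
)');

// Pretend this came from the edit form and has ALREADY been cleaned
// by a sanitizer such as HTML Purifier (not shown here).
$clean_html = '<p>Lorem ipsum dolar doo dah day</p>';

// The values are bound as parameters, never concatenated into the SQL.
// INSERT OR REPLACE is SQLite syntax; other databases spell this differently.
$insert = $pdo->prepare(
    'INSERT OR REPLACE INTO content (page, section, html)
     VALUES (:page, :section, :html)'
);
$insert->execute([':page' => 'index', ':section' => 'intro', ':html' => $clean_html]);

// Read the section back when rendering the template.
$select = $pdo->prepare(
    'SELECT html FROM content WHERE page = :page AND section = :section'
);
$select->execute([':page' => 'index', ':section' => 'intro']);
$stored = $select->fetchColumn();

echo $stored, "\n";
```

With MySQL or another server you would change only the DSN passed to PDO and the upsert syntax; the prepare/execute pattern stays the same.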

Alex JL
I am hoping it will be a light content management system. All system users/editors will need to be logged in to make changes and all saved HTML will be validated with something along the lines of htmlpurifier.org (thanks for the link). In this instance I want to not use a database for content.
Scott
I see. Sure, if all of the people editing are trusted then having them alter potentially executable files might be okay. Still, I'd suggest looking at the structural philosophy of templating and using a DB. I would store the editable content separately from the presentation part, whether in a DB or as a file. Then retrieve it in the script and display it if available, or display the default contents if there are no edits. This would eliminate the need to do what your original question asked, also.
Alex JL
A: 

If users can submit content into these and then they get included into a PHP file, then you are in some serious trouble.

You should have simple templates that have little or no PHP in them, which get parsed -- then and only then should you insert content into the DOM, after it has been properly sanitized.

As for your 'finding the includes' issue: you don't need to find them, because PHP does that for you. Use ob_start() and the related output-buffering functions, then include the template file. Then grab the buffer contents (which will be HTML) and parse the already assembled template with the DOM parser.
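
A minimal sketch of that idea follows. It writes throwaway copies of the question's three files into a temp directory so that it can run on its own, and it parses the result with PHP's bundled DOMDocument/DOMXPath rather than simple_html_dom, purely so the example has no external dependency. The directory name and the $editable array are invented for the illustration.

```php
<?php
// Assumption: temporary copies of the question's files, so the sketch is self-contained.
$dir = sys_get_temp_dir() . '/editable_demo_' . uniqid();
mkdir($dir);
file_put_contents("$dir/header.php",
    "<html>\n<head>\n<title>This is my page</title>\n</head>\n<body>\n"
    . "<h1 class=\"editable\">My Website rocks</h1>\n");
file_put_contents("$dir/footer.php",
    "<p>The end of my page</p>\n</body>\n</html>\n");
file_put_contents("$dir/index.php",
    "<?php include __DIR__ . '/header.php'; ?>\n<h2>Hello</h2>\n"
    . "<p class=\"editable\">Lorem ipsum dolar doo dah day</p>\n"
    . "<?php include __DIR__ . '/footer.php'; ?>\n");

// Let PHP resolve the includes, capturing the output instead of sending it.
ob_start();
include "$dir/index.php";
$assembled = ob_get_clean();

// Parse the assembled HTML and collect every element whose class list contains "editable".
$doc = new DOMDocument();
$doc->loadHTML($assembled);
$xpath = new DOMXPath($doc);
$query = "//*[contains(concat(' ', normalize-space(@class), ' '), ' editable ')]";

$editable = [];
foreach ($xpath->query($query) as $node) {
    $editable[] = $node->nodeName . ': ' . trim($node->textContent);
}
print_r($editable);

// Clean up the temporary files.
foreach (['header.php', 'footer.php', 'index.php'] as $name) {
    unlink("$dir/$name");
}
rmdir($dir);
```

Note the limitation: the buffer contains the merged page, so this finds the editable regions but no longer tells you which file each one came from. Writing an edit back to the right file needs extra bookkeeping.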

Please, please PLEASE make sure that you sanitize whatever you are injecting into the DOM.

Otherwise, tyranny and destruction are certain to rain down upon your web site (and you, depending on what else is on your server).

Carson Myers
Thanks for your feedback. All users will have to be logged in to edit the sections and all data will be heavily validated as well. Because the template files can contain php I cannot see any other way to do it - as I need to preserve this PHP and only manipulate the html in/around it. I have been playing around with output buffering and will continue to... so far it has not given me any clear way to make this work.
Scott
Have you thought about storing the editable parts in a database, or at least a CSV file or something? Then you don't have to parse the DOM and try to parse PHP includes; you just have to do something like `<?php echo $editable['somesection']; ?>`
Carson Myers
+1  A: 

I've misunderstood you, disregard everything after the hr.

To do what you want, I guess the simplest way is to present the page to the browser, write some JavaScript that finds and edits the editable areas, and submit the result to a PHP file via AJAX.

The PHP file would then receive the content and the place where it should change the content. I still don't understand very well how the static CMSs do it, but there are some open source projects; check here and here. I suggest you study their code to find out how they do it.

----


That's really simple. Instead of loading the file like this:

file_get_contents('/path/to/file.php');

You have to request it over HTTP, so that the web server executes the PHP (includes and all) and returns the assembled HTML:

file_get_contents('http://your-host.com/path/to/file.php');

Also, take a look at QueryPath; it seems to be a lot better than SimpleHTMLDom.

Alix Axel
Thanks, I had a look at the other CMSs you pointed me towards and they work a little bit differently. All good research though. QueryPath looks decent too.
Scott
@Scott: Are you sure? Have you seen **Orbis CMS** and **MechEdit**?
Alix Axel
@Alix Axel - MechEdit works with only HTML files - so it does not have to be respectful of PHP code like includes etc. Orbis stores its data separately from the template files and then uses PHP code to place the relevant content in the right place: e.g. "<?php echo(orbis_data([page name],[section name])); ?>" - both have similar models but are ever so slightly different.
Scott
@Scott: Nice, good luck with your project. Take a look at Unify CMS, it seems to do what you want.
Alix Axel
@Alix Axel: Thanks very much. I am the developer of Pixie (www.getpixie.co.uk) and at the moment I am just exploring some ideas for the future of the project. Unify looks to be based on a very similar concept. I will have to have a play!
Scott
@Scott: We could use an open-source alternative to Unify. Good luck with your development, and if you're successful, drop me an email please; I would like to try it out! =)
Alix Axel
A: 

You need to just store the user-inputted text somewhere and load it into, and output it with, your PHP template.

I'd look into learning to use a database. There is nothing heavy-weight or slow about it, and really, this is what they're for. If you don't want to use a database, you can use files instead. I'd suggest storing the data in the file in JSON format to give it some structure.

Here's a very simple system to use files to store and retrieve JSON encoded data.

Make an array of what you want to save after editing

$user_data=array('title'=>$user_supplied_info,'content'=>$user_supplied_words);
$json_data=json_encode($user_data);
file_put_contents('path_to/user_data/thisuser',$json_data);

Then when it's time to display the page

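<?php
// default content, shown when the user has not saved any edits
$user_data=array('title'=>'My page rocks!','content'=>'lorems ipso diddy doo dah');
$no_data=false;

// @ suppresses the warning when the file does not exist yet
$file_data=@file_get_contents('path_to/user_data/thisuser');
if($file_data===false){$no_data=true;}//file not found or unreadable
$data_array=json_decode((string)$file_data,true);
if(!is_array($data_array))
  { $no_data=true; }//maybe the json could not be parsed
else
  { $user_data=array_merge($user_data,$data_array); }//saved values override the defaults
?>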
<html>
<head>
<title>This is my page</title>
</head>
<body>
<h1 class="editable"><?php echo $user_data['title']?></h1>

And so on. The defaults array ($user_data as initialised at the top) holds the standard content for the editable sections, which is printed if the user has not supplied any. If they have, it is loaded and merged with the default array. In the array_merge step, data loaded from the file overwrites the defaults wherever the same key is present in both.
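
To make the merge behaviour concrete, here is a small round-trip sketch. The helper names save_edits() and load_content() are invented for this example, and a temp file stands in for the real storage path.

```php
<?php
// Hypothetical helpers, named only for this sketch.
function save_edits(string $file, array $edits): void
{
    // LOCK_EX avoids two simultaneous saves interleaving their writes
    file_put_contents($file, json_encode($edits), LOCK_EX);
}

function load_content(string $file, array $defaults): array
{
    $raw = is_readable($file) ? file_get_contents($file) : false;
    $saved = ($raw === false) ? null : json_decode($raw, true);

    // Saved keys win; anything missing falls back to the default text.
    return is_array($saved) ? array_merge($defaults, $saved) : $defaults;
}

$defaults = ['title' => 'My page rocks!', 'content' => 'lorems ipso diddy doo dah'];
$file = tempnam(sys_get_temp_dir(), 'edits'); // assumption: temp file as storage

// The user has edited only the title.
save_edits($file, ['title' => 'Edited']);
$page = load_content($file, $defaults);
print_r($page); // title comes from the file, content from the defaults

// With the file gone, everything falls back to the defaults.
unlink($file);
$fallback = load_content($file, $defaults);
```

Keeping the defaults in the template means a missing or corrupt data file degrades to the original page instead of an error.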

Alex JL
A: 

Ok, I finally worked it out. If anyone is looking to find any include, include_once, require, require_once in a .php file then you can use the following regular expression with a php function like preg_match_all.

'/(?:include|include_once|require|require_once)\s*(?:[a-z]|"|\(|\)|\'|_|\.|\s|\/)*(?=(?:[^\<\?])*\?>)/i';

This looks for any includes etc. within PHP tags. Referring back to my original example, my code looks like this:

$html = new simple_html_dom();
$html->load_file("index.php");

$findinclude = '/(?:include|include_once|require|require_once)\s*(?:[a-z]|"|\(|\)|\'|_|\.|\s|\/)*(?=(?:[^\<\?])*\?>)/i';

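if (preg_match_all($findinclude, $html, $includes)):

    // $includes[0] holds the full text of every match
    $incfiles = $includes[0];

    // then loop through the matches and print each filename
    foreach ($incfiles as $inc) {
        // drop the include/require keyword, then the quotes and brackets
        $inc = preg_replace('/^(?:include|require)(?:_once)?/i', '', $inc);
        print basename(trim(preg_replace('/[^a-zA-Z0-9\s\.\_\/]/', '', $inc)))."\n";
    }
endif;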

Job done! I can now work through this to edit each file as required.

Scott
+1  A: 

Use a php framework!! Please!!

AntonioCS
I do not find this very helpful. Can you suggest a framework? One that takes care of this problem? Does using a framework somehow make all my problems go away? Does it help me to learn how to write code? Please elaborate.
Scott
Yes, actually using a framework does make a lot of your problems go away. The idea is that the other people working on it have already come to find a system that is secure and works well, and encapsulated large amounts of functionality into the framework for you to use. It does help to learn to write code, actually, to study how frameworks do things (if it's a good one!) Check out Kohana for instance.
Alex JL
+1  A: 

Based on the regex you provided, I've optimized it a bit and fixed some crucial bugs:

~<[?].*?(?:include|require(?:_once)?)\s*?(?:[(]?['"])(.+?)(?:['"][)]?)\s*?;.*?(?:[?]>)?~is

And in preg_match_all():

preg_match_all('~<[?].*?(?:include|require(?:_once)?)\s*?(?:[(]?[\'"])(.+?)(?:[\'"][)]?)\s*?;.*?(?:[?]>)?~is', $html, $includes);

It should match filenames with letters, digits, dashes, underscores, slashes, spaces, dots and so on.

Also, the filename is stored in reference #1 and the ending PHP tag is optional.

It's worth mentioning that the token_get_all() function is much more reliable than regular expressions.
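
For comparison, a minimal sketch of the token_get_all() approach is below. The function name find_includes() is made up for this example, and it only handles the simple case where the path is a plain string literal; paths built from variables, constants or concatenation would need more work.

```php
<?php
// Hypothetical helper: list the literal paths passed to include/require.
function find_includes(string $source): array
{
    $include_tokens = [T_INCLUDE, T_INCLUDE_ONCE, T_REQUIRE, T_REQUIRE_ONCE];
    $files = [];
    $expecting = false; // true between an include keyword and the end of its statement

    foreach (token_get_all($source) as $token) {
        if (!is_array($token)) {
            // single-character token such as ';', '(' or ')'
            if ($token === ';') {
                $expecting = false;
            }
            continue;
        }

        [$id, $text] = $token;
        if (in_array($id, $include_tokens, true)) {
            $expecting = true;
        } elseif ($expecting && $id === T_CONSTANT_ENCAPSED_STRING) {
            // a quoted string literal: strip the surrounding quotes
            $files[] = substr($text, 1, -1);
            $expecting = false;
        } elseif ($id === T_CLOSE_TAG) {
            // "?>" also ends the statement
            $expecting = false;
        }
    }

    return $files;
}

$source = "<?php\ninclude 'header.php';\n?>\n<h2>Hello</h2>\n"
        . "<?php require_once(\"footer.php\"); ?>\n";

print_r(find_includes($source));
```

Because the tokenizer only reports include keywords that appear in real PHP code, the word "include" inside HTML text or inside a string is never mistaken for a statement, which is the main weakness of the regex approach.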

Alix Axel
Wow. Thank you Alix. I will give that a try :)
Scott