views:

55

answers:

1

I'm searching for a good tutorial on how to build a wiki with PHP & MySQL. I need something that will show me how to structure the MySQL database and which algorithms to implement for the text diff, etc. The only one I've found so far is this: http://www.ibm.com/developerworks/opensource/tutorials/os-php-wiki1/ But it is based on CakePHP, and I'd have to spend a month learning CakePHP to understand what they did.

Thank you in advance for any hint!

+1  A: 

Assuming you're doing this for tutorial's sake (because there's already a lot of wonderful wiki software out there), let's see if I can't tackle a very general overview of the subject. Of course I don't have the time or space to code an entire wiki here in this answer, nor would you learn anything if I did. But I'll try to lay out the general block diagram and give an example of a database schema.

First off, consider the different pages your wiki will need. Not informational pages, but actually the different functional pages that you'll have.

  • One page for parsing requests (find the article they requested and send them the information)
  • One page for editing articles
  • One page for logging in
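
These three pages can hang off a single entry point. A minimal sketch (the file and parameter names here are my own invention, not from the answer):

```php
<?php
// index.php - a minimal front controller routing to the three pages:
//   ?page=view&title=...  -> article parsing script
//   ?page=edit&title=...  -> article editor
//   ?page=login           -> login form / handler
$page = $_GET['page'] ?? 'view';

switch ($page) {
    case 'edit':
        require 'edit.php';   // hypothetical file
        break;
    case 'login':
        require 'login.php';  // hypothetical file
        break;
    default:
        require 'view.php';   // hypothetical file: find and render the article
}
```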

Depending on how simple or complicated you want to get, these are the only three pages you need. Your "home page" and any other important pages ("contact us", "about us", "how to use this wiki", etc.) can all be articles, so they'd be handled the same way you parse requests. After you have a functional wiki you may consider adding some of the following:

  • Revision tracking (log of changes to each article)
  • Article downloader (use PHP modules to output the article to a PDF, txt, or some other format and send it as a download using header())
  • Media viewer (as on Wikipedia, when you click an image it takes you to a page with information about that image: who uploaded it, how big it is, etc.)
  • Anything else, be creative! =)

Since we don't have a functioning wiki yet, these would take extra time and effort to implement, so let's start with the basic three pages. Each of these pages should have its own block diagram, or a simple list of functions it must perform (the exact implementation of which I'll leave up to you to figure out).

For the article parsing script:

  • First it has to find the article they're looking for. This could involve parsing their search string (converting "the_nile_river" to "The Nile River") before performing the search, or it may require some fuzzy comparison to find the most closely related article (seeing "The Nile" and redirecting to "The Nile River"). This part of the wiki you can improve upon indefinitely, as no one has developed a "perfect" search algorithm yet.
  • If the article cannot be found, then you need to have some error state. Either give a list of suggested articles, continue the search by looking for their terms in the body of each article instead of the title, or just apologize and ask them to search again. A good wiki will always offer the ability to create the article if it doesn't exist (a link to the article editing page).
  • If the article can be found, it needs to be able to translate the contents of the article to HTML. For extremely simple articles, this could just be a matter of using htmlentities() to convert characters like & into their HTML entities (&amp;). As your articles get more complicated, though, you may want a way to display headers, links to other articles, etc. For this you would probably want to use special template parsing so as to give your users no direct control over the HTML. I've never personally coded one of these, but I can imagine it has a lot of preg_replace() statements.
  • Finally, you need to consider what header/sidebar/footer information to display. This information will probably be different if they're logged in, it may contain links to related articles, and it may have links to edit this article.
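
The lookup step in the first two bullets might be sketched like this (a rough illustration; it assumes a PDO connection in $db and an articles table with a title column, and the function name is my own):

```php
<?php
// Hypothetical article lookup with an exact match and a fuzzy fallback.
function findArticle(PDO $db, string $slug): ?array
{
    // "the_nile_river" -> "The Nile River"
    $title = ucwords(str_replace('_', ' ', $slug));

    // Exact match first...
    $stmt = $db->prepare('SELECT * FROM articles WHERE title = ?');
    $stmt->execute([$title]);
    $article = $stmt->fetch(PDO::FETCH_ASSOC);
    if ($article !== false) {
        return $article;
    }

    // ...then fall back to a prefix match ("The Nile" -> "The Nile River").
    $stmt = $db->prepare('SELECT * FROM articles WHERE title LIKE ? LIMIT 1');
    $stmt->execute([$title . '%']);
    $article = $stmt->fetch(PDO::FETCH_ASSOC);

    return $article === false ? null : $article;
}
```

A real wiki would rank multiple candidates rather than taking the first LIKE hit, but this shows the shape of the search.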

As for editing articles, this page is probably the easiest. I would do something as follows:

  • Check to see if they're logged in
  • If yes, give them a <textarea> which is pre-populated with the original article sans parsing
  • If no, apologize and tell them to log in. Provide a link to the login page
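
Those three steps might look like this (a sketch only; the file and helper names are my own, and loadRawArticle() stands in for whatever database or file lookup you end up with):

```php
<?php
// Hypothetical edit page: check login, then show the raw article source.
session_start();

if (!isset($_SESSION['user_id'])) {
    // Not logged in: apologize and point them at the login page.
    echo '<p>Sorry, you must be logged in to edit. <a href="login.php">Log in</a></p>';
    exit;
}

// Logged in: pre-populate a <textarea> with the unparsed article source.
$rawContent = loadRawArticle($_GET['title'] ?? '');  // hypothetical helper
?>
<form method="post" action="save.php">
    <textarea name="content" rows="25" cols="80"><?= htmlspecialchars($rawContent) ?></textarea>
    <input type="submit" value="Save">
</form>
```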

As for the login page, if you've ever coded something with multiple users then you'll understand how this should work.

  • Check if $_POST["username"] is set. If so, they've sent their login information. If not, send them the login form (username, password, submit).
  • If it is set, hash the submitted password (with a salt!), compare it to the hash in the database, and if they match, start a session. If they don't match, apologize and send them the login form again.
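
A sketch of that flow, assuming a PDO connection in $db and the users table described below. This answer predates it, but modern PHP (5.5+) bundles the salt-and-hash step into password_hash()/password_verify(), which I've used here instead of hand-rolling the hashing:

```php
<?php
// Hypothetical login handler.
session_start();

if (!isset($_POST['username'])) {
    // No credentials yet: send the login form.
    echo '<form method="post">
            <input name="username"> <input type="password" name="password">
            <input type="submit" value="Log in">
          </form>';
    exit;
}

$stmt = $db->prepare('SELECT id, hashed_password FROM users WHERE username = ?');
$stmt->execute([$_POST['username']]);
$user = $stmt->fetch(PDO::FETCH_ASSOC);

if ($user && password_verify($_POST['password'], $user['hashed_password'])) {
    $_SESSION['user_id'] = $user['id'];  // they match: start a session
} else {
    echo '<p>Sorry, wrong username or password. Please try again.</p>';
}
```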

As far as how the database should look, there is one table that is nearly required (everyone would put this in a database; no one would trust it to flat files, for security reasons).

users
-----
id (int)
username (string)
hashed_password (string)
extra info (email, website, last seen, preferences, etc.)
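
Expressed as MySQL DDL, that might look something like the following (the column types and "extra info" fields are my guesses; the answer only names the fields):

```sql
CREATE TABLE users (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    username VARCHAR(64) NOT NULL,
    hashed_password VARCHAR(255) NOT NULL,
    email VARCHAR(255),           -- "extra info" columns: adjust to taste
    last_seen DATETIME,
    PRIMARY KEY (id),
    UNIQUE KEY (username)
);
```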

This table holds the information any user needs in order to log in. As for the articles themselves, you can choose to store them either in the database or in files. MediaWiki stores everything in MySQL; DokuWiki uses TXT files. It's partially a matter of preference, but there are some other things to consider.

  • MySQL columns have a maximum size. This can be set to something incredibly large (a MEDIUMBLOB holds up to 16,777,215 bytes), but there is still a limit, meaning a maximum article size. TXT files can grow arbitrarily large (http://dev.mysql.com/doc/refman/4.1/en/storage-requirements.html)
  • Opening and reading files can go slower if there are many files on the system. A trick to prevent this slow-down that works on some systems and not others (depending on the file system in use) is to make multiple folders. For instance, every article that starts with "ab" ("Abdominals", "Absorbency", "Abel (Bible Character)", etc.) would be in one folder, and every article that starts with "ac" would be in another.
  • TXT files theoretically pose slightly more of a security risk, since reading them requires no credentials, whereas an attacker would have to authenticate with the SQL server to reach database contents. This risk can be nullified by putting the TXT files outside of the webroot and setting the permissions correctly (700 or similar)
  • By keeping it in a database it's much easier to store meta-data about the article (right beside the article's content, have separate columns for "last edited", "edited by", "last searched", etc.). This kind of data can be stored in a TXT file, but it's much harder, since you'd need to choose a delimiter for the meta-data and you'd have to edit it without harming the other contents of the file.
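
If you go the TXT-file route, the two-letter folder trick from the second bullet might be implemented like this (the function name and directory layout are my own invention):

```php
<?php
// Hypothetical: map an article title to a sharded file path.
function articlePath(string $title): string
{
    $slug   = strtolower(str_replace(' ', '_', $title));
    $prefix = substr($slug, 0, 2);   // "Abdominals" -> "ab"
    // Keep the files outside the webroot, as suggested above.
    return __DIR__ . '/../articles/' . $prefix . '/' . $slug . '.txt';
}
```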

If you were to store articles in a database, I would recommend a setup similar to the following:

articles
--------
id (int)
title (tinyblob)
content (mediumblob)
meta info

As you add features, you may start wanting database tables for these features, as well. For instance, if you're going to have a media viewer page for every image (rather than just having the article display the image) then you'll need to consider the following:

  • Database with link to image and information about image
  • Method of referencing the image from within the template
  • Method of parsing that reference based on the database values
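
For the first bullet, a media table might look something like this (purely a guess at sensible columns; the answer doesn't specify any):

```sql
CREATE TABLE media (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    filename VARCHAR(255) NOT NULL,   -- link/path to the image
    uploaded_by INT UNSIGNED,         -- references users.id
    uploaded_at DATETIME,
    caption TEXT,
    PRIMARY KEY (id)
);
```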

To get you thinking about some of the features you might include and how these would affect your database, here's what MediaWiki's database looks like:

http://upload.wikimedia.org/wikipedia/commons/4/41/Mediawiki-database-schema.png

steven_desu