The process is quite complex and isn't exactly pretty. You need to look at the `Title` class found in `includes/Title.php`. You should start with the `newFromText` method, but the bulk of the logic is in the `secureAndSplit` method.
Note that (as ever with MediaWiki) the code is not decoupled in the slightest. If you want to replicate it, you'll need to extract the logic rather than simply re-using the class.
The logic looks something like this:
- Decode character references (e.g. `&eacute;` becomes `é`)
- Convert spaces to underscores
- Check whether the title begins with a namespace or interwiki prefix
- Remove hash fragments (e.g. `Apple#Name`)
- Reject titles containing forbidden characters (such as `[`, `]`, `{`, `}`, `|`, `<` and `>`); they are not silently stripped
- Forbid relative path links (e.g. `../directory/page`)
- Forbid triple tilde sequences (`~~~`), presumably because `~~~` expands to a user signature in wikitext
- Limit the size to 255 bytes
- Capitalise the first letter
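To make the steps above a little more concrete, here is a rough Python sketch. It is not a port of `secureAndSplit`: the namespace table (`KNOWN_NAMESPACES`) and the forbidden-character set (`FORBIDDEN_CHARS`) are hypothetical, heavily reduced stand-ins, `html.unescape` is only an approximation of MediaWiki's character-reference decoding, and interwiki prefixes and per-wiki settings are ignored entirely.

```python
import html
import re

# Hypothetical, heavily reduced namespace table; a real wiki defines its own.
KNOWN_NAMESPACES = {"talk": "Talk", "user": "User", "help": "Help"}

# Assumed subset of the characters MediaWiki treats as illegal in titles.
FORBIDDEN_CHARS = set("<>[]|{}")


def normalize_title(text):
    """Return (namespace, dbkey, fragment), or None if the title is invalid.

    A simplified approximation of the steps listed above, not MediaWiki's code.
    """
    # 1. Decode character references such as &eacute; (html.unescape is a stand-in).
    title = html.unescape(text)

    # 2. Convert spaces to underscores, collapsing runs and trimming the ends.
    title = re.sub(r"[ _]+", "_", title).strip("_")

    # 3. Split off a known namespace prefix (interwiki prefixes are ignored here).
    namespace = ""
    prefix, sep, rest = title.partition(":")
    if sep and prefix.lower() in KNOWN_NAMESPACES:
        namespace = KNOWN_NAMESPACES[prefix.lower()]
        title = rest.lstrip("_")

    # 4. Remove the hash fragment, keeping it separately.
    title, _, fragment = title.partition("#")
    title = title.rstrip("_")

    # 5. Reject (rather than strip) forbidden and control characters.
    if any(c in FORBIDDEN_CHARS or ord(c) < 32 for c in title):
        return None

    # 6. Reject relative path segments such as ../directory/page.
    if (title in (".", "..")
            or title.startswith(("./", "../"))
            or "/./" in title or "/../" in title
            or title.endswith(("/.", "/.."))):
        return None

    # 7. Reject triple tildes (signature syntax in wikitext).
    if "~~~" in title:
        return None

    # 8. Reject empty titles and enforce the 255-byte limit (UTF-8 bytes, not characters).
    if not title or len(title.encode("utf-8")) > 255:
        return None

    # 9. Capitalise the first letter.
    title = title[0].upper() + title[1:]
    return namespace, title, fragment
```

For example, `normalize_title("talk:apple#Name")` returns `("Talk", "Apple", "Name")`, `normalize_title("caf&eacute; au lait")` returns `("", "Café_au_lait", "")`, and `normalize_title("../directory/page")` returns `None`. Returning `None` for invalid input mirrors the way `newFromText` gives you nothing back for a title it can't handle.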
Furthermore, I believe I'm right in saying that quotation marks don't need to be encoded by the user; browsers handle them transparently.
I hope that helps!