tags:

views:

247

answers:

4

I just inherited a 70k line PHP codebase that I now need to add enhancements onto. I've seen worse, at least this codebase uses an MVC architecture and is object oriented. However, there is no templating system and many classes are deprecated - only being called once. I think my method might be the following:

  1. Find all the files on the live server that have not been touched in 48 hours and make them candidates for deletion (luckily there is a live server).
  2. Implement a template system (Smarty) and try to find duplicate code in the templates.
  3. Alot of the methods have copied and pasted code ... I don't know how much I want to mess with it.

My questions are: Are there steps that I should take or you would take? What is your method for dealing with this? Are there tools to help find duplicate PHP code?

+3  A: 

I don't know of any specific tools, but I have worked on re-factoring some fairly large PHP projects.

I would recommend a templating system, either Smarty or a strict PHP system that is clearly explained to anybody working on the project.

Take discrete, manageable sections and re-factor on a regular basis (e.g., this week, I'm going to re-write this). Don't bite off more than you can chew and don't plan to do a full rewrite.

Also, I do regular code searches (I use Eclipse and search through the files in my project) on suspect functions and files. Some people are too scared to make big changes, but I would rather err on the bold side rather than accept messy and poorly organized code. Just be prepared to test, test, test!

jonstjohn
+5  A: 

Find all the files on the live server that have not been touched in 48 hours and make them candidates for deletion (luckily there is a live server)

By "touched" I'm assuming you'll stat the file to see if it's been accessed by any part of the system. I'd go a month and a half on this rather than 48 hours. In older PHP code bases you'll often find there's a bunch of code lying around that gets called via a local cron job once a week or once a month, or a third party is calling it remotely as a pseudo-service on a regular basis. By waiting 6 weeks be more likely to catch any and all files that are being called.

Implement a template system (Smarty) and try to find duplicate code in the templates.

Why? Serious question, is there a reason to implement a template system? (non-PHP savvy designers, developers who get you into trouble by including too much logic in the Views, or you're the one creating templates, and you know you work much faster in smarty than in PHP). If not, avoid it and just use PHP.

Also, how realistic is it to implement a pure smarty template system? I'd give favorable odds that old PHP systems like this are going to have a ton of "business logic" mixed in with their views that can't be implemented in pure smarty, and if you allowed mixed PHP/Smarty your developers will use PHP everytime.

Alot of the methods have copied and pasted code ... I don't know how much I want to mess with it.

I don't know of any code analysis tools that will do this out of the box, but it sould be possible to whip something up with the tokenizer functions.

What You Should Really Do

I don't want to dissuade you or demoralize you, but why do you want to cleanup this code? Right now it's doing what's is supposed to do. Stupidly, but it's doing it. Every re-factoring project is going to put current, undocumented, possibly business critical functionality at risk and at the end of that work you have an application that's doing the exact same thing. It's 70k lines of what sounds like shoddy code that only you care about fixing, no mater what other people are telling you their priorities are. If their priority was clean code, their code would already be clean. One person can't change a culture. Unless there's a straight forward business case for that code to be cleaned (open sourcing the project as a business strategy?), that legacy code isn't going anywhere.

Here's a different set of priorties to consider with legacy PHP applications

  1. Is there a singleton database object or pair of objects that allows developers to easily setup seperate connections for read (slave) and write (master). Lot of legacy PHP applications will instantiate multiple connections to the same database in a single page call, which is a performance nightmare.

  2. Is there a straight forward way for developers to avoid SQL injection? Give this to them for new code (parameterized SQL), and consider fixing legacy SQL to use this new method, but also consider security steps you can take on the network level.

  3. Get a test framework of some kind wrapped around all the legacy code and treat it as a black-box. Use those tests to create a centralized API developers can use in place of the myriad function calls and copy/paste code they've been using.

  4. Develop a centralized system for configuration values, most legacy PHP code is some awful combination of defines and class constants, which means any config changes mean a code push, which means potential DOOM.

  5. Develop a lint that's hooked into the source control system to enforce code sanity for all new code, not just for style, but to make sure that business logic stays out of the view, that the SQL is being contructed in a safe way, that those old copy/paste libraries aren't being used, etc.

  6. Develop a sane, trackable build and/or push system and stop people from hackin on code live in production

Alan Storm
I guess there is no 'best' answer (i.e. some magical program that profiles the app and allows me to work with it to actively modify the codebase intelligently).
Adam Nelson
+1  A: 

You need to identify a solid reason for refactoring. Removing duplicate code is not really a very good one; it needs to be coupled with a real desired improvement, such as reducing memory footprint (useful if the webservers are struggling).

Once you have that in mind, now you can start refactoring. And make sure you have a version-control repository, too. Just don't check in broken code.

Don't be too hasty about single-use classes. A lot of small PHP frameworks work like that. Often they could be abstracted better, though. Also, A lot of PHP code also doesn't understand data layer abstraction with the result that there is SQL code littered through the business logic or even the display code. This problem is often coupled with no custom database handler, which is a problem if you suddenly have to teach it about replication, or caching. This is the same abstraction problem from the other direction.

One very practical step: once you start abstracting repeated code away, you'll find reasons to have multiple files open. If you're using a shell and a Unix editor, then screen will help you immensely.

staticsan
I agree with your comments. What I kind of didn't put in the question was that I'm also laying down the groundwork for more development and I want to have the codebase as lean as possible before new people read it - so they're not burdened digesting garbage code.
Adam Nelson
That's a noble goal, but will probably take you a long, long time. I would start with the code where future development is likely to happen. And you may find assistance with the refactoring in your developers, too. :-)
staticsan
A: 

I am looking for the same....a toolset to clean up old php code with features such as:

  • filter all sql queries and paste them into a new file as functions
  • find unused methods and files on a live system
  • find unused tables and columns on the database

  • general tools to "mechanically" deal with old code, (e.g. externalize HTML and MySQL)

toby