Find all the files on the live server that have not been touched in 48 hours and make them candidates for deletion (luckily there is a live server)
By "touched" I'm assuming you'll stat the file to see if it's been accessed by any part of the system. I'd go a month and a half on this rather than 48 hours. In older PHP code bases you'll often find there's a bunch of code lying around that gets called via a local cron job once a week or once a month, or a third party is calling it remotely as a pseudo-service on a regular basis. By waiting 6 weeks be more likely to catch any and all files that are being called.
Implement a template system (Smarty) and try to find duplicate code in the templates.
Why? Serious question, is there a reason to implement a template system? (non-PHP savvy designers, developers who get you into trouble by including too much logic in the Views, or you're the one creating templates, and you know you work much faster in smarty than in PHP). If not, avoid it and just use PHP.
Also, how realistic is it to implement a pure smarty template system? I'd give favorable odds that old PHP systems like this are going to have a ton of "business logic" mixed in with their views that can't be implemented in pure smarty, and if you allowed mixed PHP/Smarty your developers will use PHP everytime.
Alot of the methods have copied and pasted code ... I don't know how much I want to mess with it.
I don't know of any code analysis tools that will do this out of the box, but it sould be possible to whip something up with the tokenizer functions.
What You Should Really Do
I don't want to dissuade you or demoralize you, but why do you want to cleanup this code? Right now it's doing what's is supposed to do. Stupidly, but it's doing it. Every re-factoring project is going to put current, undocumented, possibly business critical functionality at risk and at the end of that work you have an application that's doing the exact same thing. It's 70k lines of what sounds like shoddy code that only you care about fixing, no mater what other people are telling you their priorities are. If their priority was clean code, their code would already be clean. One person can't change a culture. Unless there's a straight forward business case for that code to be cleaned (open sourcing the project as a business strategy?), that legacy code isn't going anywhere.
Here's a different set of priorties to consider with legacy PHP applications
Is there a singleton database object or pair of objects that allows developers to easily setup seperate connections for read (slave) and write (master). Lot of legacy PHP applications will instantiate multiple connections to the same database in a single page call, which is a performance nightmare.
Is there a straight forward way for developers to avoid SQL injection? Give this to them for new code (parameterized SQL), and consider fixing legacy SQL to use this new method, but also consider security steps you can take on the network level.
Get a test framework of some kind wrapped around all the legacy code and treat it as a black-box. Use those tests to create a centralized API developers can use in place of the myriad function calls and copy/paste code they've been using.
Develop a centralized system for configuration values, most legacy PHP code is some awful combination of defines and class constants, which means any config changes mean a code push, which means potential DOOM.
Develop a lint that's hooked into the source control system to enforce code sanity for all new code, not just for style, but to make sure that business logic stays out of the view, that the SQL is being contructed in a safe way, that those old copy/paste libraries aren't being used, etc.
Develop a sane, trackable build and/or push system and stop people from hackin on code live in production