views:

135

answers:

5

What is the best way to programmatically convert large batches of very similar web pages to a newer CSS-based layout?

I am changing all the contents of an old website into a new css-based layout. Many of the pages are very similar, and I want to be able to automate the process.

What I am currently thinking of doing is to read the pages in using HtmlAgilityPack, and make a method for each group of similar pages that will create the output text.

What do you think is the best way to do this? The pages mostly differ in things like which .jpg file is used for the image, or how many heading-image-text groups there are on a particular page.

EDIT: I cannot use any other file type than .html, as that is all I am authorized to do. Any suggestions?

EDIT2: Ideally, I would also be able to make this be generic enough that I could use it for many different groups of html files by just switching around a few moving parts.

SAMPLE OF TYPICAL PAGE

The above link is a sample of what I am dealing with. The parts that would differ between pages would be:

  • the meta description tag
  • various headers, especially the main header
  • almost every image on the page will be new
  • the text for each video will be unique, but they will be grouped together in similar chunks
  • the video files and video sizes will be unique

Everything else is the same, and the format of the pages is also the same.

EDIT3: When in doubt another thing that might be helpful is to write some code that will write the pages for me. I just need to cut out the parts of the originals that are variable, and put them into a data file that gets read and used to write the new versions.
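A minimal sketch of the data-file idea from EDIT3, assuming Python and only the standard library rather than HtmlAgilityPack. The page shell, the `site.css` filename, the `clip` class, and the record fields (`title`, `description`, `sections`) are all made up for illustration; the real layout and field names would come from the new design.

```python
import json
from html import escape
from string import Template

# Hypothetical page shell; the real one would be the new CSS-based layout.
PAGE = Template("""<!DOCTYPE html>
<html>
<head>
<meta name="description" content="$description">
<title>$title</title>
<link rel="stylesheet" href="site.css">
</head>
<body>
<h1>$title</h1>
$sections
</body>
</html>
""")

# One heading-image-text group; a page may contain any number of these.
SECTION = Template("""<div class="clip">
<h2>$heading</h2>
<img src="$image" alt="$heading">
<p>$text</p>
</div>""")


def render_page(record):
    """Build one output page from a dict holding the variable parts."""
    sections = "\n".join(
        SECTION.substitute(
            heading=escape(s["heading"]),
            image=escape(s["image"]),
            text=escape(s["text"]),
        )
        for s in record["sections"]
    )
    return PAGE.substitute(
        description=escape(record["description"]),
        title=escape(record["title"]),
        sections=sections,
    )


# In practice this would be read from a data file with json.load().
sample = json.loads("""{
  "title": "Sample page",
  "description": "A sample description",
  "sections": [
    {"heading": "First clip", "image": "clip01.jpg", "text": "Some text."},
    {"heading": "Second clip", "image": "clip02.jpg", "text": "More text."}
  ]
}""")

html_out = render_page(sample)
```

Because the output is still plain .html, this stays within the constraint from the first edit. Switching to a different group of pages would mean swapping the two templates and the data file, which is roughly the "moving parts" idea from EDIT2.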

A: 

While this might sound a bit glib, the best real option I could offer would be Rent-A-Coder

Adam Robinson
Good idea. Assuming the information isn't proprietary, it may be quicker to pay someone $50 than to spend time doing it yourself.
Jared
Sure, if you'd like to end up with a large group of similar crappy _css_ pages.
Ben S
Yes, clearly offering up the possibility of hiring someone to do work that's generally better suited to human- versus machine-processing is worthy of disdain :roll:
Adam Robinson
-1. I don't see how this comment is at all helpful.
Banang
It's "helpful" because it offers a legitimate option. People quite often spend more time trying to come up with a generalized, comprehensive solution for performing a one-time translation of data from format A to format B when it would take less time (and, consequently, less money in most cases) simply to perform the conversion by hand either yourself or by hiring someone to do it. This seems like a simple job, something well-suited to a service like RAC.
Adam Robinson
A: 

It depends on the pages. You could write scripts in Perl, or any other scripting language you're comfortable with, that do as much as possible and note anything they couldn't fix or didn't understand.
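A rough sketch of that "convert what you can, note the rest" approach, written in Python instead of Perl. The `convert` rule here (find the first `<h1>` and re-emit it with a `page-title` class) is only a stand-in; both the rule and the class name are assumptions for illustration.

```python
import re

HEADING = re.compile(r"<h1\b[^>]*>(.*?)</h1>", re.IGNORECASE | re.DOTALL)


def convert(html_text):
    """Stand-in conversion: pull the main heading out and re-emit it cleanly."""
    match = HEADING.search(html_text)
    if match is None:
        # Signal that this page does not fit the expected pattern.
        raise ValueError("no <h1> heading found")
    title = re.sub(r"<[^>]+>", "", match.group(1)).strip()
    return f'<h1 class="page-title">{title}</h1>'


def convert_all(pages):
    """Convert every page that fits; collect the rest for manual review."""
    converted, needs_review = {}, []
    for name, html_text in pages.items():
        try:
            converted[name] = convert(html_text)
        except ValueError as exc:
            needs_review.append((name, str(exc)))
    return converted, needs_review


pages = {
    "a.html": "<H1><font color=red>Hello</font></H1>",
    "b.html": "<p>no heading</p>",
}
converted, needs_review = convert_all(pages)
```

The `needs_review` list is the part that matters: it tells you which pages still need a human to look at them.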

Jared
+1  A: 

It depends on how similar "very similar" actually is. If you mean that they effectively use a number of templates, then I would probably build new templates for the new design using Template-Toolkit and suck out the data using Template::Extract. You could also store the data in a local database to make it easier to rebuild the pages in future.
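The extraction half of this can be sketched as follows. This is not Template::Extract; it is a Python standard-library stand-in that collects the parts the question lists as variable. It assumes headings are marked up with real `h1`-`h3` tags, which may not hold for old generated pages.

```python
from html.parser import HTMLParser


class PageDataExtractor(HTMLParser):
    """Collect the parts that vary between pages: description, headings, images."""

    HEADING_TAGS = {"h1", "h2", "h3"}

    def __init__(self):
        super().__init__()
        self.description = None
        self.headings = []
        self.images = []
        self._in_heading = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content")
        elif tag == "img" and attrs.get("src"):
            self.images.append(attrs["src"])
        elif tag in self.HEADING_TAGS:
            self._in_heading = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in self.HEADING_TAGS and self._in_heading:
            # Join the text pieces and collapse runs of whitespace.
            self.headings.append(" ".join("".join(self._buffer).split()))
            self._in_heading = False

    def handle_data(self, data):
        if self._in_heading:
            self._buffer.append(data)


def extract(html_text):
    """Return the variable parts of one old page as a plain dict."""
    parser = PageDataExtractor()
    parser.feed(html_text)
    parser.close()
    return {
        "description": parser.description,
        "headings": parser.headings,
        "images": parser.images,
    }
```

The returned dict could be written to a local database or a JSON file, so the pages can be rebuilt later without parsing the old HTML again.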

David Dorward
http://www.shaolin.org/video-clips-3/sabah2007/sabah01.html is an example. Do you think your ideas would work for this?
Alex Baranosky
+1  A: 

I think it depends on how many pages there are. If there are not too many, you could create a template and use a WYSIWYG editor to copy and paste the content.

However, if you need to do it programmatically, I would suggest parsing the HTML to extract the content, or cleaning it up. If you have access to it, you can use Expression Web, which I used for a similar task: you clean the HTML so that only the header tags, paragraphs, etc. are left, and then apply CSS to format it in the design you wish.
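For the programmatic route, the "leave only the header tags, paragraphs, etc." cleanup might look roughly like this. It is a standard-library Python sketch, not what Expression Web does internally, and the whitelist of tags and attributes is an assumption you would adjust to the real pages.

```python
from html import escape
from html.parser import HTMLParser


class TagWhitelistCleaner(HTMLParser):
    """Drop every tag and attribute that is not explicitly whitelisted."""

    KEEP = {"h1", "h2", "h3", "p", "img", "a"}          # assumed whitelist
    KEEP_ATTRS = {"img": {"src", "alt"}, "a": {"href"}}
    VOID = {"img"}                                       # tags with no end tag
    SKIP_CONTENT = {"script", "style"}                   # drop their text too

    def __init__(self):
        super().__init__()
        self.out = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_CONTENT:
            self._skip += 1
            return
        if tag not in self.KEEP:
            return
        allowed = self.KEEP_ATTRS.get(tag, set())
        parts = [tag]
        for name, value in attrs:
            if name in allowed:
                safe = escape(value or "", quote=True)
                parts.append(f'{name}="{safe}"')
        self.out.append("<" + " ".join(parts) + ">")

    def handle_endtag(self, tag):
        if tag in self.SKIP_CONTENT:
            self._skip = max(0, self._skip - 1)
        elif tag in self.KEEP and tag not in self.VOID:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self._skip:
            self.out.append(escape(data, quote=False))


def clean(html_text):
    """Return html_text reduced to whitelisted tags, ready for CSS styling."""
    cleaner = TagWhitelistCleaner()
    cleaner.feed(html_text)
    cleaner.close()
    return "".join(cleaner.out)
```

Text inside dropped tags such as `<font>` or `<b>` is kept; only the tags themselves are removed, so the content survives and the presentation moves into the stylesheet.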

However, it might take longer to write code to do it than to do it manually. Sometimes nothing is faster than doing it by hand.

Good luck

Sebastian Bender
There are a LOT of pages. Just ONE example I am doing has 38 pages. But that's only one. There are probably 20-30 of those.
Alex Baranosky
A: 

When faced with old, often generated code like this, I tend to lean towards the search and replace in my text editor.

Sounds awful, doesn't it?

Seriously though, if you get a powerful editor that supports searching multiple files and/or regular expressions, that can remove the bulk of the nasty code. It's not a perfect science to say the least, and some manual manipulation may be necessary to get it into a "useful" form, but it takes away the bulk of the cleanup work.
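The same multi-file search and replace can be scripted if your editor can't do it. A rough Python sketch follows; the three rules (strip `<font>` and `<center>` tags, drop `bgcolor`/`align`/`valign` attributes) are example guesses at what the "nasty code" looks like, and reading the files as UTF-8 is also an assumption.

```python
import re
from pathlib import Path

# (pattern, replacement) pairs applied in order. These are examples only.
RULES = [
    (re.compile(r"</?font\b[^>]*>", re.IGNORECASE), ""),
    (re.compile(r"</?center\b[^>]*>", re.IGNORECASE), ""),
    (
        re.compile(
            r"""\s+(?:bgcolor|align|valign)\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)""",
            re.IGNORECASE,
        ),
        "",
    ),
]


def apply_rules(text):
    """Run every search-and-replace rule over one page."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text


def convert_tree(src_dir, dst_dir):
    """Apply the rules to every .html file under src_dir, writing to dst_dir."""
    src, dst = Path(src_dir), Path(dst_dir)
    for path in src.rglob("*.html"):
        target = dst / path.relative_to(src)
        target.parent.mkdir(parents=True, exist_ok=True)
        # Assumes UTF-8; old pages may need a different encoding.
        original = path.read_text(encoding="utf-8", errors="replace")
        target.write_text(apply_rules(original), encoding="utf-8")
```

Writing to a separate output directory keeps the originals intact, which helps when a rule turns out to be too greedy. The attribute rule is crude and would also match the same pattern in body text, so spot-check the results.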

Jacob Hume