tags:

views:

116

answers:

3

A bit of background: When I save a web page from e.g. IE8 as "webpage, complete", the images and such that the page contains are placed in a subfolder with the postfix "_files". This convention allows Windows to synchronize the .htm file and the accompanying folder.

Now, in order to keep the synchronization intact, when I rename the HTML file from my Python script I want the "_files" folder to be renamed also. Is there an easy way to do this, or will I need to
- rename the .htm file
- rename the _files folder
- parse the .htm file and replace all references to the old _files folder name with the new name?

A: 

If you rename the folder, I'm not sure how you can get around parsing the .htm file and replacing instances of _files with the new suffix. Perhaps you can use a folder alias (shortcut?) but then that's not a very clean solution.

Alex Reynolds
+1  A: 

There is just one easy way: Have IE save the file again under the new name. But if you want to do it later, you must parse the HTML. In this case, BeautifulSoup is your friend.

Aaron Digulla
A: 

you can use simple string replace on your HTML file without parsing it, it can of course be troublesome if the text being replaced is mentioned in the HTML itself..

os.rename("test.html", "test2.html")
os.rename("test_files", "test2_files")

with open("test2.html", "r") as f:
     s = f.read().replace("test_files", "test2_files")

with open("test2.html", "w") as f:
     f.write(s)
Ivan