The goal is to read through HTML files and change every instance of MyWord to Myword, except that the word must NOT be changed when it appears inside or as part of a path, file name or script, such as:
href="..."
src="..."
url(...)
class="..."
id="..."
script inline or linked (file name) --> <script ...></script>
styles inline or linked (file name) --> <link ...> <style></style>
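To make the requirement concrete, here is a small illustration of the behaviour I am after. The sample markup is made up for this example (it is not taken from my real files), and the variable names are only placeholders:

```python
# Hypothetical sample input: 'MyWord' appears in visible text and in protected places.
sample = (
    '<a href="MyWord/index.html" class="MyWord">MyWord home</a>\n'
    '<img src="img/MyWord.png" alt="logo">\n'
    '<p>Welcome to MyWord.</p>\n'
)

# Desired output: only the visible text changes; href, class and src stay as they are.
expected = (
    '<a href="MyWord/index.html" class="MyWord">Myword home</a>\n'
    '<img src="img/MyWord.png" alt="logo">\n'
    '<p>Welcome to Myword.</p>\n'
)

# A plain str.replace rewrites every occurrence, including the protected ones,
# so its result differs from the desired output.
naive = sample.replace('MyWord', 'Myword')
```

Here `naive` no longer contains MyWord anywhere, so the link, the class and the image path are all broken, while `expected` still has the three protected occurrences.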
Now the question of all questions: how do you determine whether an instance of the word is in a position where it is OK to change it? (Or, conversely, how do you determine that the word is inside one of the locations listed above and should not be changed?)
My code is below. It can be changed to read line by line, etc., but I just cannot think of how to define and enforce a rule that matches the list above:
#!/usr/bin/python
import os
from stat import ST_ATIME, ST_MTIME

def fileExtension(s):
    # return the extension wrapped in '|' (e.g. '|html|'), or '' if there is none
    i = s.rfind('.')
    if i == -1:
        return ''
    tmp = '|' + s[i+1:] + '|'
    return tmp

def changeFiles():
    # get all files in current directory with desired extension
    # (names without an extension are skipped: extStr.find('') would match everything)
    files = [f for f in os.listdir('.')
             if fileExtension(f) and extStr.find(fileExtension(f)) != -1]
    for f in files:
        if os.path.isdir(f):
            continue
        st = os.stat(f)
        atime = st[ST_ATIME] # original access time
        mtime = st[ST_MTIME] # original modification time
        fw = open(f, 'r+')
        tmp = fw.read().replace(oldStr, newStr)
        fw.seek(0)
        fw.write(tmp)
        fw.truncate() # drop leftover bytes if the new text is shorter
        fw.close()
        # put file timestamp back to original timestamp
        os.utime(f, (atime, mtime))
    # if we want to check subdirectories
    if checkSubDirs:
        dirs = [d for d in os.listdir('.') if os.path.isdir(d)]
        for d in dirs:
            os.chdir(d)
            changeFiles()
            os.chdir('..')

# ==============================================================================
# ==================================== MAIN ====================================
oldStr = 'MyWord'
newStr = 'Myword'
extStr = '|html|htm|'
checkSubDirs = True
changeFiles()
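One possible direction, sketched here under several assumptions and not checked against real pages: let an HTML parser tell text apart from tags, and replace only in text that is not inside `<script>` or `<style>`. The sketch assumes Python 3 and its standard `html.parser` module; the names `TextOnlyReplacer` and `replace_in_text` are hypothetical. It leaves every attribute value alone (so href, src, class, id and inline style attributes are untouched, but so are alt and title), it normalises end tags to lowercase, and it ignores rarer constructs such as CDATA sections.

```python
from html.parser import HTMLParser


class TextOnlyReplacer(HTMLParser):
    """Re-emit the HTML, replacing old with new only in text nodes."""

    def __init__(self, old, new):
        # convert_charrefs=False keeps entities such as &amp; as written
        super().__init__(convert_charrefs=False)
        self.old, self.new = old, new
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ('script', 'style'):
            self.skip_depth += 1
        # raw source of the tag, so attribute values are never modified
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in ('script', 'style') and self.skip_depth:
            self.skip_depth -= 1
        self.out.append('</%s>' % tag)

    def handle_data(self, data):
        if self.skip_depth == 0:
            data = data.replace(self.old, self.new)
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append('&%s;' % name)

    def handle_charref(self, name):
        self.out.append('&#%s;' % name)

    def handle_comment(self, data):
        self.out.append('<!--%s-->' % data)

    def handle_decl(self, decl):
        self.out.append('<!%s>' % decl)

    def handle_pi(self, data):
        self.out.append('<?%s>' % data)


def replace_in_text(html, old, new):
    parser = TextOnlyReplacer(old, new)
    parser.feed(html)
    parser.close()
    return ''.join(parser.out)
```

In the script above, the call `fw.read().replace(oldStr, newStr)` would then become `replace_in_text(fw.read(), oldStr, newStr)`. Whether this round-trips messy real-world HTML closely enough is an open point.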
Does anybody know how to do this, or have any suggestions? ANY help is appreciated. I have been racking my brain for two days now and cannot come up with a reliable rule.