Hi, I have some XML configuration files that we create in a Windows environment but that are deployed on Linux. These configuration files reference each other with file paths. We've had problems with case sensitivity and trailing spaces before, and I'd like to write a script that checks for these problems. We have Cygwin if that helps.

Example:

Let's say I have a reference to the file foo/bar/baz.xml, I'd do this

<someTag fileref="foo/bar/baz.xml" />

Now if we by mistake do this:

<someTag fileref="fOo/baR/baz.Xml  " />

It will still work on Windows, but it will fail on Linux.

What I want to do is detect the cases where a file reference in these files doesn't match the real file with respect to case sensitivity.

A: 

It's hard to judge what exactly your problem is, but if you apply os.path.normcase along with str.strip before saving your file name, it should solve all your problems.

As I said in a comment, it's not clear how you end up with such a mistake. However, it would be trivial to check for an existing file, as long as you have some sensible convention (all file names are lower case, for example):

try:
    open(fname)
except IOError:
    open(fname.lower())
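
Since the goal here is to validate references rather than repair them, the stripping mentioned above can also be turned into a check. A minimal sketch; `has_stray_whitespace` is a hypothetical helper name, not part of any library:

```python
def has_stray_whitespace(ref):
    """Return True if ref has leading or trailing whitespace."""
    return ref != ref.strip()


if __name__ == "__main__":
    # The two references from the question's example.
    for ref in ["foo/bar/baz.xml", "fOo/baR/baz.Xml  "]:
        print(repr(ref), has_stray_whitespace(ref))
```

This only covers the whitespace half of the problem; the case mismatch still needs a comparison against what is actually on disk.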
SilentGhost
I added an example to clarify.
Pär Bohrarper
The thing that in the end opens the files is closed source, and is not in my control. That's why I need to validate the files. And yes, the file refs are added by hand in a tool that I don't have control over.
Pär Bohrarper
The question is whether you have a convention for your file names. That is, how do you know that the correct name of the file is `*.xml` and not `*.Xml`? You realise that you cannot just go checking each possible combination of cases to find out which one exists?
SilentGhost
We have conventions, but some referenced files are delivered by our customer. I suppose we could make them all lower case or something, but I'd rather not.
Pär Bohrarper
+3  A: 

os.listdir on a directory, in all case-preserving filesystems (including those on Windows), returns the actual case for the filenames in the directory you're listing.

So you need to do this check at each level of the path:

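import os

def onelevelok(parent, thislevel):
  # True if thislevel exists in parent with exactly this case,
  # False if it exists only with a different case.
  for fn in os.listdir(parent):
    if fn.lower() == thislevel.lower():
      return fn == thislevel
  raise ValueError('No %r in dir %r!' % (
      thislevel, parent))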

where I'm assuming that the complete absence of any case variation of a name is a different kind of error, and using an exception for that; and, for the whole path (assuming no drive letters or UNC, which wouldn't translate to Linux anyway):

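def allpathok(path):
  # Split into every component; os.path.split would only
  # return a (head, tail) pair.
  levels = [p for p in path.split('/') if p]
  top = '/' if os.path.isabs(path) else '.'
  # parents[i] is the directory that should contain levels[i].
  parents = [top]
  for level in levels:
    parents.append(os.path.join(parents[-1], level))
  return all(onelevelok(p, t)
             for p, t in zip(parents, levels))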

You may need to adapt this if, e.g., foo/bar is not to be taken to mean that foo is in the current directory, but somewhere else; or, of course, if UNC or drive letters are in fact needed (but as I mentioned, translating them to Linux is not trivial anyway ;-).

Implementation notes: I'm taking advantage of the fact that zip just drops "extra entries" beyond the length of the shortest of the sequences it's zipping, so I don't need to explicitly slice off the last entry from the first argument; zip does it for me. all will short-circuit where it can, returning False as soon as it detects a false value, so it's just as good as an explicit loop but more concise.
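
For a validation report it can help to show how the file is really spelled on disk, not just that the reference is wrong. The following is a sketch of the same level-by-level os.listdir walk, under the assumption that references use '/' as separator and are relative to a known root; `actual_spelling` is a hypothetical name introduced here:

```python
import os


def actual_spelling(root, ref):
    """Return ref as spelled on disk under root, matching case-insensitively.

    Returns None if some component has no match in any case.
    """
    current = root
    found = []
    for part in ref.split("/"):
        if not part:
            continue  # ignore empty components such as a doubled slash
        try:
            names = os.listdir(current)
        except OSError:
            return None
        matches = [n for n in names if n.lower() == part.lower()]
        if not matches:
            return None
        # Prefer an exact match if several names differ only by case.
        name = part if part in matches else matches[0]
        found.append(name)
        current = os.path.join(current, name)
    return "/".join(found)
```

A reference is then fine when `actual_spelling(root, ref) == ref`. A result of None means nothing matches even when case is ignored, which is what happens for a reference with trailing spaces.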

Alex Martelli
I suspected that I had to do something like this. I'll have to adapt it a bit since foo/bar/baz.xml is not relative to the current directory. It's relative to a (small) number of possible top level paths.
Pär Bohrarper
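
One way the adaptation described in the last comment might look, as a rough sketch: collect every fileref attribute from a configuration file and accept a reference if it matches exactly under at least one of the candidate top-level directories. The attribute name comes from the example in the question; `exists_exact`, `bad_refs` and the list of roots are hypothetical names for illustration, and the standard-library xml.etree.ElementTree parser is assumed to be able to read the files.

```python
import os
import xml.etree.ElementTree as ET


def exists_exact(root, ref):
    """True if every component of ref exists under root with exactly this spelling."""
    current = root
    for part in ref.split("/"):
        try:
            # os.listdir returns names in their on-disk case, so this
            # comparison is case-sensitive even on Windows.
            if part not in os.listdir(current):
                return False
        except OSError:
            return False
        current = os.path.join(current, part)
    return True


def bad_refs(xml_path, roots, attr="fileref"):
    """Return the attr values in xml_path that match no root exactly."""
    bad = []
    for elem in ET.parse(xml_path).iter():
        ref = elem.get(attr)
        if ref is not None and not any(exists_exact(r, ref) for r in roots):
            bad.append(ref)
    return bad
```

Because the comparison is an exact string match against the directory listing, a reference with trailing spaces or the wrong case is reported in the same pass.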