views:

780

answers:

7

I'm writing grepath utility that finds executables in %PATH% that match a pattern. I need to define whether given filename in the path is executable (emphasis is on command line scripts).

Based on "Tell if a file is executable" I've got:

import os
from pywintypes import error
from win32api   import FindExecutable, GetLongPathName

def is_executable_win(path):
    try:
        _, executable = FindExecutable(path)
        ext = lambda p: os.path.splitext(p)[1].lower()
        if (ext(path) == ext(executable) # reject *.cmd~, *.bat~ cases
            and samefile(GetLongPathName(executable), path)):
            return True
        # path is a document with assoc. check whether it has extension
        # from %PATHEXT% 
        pathexts = os.environ.get('PATHEXT', '').split(os.pathsep)
        return any(ext(path) == e.lower() for e in pathexts)
    except error:
        return None # not an exe or a document with assoc.

Where samefile is:

try: samefile = os.path.samefile
except AttributeError:    
    def samefile(path1, path2):
        rp = lambda p: os.path.realpath(os.path.normcase(p))
        return rp(path1) == rp(path2)

How is_executable_win could be improved in the given context? What functions from Win32 API could help?

P.S.

  • time performance doesn't matter
  • subst drives and UNC, unicode paths are not under consideration
  • C++ answer is OK if it uses functions available on Windows XP

Examples

  • notepad.exe is executable (as a rule)
  • which.py is executable if it is associated with some executable (e.g., python.exe) and .PY is in %PATHEXT% i.e., 'C:\> which' could start:

    some\path\python.exe another\path\in\PATH\which.py
    
  • somefile.doc most probably is not executable (when it is associated with Word for example)

  • another_file.txt is not executable (as a rule)
  • ack.pl is executable if it is associated with some executable (most probably perl.exe) and .PL is in %PATHEXT% (i.e. I can run ack without specifing extension if it is in the path)

What is "executable" in this question

def is_executable_win_destructive(path):
    #NOTE: it assumes `path` <-> `barename` for the sake of example
    barename = os.path.splitext(os.path.basename(path))[0]
    p = Popen(barename, stdout=PIPE, stderr=PIPE, shell=True)
    stdout, stderr = p.communicate()
    return p.poll() != 1 or stdout != '' or stderr != error_message(barename)

Where error_message() depends on language. English version is:

def error_message(barename):
    return "'%(barename)s' is not recognized as an internal" \
           " or external\r\ncommand, operable program or batch file.\r\n" \
           %  dict(barename=barename)

If is_executable_win_destructive() returns when it defines whether the path points to an executable for the purpose of this question.

Example:

>>> path = r"c:\docs\somefile.doc"
>>> barename = "somefile"

After that it executes %COMSPEC% (cmd.exe by default):

c:\cwd> cmd.exe /c somefile

If output looks like this:

'somefile' is not recognized as an internal or external
command, operable program or batch file.

Then the path is not an executable else it is (lets assume there is one-to-one correspondence between path and barename for the sake of example).

Another example:

>>> path = r'c:\bin\grepath.py'
>>> barename = 'grepath'

If .PY in %PATHEXT% and c:\bin is in %PATH% then:

c:\docs> grepath
Usage:
  grepath.py [options] PATTERN
  grepath.py [options] -e PATTERN

grepath.py: error: incorrect number of arguments

The above output is not equal to error_message(barename) therefore 'c:\bin\grepath.py' is an "executable".

So the question is how to find out whether the path will produce the error without actually running it? What Win32 API function and what conditions used to trigger the 'is not recognized as an internal..' error?

+2  A: 

A windows PE always starts with the characters "MZ". This includes however also any kind of DLLs which are not necessarily executables.
To check for this however you'll have to open the file and read the header so that's probably not what you're looking for.

shoosh
Technically a DOS .exe file starts with MZ. A PE file contains a dos executable (usually this prog prints "This program requires Windows.") followed by the PE header which start with PE.
jmucchiello
I've added examples of what I consider an executable. I'm *not* interested only in `exe`-files.
J.F. Sebastian
+3  A: 

shoosh beat me to it :)

If I remember correctly, you should try to read the first 2 characters in the file. If you get back "MZ", you have an exe.


hnd = open(file,"rb")
if hnd.read(2) == "MZ":
  print "exe"
Geo
I've added examples of what I consider an executable. I'm *not* interested only in `exe`-files.
J.F. Sebastian
A: 

Parse the PE format.

http://code.google.com/p/pefile/

This is probably the best solution you will get other than using python to actually try to run the program.

Edit: I see you also want files that have associations. This will require mucking in the registry which I don't have the information for.

Edit2: I also see that you differentiate between .doc and .py. This is a rather arbitrary differentiation which must be specified with manual rules, because to windows, they are both file extensions that a program reads.

Unknown
Consider: what is the difference between .txt and .bat? Why is there %PATHEXT% environment variable?
J.F. Sebastian
A: 

> Parse the PE format.

No, totally useless. Just use buit-in Win32 apis
(1 api call)

Excellent. And what is the call?
J.F. Sebastian
A: 

Your question can't be answered. Windows can't tell the difference between a file which is associated with a scripting language vs. some other arbitrary program. As Windows is concerned, a .PY file is simply a document which is opened by python.exe.

Jesse Weigert
Read 'What is "executable"' section of my question.
J.F. Sebastian
+2  A: 

I think, that this should be sufficient:

  1. check file extension in PATHEXT - whether file is directly executable
  2. using cmd.exe command "assoc .ext" you can see whether file is associated with some executable (some executable will be launched when you launch this file). You can parse capture output of assoc without arguments and collect all extensions that are associated and check tested file extension.
  3. other file extensions will trigger error "command is not recognized ..." therefore you can assume that such files are NOT executable.

I don't really understand how you can tell the difference between somefile.py and somefile.txt because association can be really the same. You can configure system to run .txt files the same way as .py files.

Jiri
Or you can go through the registry to find all filetypes that have an association. But the point is the same, anything with a association can be started through the shell directly. So it's either pretty much all files, or you make a list by hand.
THC4k
J.F. Sebastian
Point 3. from your answer is implemented in the question, see `is_executable_win_destructive()`)
J.F. Sebastian
Jiri
@Jiri: Point 2. might return non-executable in the sense of point 3. e.g., I've seen how one of WinAPI functions (don't remember which one) returns `a.bat~` as executable (point 2), but it is not executable in the sense (pp. 1 or 3). Obviously sets 1 and 3 are not the same that returns us to square one: how to implement p 3. in non-desctructive terms (to simplify a bit).
J.F. Sebastian
+1  A: 

Here's the grepath.py that I've linked in my question:

#!/usr/bin/env python
"""Find executables in %PATH% that match PATTERN.

"""
#XXX: remove --use-pathext option

import fnmatch, itertools, os, re, sys, warnings
from optparse import OptionParser
from stat import S_IMODE, S_ISREG, ST_MODE
from subprocess import PIPE, Popen


def warn_import(*args):
    """pass '-Wd' option to python interpreter to see these warnings."""
    warnings.warn("%r" % (args,), ImportWarning, stacklevel=2)


class samefile_win:
    """
http://timgolden.me.uk/python/win32_how_do_i/see_if_two_files_are_the_same_file.html
"""
    @staticmethod
    def get_read_handle (filename):
        return win32file.CreateFile (
            filename,
            win32file.GENERIC_READ,
            win32file.FILE_SHARE_READ,
            None,
            win32file.OPEN_EXISTING,
            0,
            None
            )

    @staticmethod
    def get_unique_id (hFile):
        (attributes,
         created_at, accessed_at, written_at,
         volume,
         file_hi, file_lo,
         n_links,
         index_hi, index_lo
         ) = win32file.GetFileInformationByHandle (hFile)
        return volume, index_hi, index_lo

    @staticmethod
    def samefile_win(filename1, filename2):
        """Whether filename1 and filename2 represent the same file.

It works for subst, ntfs hardlinks, junction points.
It works unreliably for network drives.

Based on GetFileInformationByHandle() Win32 API call.
http://timgolden.me.uk/python/win32_how_do_i/see_if_two_files_are_the_same_file.html
"""
        if samefile_generic(filename1, filename2): return True
        try:
            hFile1 = samefile_win.get_read_handle (filename1)
            hFile2 = samefile_win.get_read_handle (filename2)
            are_equal = (samefile_win.get_unique_id (hFile1)
                         == samefile_win.get_unique_id (hFile2))
            hFile2.Close ()
            hFile1.Close ()
            return are_equal
        except win32file.error:
            return None


def canonical_path(path):
    """NOTE: it might return wrong path for paths with symbolic links."""
    return os.path.realpath(os.path.normcase(path))


def samefile_generic(path1, path2):
    return canonical_path(path1) == canonical_path(path2)


class is_executable_destructive:
    @staticmethod
    def error_message(barename):
        r"""
"'%(barename)s' is not recognized as an internal or external\r\n
command, operable program or batch file.\r\n"

in Russian:
"""
        return '"%(barename)s" \xad\xa5 \xef\xa2\xab\xef\xa5\xe2\xe1\xef \xa2\xad\xe3\xe2\xe0\xa5\xad\xad\xa5\xa9 \xa8\xab\xa8 \xa2\xad\xa5\xe8\xad\xa5\xa9\r\n\xaa\xae\xac\xa0\xad\xa4\xae\xa9, \xa8\xe1\xaf\xae\xab\xad\xef\xa5\xac\xae\xa9 \xaf\xe0\xae\xa3\xe0\xa0\xac\xac\xae\xa9 \xa8\xab\xa8 \xaf\xa0\xaa\xa5\xe2\xad\xeb\xac \xe4\xa0\xa9\xab\xae\xac.\r\n' % dict(barename=barename)

    @staticmethod
    def is_executable_win_destructive(path):
        # assume path <-> barename that is false in general
        barename = os.path.splitext(os.path.basename(path))[0]
        p = Popen(barename, stdout=PIPE, stderr=PIPE, shell=True)
        stdout, stderr = p.communicate()
        return p.poll() != 1 or stdout != '' or stderr != error_message(barename)


def is_executable_win(path):
    """Based on:
http://timgolden.me.uk/python/win32_how_do_i/tell-if-a-file-is-executable.html

Known bugs: treat some "*~" files as executable, e.g. some "*.bat~" files
"""
    try:
        _, executable = FindExecutable(path)
        return bool(samefile(GetLongPathName(executable), path))
    except error:
        return None # not an exe or a document with assoc.


def is_executable_posix(path):
    """Whether the file is executable.

Based on which.py from stdlib
"""
    #XXX it ignores effective uid, guid?
    try: st = os.stat(path)
    except os.error:
        return None

    isregfile = S_ISREG(st[ST_MODE])
    isexemode = (S_IMODE(st[ST_MODE]) & 0111)
    return bool(isregfile and isexemode)

try:
    #XXX replace with ctypes?
    from win32api import FindExecutable, GetLongPathName, error
    is_executable = is_executable_win
except ImportError, e:
    warn_import("is_executable: fall back on posix variant", e)
    is_executable = is_executable_posix

try: samefile = os.path.samefile
except AttributeError, e:
    warn_import("samefile: fallback to samefile_win", e)
    try:
        import win32file
        samefile = samefile_win.samefile_win
    except ImportError, e:
        warn_import("samefile: fallback to generic", e)
        samefile = samefile_generic

def main():
    parser = OptionParser(usage="""
%prog [options] PATTERN
%prog [options] -e PATTERN""", description=__doc__)
    opt = parser.add_option
    opt("-e", "--regex", metavar="PATTERN",
        help="use PATTERN as a regular expression")
    opt("--ignore-case", action="store_true", default=True,
        help="""[default] ignore case when --regex is present; for \
non-regex PATTERN both FILENAME and PATTERN are first \
case-normalized if the operating system requires it otherwise \
unchanged.""")
    opt("--no-ignore-case", dest="ignore_case", action="store_false")
    opt("--use-pathext", action="store_true", default=True,
        help="[default] whether to use %PATHEXT% environment variable")
    opt("--no-use-pathext", dest="use_pathext", action="store_false")
    opt("--show-non-executable", action="store_true", default=False,
        help="show non executable files")

    (options, args) = parser.parse_args()

    if len(args) != 1 and not options.regex:
       parser.error("incorrect number of arguments")
    if not options.regex:
       pattern = args[0]
    del args

    if options.regex:
       filepred = re.compile(options.regex, options.ignore_case and re.I).search
    else:
       fnmatch_ = fnmatch.fnmatch if options.ignore_case else fnmatch.fnmatchcase
       for file_pattern_symbol in "*?":
           if file_pattern_symbol in pattern:
               break
       else: # match in any place if no explicit file pattern symbols supplied
           pattern = "*" + pattern + "*"
       filepred = lambda fn: fnmatch_(fn, pattern)

    if not options.regex and options.ignore_case:
       filter_files = lambda files: fnmatch.filter(files, pattern)
    else:
       filter_files = lambda files: itertools.ifilter(filepred, files)

    if options.use_pathext:
       pathexts = frozenset(map(str.upper,
            os.environ.get('PATHEXT', '').split(os.pathsep)))

    seen = set()
    for dirpath in os.environ.get('PATH', '').split(os.pathsep):
        if os.path.isdir(dirpath): # assume no expansion needed
           # visit "each" directory only once
           # it is unaware of subst drives, junction points, symlinks, etc
           rp = canonical_path(dirpath)
           if rp in seen: continue
           seen.add(rp); del rp

           for filename in filter_files(os.listdir(dirpath)):
               path = os.path.join(dirpath, filename)
               isexe = is_executable(path)

               if isexe == False and is_executable == is_executable_win:
                  # path is a document with associated program
                  # check whether it is a script (.pl, .rb, .py, etc)
                  if not isexe and options.use_pathext:
                     ext = os.path.splitext(path)[1]
                     isexe = ext.upper() in pathexts

               if isexe:
                  print path
               elif options.show_non_executable:
                  print "non-executable:", path


if __name__=="__main__":
   main()
J.F. Sebastian