ansaurus

Question

Best practice for a recursive console tool in Python

Answer 1

+2 A:

In my experience, the best starting point is to build a tool that follows basic Unix principles -- namely, to read from standard input and write to standard output. This allows people to use your tool in a flexible way:

flipcase input.txt > output.txt
othercommand | flipcase > output.txt
flipcase | othercommand > output.txt
flipcase input1.txt  input2.txt > output.txt
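
A minimal Python sketch of that stdin/stdout contract, assuming flipcase simply swaps letter case (the thread never defines what it does). The names flip_case and main are illustrative. fileinput is the standard-library module that reads the named files in order, or standard input when none are given.

```python
import fileinput
import sys


def flip_case(text):
    """Swap upper- and lower-case letters (an assumed meaning of 'flipcase')."""
    return text.swapcase()


def main(argv=None):
    # Hypothetical entry point: file names come from the command line.
    # With no names, fileinput falls back to standard input.
    paths = sys.argv[1:] if argv is None else argv
    with fileinput.input(files=paths) as lines:
        for line in lines:
            # Results always go to standard output so the tool can be piped.
            sys.stdout.write(flip_case(line))


if __name__ == "__main__":
    main()
```

Because the loop writes each line as soon as it is read, a downstream command in a pipe can start working before the input is exhausted.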

The next feature might be in-place editing:

# Modify input files directly.
flipcase -i input.txt

# Create backup copies before modifying originals.
flipcase -i --backup-suffix '_BAK' input.txt
flipcase -i --backup-prefix 'BAK_' input.txt

# Regex for power users.
flipcase -i --backup-regex 's/foo/bar/' input.txt
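
One possible way to get the -i and --backup-suffix behavior in Python is the standard-library fileinput module, which supports in-place rewriting with an optional backup extension. This is only a sketch: flip_case_in_place is a hypothetical helper, case swapping is an assumed transformation, and the prefix and regex variants above are not covered.

```python
import fileinput


def flip_case_in_place(paths, backup_suffix=""):
    """Rewrite each file with its case flipped.

    A non-empty backup_suffix keeps the original as <name><suffix>.
    """
    if not paths:
        # Nothing to edit; avoid falling back to standard input.
        return
    with fileinput.input(files=paths, inplace=True, backup=backup_suffix) as lines:
        for line in lines:
            # In in-place mode, fileinput redirects print() into the file.
            print(line.swapcase(), end="")
```

With an empty suffix, fileinput still creates a temporary backup while it works and deletes it afterwards.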

In verbose mode, the tool should not write to standard output, because that would conflict with the core principles above. It should write to standard error or a user-defined log file.

flipcase -v         input.txt > output.txt
flipcase -v log.txt input.txt > output.txt
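
A rough sketch of how the verbose channel could be kept away from standard output, using the standard logging module. make_logger and the logger name "flipcase" are assumptions made for illustration.

```python
import logging
import sys


def make_logger(verbose=False, log_path=None):
    """Return a logger that writes to a log file or stderr, never stdout."""
    logger = logging.getLogger("flipcase")  # hypothetical logger name
    logger.handlers.clear()
    logger.propagate = False
    # Informational messages only appear in verbose mode.
    logger.setLevel(logging.INFO if verbose else logging.WARNING)
    if log_path:
        handler = logging.FileHandler(log_path)
    else:
        handler = logging.StreamHandler(sys.stderr)
    logger.addHandler(handler)
    return logger
```

Calling make_logger(verbose=True).info("processing input.txt") then leaves anything redirected with > untouched.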

After that, you add recursive behavior. The direction is less clear-cut here, but I'll toss out a few ideas. In the typical recursive case, the program's arguments are probably directories, and the user would need to supply additional options to define various types of filtering behavior (that is, which types of files to process).

flipcase -r -i --backup-suffix '_BAK' --filter-glob '*.txt' dir1 dir2
flipcase -r -i --backup-suffix '_BAK' --filter-glob '*.txt' --filter-glob 'log*.dat' dir
flipcase -r -i --backup-suffix '_BAK' --filter-regex 'log\w+\.(txt|log)$' dir1 dir2

# Don't do in-place editing. Instead create new files within the structure.
flipcase -r --newname-suffix '_NEW'              --filter-glob '*.txt' dir1 dir2
flipcase -r --newname-regex 's/\.txt$/_new.txt/' --filter-glob '*.txt' dir1 dir2

# Create the backups or the new files in a parallel directory
# structure rather than within the original structure.
flipcase -r -i --backup-tree 'backup_dir'   --filter-glob '*.txt' dir1 dir2
flipcase -r -i --new-tree    'newfiles_dir' --filter-glob '*.txt' dir1 dir2
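
The -r plus --filter-glob combination could be sketched with os.walk and fnmatch from the standard library. iter_matching_files is a hypothetical helper, and matching on the bare file name (not the full path) is an assumption about what the filter should mean.

```python
import fnmatch
import os


def iter_matching_files(roots, patterns):
    """Yield paths under each root whose file name matches any glob pattern."""
    for root in roots:
        for dirpath, dirnames, filenames in os.walk(root):
            dirnames.sort()  # make the traversal order deterministic
            for name in sorted(filenames):
                if any(fnmatch.fnmatch(name, pattern) for pattern in patterns):
                    yield os.path.join(dirpath, name)
```

For example, iter_matching_files(["dir1", "dir2"], ["*.txt", "log*.dat"]) would visit both trees and yield only the files that match either pattern.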

FM 2010-09-05 18:37:46

Thanks for that comprehensive input!

mar10 2010-09-06 21:20:39

Are the option names you are using 'common', i.e., are there well-known tools that use them?

mar10 2010-09-06 21:27:16

@mar10 Only in some cases. The `-v` and `-r` options are commonly used for verbose and recursive. The `-i` option reflects my Perl background, where it is used for in-place file editing (Perl probably inherited the convention from `sed`). The longer options that I proposed are just rough ideas. You might want to look at other recursive Unix tools for ideas regarding option naming: `find`, `rsync`, and perhaps others.

FM 2010-09-06 23:10:57

Answer 2

+1 A:

What is the best practice (interface and implementation) for a command line tool that processes selected files in a directory tree?

I don't think there's a single standard or "best practice" for implementing a command line tool. You'll gain a lot of insight, though, by studying and experimenting with well-built tools such as the GNU coreutils.

You may also be looking for something like this: http://www.gnu.org/prep/standards/html_node/Command_002dLine-Interfaces.html

Reading about the Unix way of doing this, and experimenting with it, actually addresses many of your concerns regarding design decisions.

One problem that I see with this example is that the *.txt argument is sometimes expanded by the shell (Unix and Vista), so I can't apply this pattern when walking subdirectories.

On Unix, the shell expands the * automatically before your program runs. I'm not sure about Windows, but if I'm not mistaken, * is not expanded there, so you can simply use glob.glob(sys.argv[1]). A workaround on Unix would be to escape the wildcard, but there must be a better way.
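
A small sketch of the glob.glob idea, written so that it also behaves sensibly when the shell has already expanded the wildcards. expand_args is a hypothetical helper. Keeping an unmatched argument as a literal name is a design assumption, so that a missing file still produces an ordinary error later.

```python
import glob


def expand_args(args):
    """Expand wildcard arguments that the shell left untouched."""
    paths = []
    for arg in args:
        matches = sorted(glob.glob(arg))
        # An existing literal path matches itself; an argument that matches
        # nothing is kept as-is so the caller can report it.
        paths.extend(matches or [arg])
    return paths
```

A typical call would be expand_args(sys.argv[1:]) at the start of the program. File names that themselves contain characters such as [ or * are a known weak spot of this approach.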

Coding District 2010-09-05 19:00:42

Thanks for the pointer; GNU is a good reference. (By the way, Vista seems to expand *, but older versions of Windows do not, as far as I know.)

mar10 2010-09-06 21:37:57

Answer 3

A:

Recursive processing is usually done using os.path.walk, but you can write your own version using Python generators, which is much more command-line friendly: anything downstream in a pipe receives the output as it is produced. Here is a tested and documented proof of concept.

With Python 3, you don't have to do this: os.path.walk is gone, and os.walk (available since Python 2.3) is already a generator.

After that, follow FM's advice to build the command line interface using optparse.
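
A sketch of how a few of FM's options might be declared. It uses argparse, the standard-library successor to optparse, instead of optparse itself. The long names --recursive, --in-place and --verbose are assumptions, and -v is simplified to a plain flag without the optional log file.

```python
import argparse


def build_parser():
    """Declare a subset of the options proposed in the first answer."""
    parser = argparse.ArgumentParser(prog="flipcase")
    parser.add_argument("paths", nargs="*", help="files, or directories with -r")
    parser.add_argument("-r", "--recursive", action="store_true")
    parser.add_argument("-i", "--in-place", action="store_true")
    parser.add_argument("-v", "--verbose", action="store_true")
    parser.add_argument("--backup-suffix", default="")
    # May be given several times; each use adds one pattern to the list.
    parser.add_argument("--filter-glob", action="append", default=[])
    return parser
```

Parsing the arguments -r -i --backup-suffix _BAK --filter-glob '*.txt' dir with this parser gives args.recursive and args.in_place set to True, args.filter_glob as a one-item list, and args.paths containing dir.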

e-satis 2010-09-05 19:01:05

Answer 4

+1 A:

To address the globbing part of your question, the odd man out in your list is really supporting Windows. The UNIX way, and also a good way, to do it is to let the shell handle the globbing. You just get a list of files. I know of no UNIX tool that does its own globbing (in basic cases like this). I'd suggest you don't do it yourself either, but rely on the shell.

On Windows, you could refer people to using a shell with Cygwin, or something like that. Of course, Windows users usually eschew the command line, so if you build a GUI they'll be happy too.

That doesn't cover your -r switch, and it gets difficult there. Do you want to give users the ability to specify "all files in subdirectories that have the extension .txt"? Note that modern shells like ZSH can do globs that recurse into directories, like:

rm **/*.tmp

and, as you say, you can always use find instead. So a recommendation here really needs to factor in the specifics of your tool. rsync benefits from implementing its own -r switch, but a hypothetical flipcase probably wouldn't.
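
If the tool did want the ** style without depending on a particular shell, Python's own glob module can evaluate such a pattern (the recursive flag was added in Python 3.5). This is a sketch only: recursive_glob is a hypothetical helper, and the user would have to quote the pattern so the shell passes it through unexpanded.

```python
import glob
import os


def recursive_glob(pattern, root="."):
    """Return sorted paths under root matching a pattern that may contain **."""
    # With recursive=True, ** matches zero or more directory levels.
    return sorted(glob.glob(os.path.join(root, pattern), recursive=True))
```

For instance, recursive_glob("**/*.tmp") would list the .tmp files in the current directory and in every subdirectory below it, apart from hidden ones, which glob skips by default.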

loevborg 2010-09-05 19:15:18

I guess requiring Cygwin is too much for most Windows users. I like the 'rm **/*.tmp' syntax, but it seems hard to implement because of shell globbing (given that I do not want to rely on a specific shell like ZSH).

mar10 2010-09-06 21:34:03
