
I'm writing a program, foo, in C++. It's typically invoked on the command line like this:

foo *.txt

My main() receives the arguments in the normal way. On many systems, argv[1] is literally *.txt, and I have to call system routines to do the wildcard expansion. On Unix systems, however, the shell expands the wildcard before invoking my program, and all of the matching filenames will be in argv.

Suppose I wanted to add a switch to foo that causes it to recurse into subdirectories.

foo -a *.txt

would process all text files in the current directory and all of its subdirectories.

I don't see how this is done, since by the time my program gets a chance to see the -a, the shell has already done the expansion and the user's *.txt input is lost. Yet there are common Unix programs that work this way. How do they do it?

In Unix land, how can I control the wildcard expansion?

(Recursing through subdirectories is just one example. Ideally, I'm trying to understand the general solution to controlling the wildcard expansion.)

+5  A: 

Your program has no influence over the shell's command-line expansion. Which program will be called is determined only after all the expansion is done, so by then it is too late to change anything about the expansion programmatically.

The user calling your program, on the other hand, can build whatever command line they like. Shells make it easy to prevent wildcard expansion, usually by putting the argument in single quotes:

program -a '*.txt'

If your program is called like that, it will receive two arguments: -a and *.txt.

On Unix, you should just leave it to the user to manually prevent wildcard expansion if it is not desired.

sth
+1  A: 
foo -a '*.txt'

Part of the shell's job (on Unix) is to expand command line wildcard arguments. You prevent this with quotes.

Also, on Unix systems, the "find" command does what you want:

find . -name '*.txt'

will list all files whose names end in .txt, recursively from the current directory down.

Thus, you could do

foo `find . -name '*.txt'`
xcramps
+2  A: 

As the other answers said, the shell does the wildcard expansion - and you stop it from doing so by enclosing arguments in quotes.

Note that the options -R and -r are conventionally used to indicate recursion (see cp -R and ls -R, for example).

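Assuming you organize things appropriately so that wildcards are passed to your program as wildcards, and you want to do recursion, then POSIX provides routines to help:

  • nftw() - file tree walk (recursive access to a directory hierarchy)
  • fnmatch(), glob(), wordexp() - filename matching and expansion (as the shell does)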

There is also ftw, which is very similar to nftw but it is marked 'obsolescent' so new code should not use it.


Adrian asked:

But I can say ls -R *.txt without single quotes and get a recursive listing. How does that work?

To adapt the question to a convenient location on my computer, let's review:

$ ls -F | grep '^m'
makefile
mapmain.pl
minimac.group
minimac.passwd
minimac_13.terminal
mkmax.sql.bz2
mte/
$ ls -R1 m*
makefile
mapmain.pl
minimac.group
minimac.passwd
minimac_13.terminal
mkmax.sql.bz2

mte:
multithread.ec
multithread.ec.original
multithread2.ec
$

So, I have a sub-directory 'mte' that contains three files. And I have six files whose names start with 'm'.

  • When I type 'ls -R1 m*', the shell notes the metacharacter '*' and uses its equivalent of glob() or wordexp() to expand that into the list of names:

    1. makefile
    2. mapmain.pl
    3. minimac.group
    4. minimac.passwd
    5. minimac_13.terminal
    6. mkmax.sql.bz2
    7. mte
  • Then the shell arranges to run '/bin/ls' with 9 arguments (the program name, the option -R1, and the 7 names), followed by a terminating null pointer.

  • The ls command notes the options (recursive and single-column output), and gets to work.
    • The first 6 names (as it happens) are simple files, so there is nothing recursive to do.
    • The last name is a directory, so ls prints its name and its contents, invoking its equivalent of nftw() to do the job.
    • At this point, it is done.
  • This uncontrived example doesn't show what happens when there are multiple directories, and so the description above over-simplifies the processing.
  • Specifically, ls processes the non-directory names first, and then processes the directory names in alphabetic order (by default), and does a depth-first scan of each directory.
Jonathan Leffler
But I can say `ls -R *.txt` without single quotes and get a recursive listing. How does that work? (And, yeah, I know `-R` and `-r` are the usual choices for indicating a recursive descent. I purposely used a different letter to avoid getting answers that were too specific to that problem.)
Adrian McCarthy
No, you don't. Combining the `-R` option to `ls` with a wildcard does *not* give you a recursive list of all files matching that pattern.
Andrew Medico
@Andrew: yes and no. You are right that if you have no directories whose names match '*.txt', you do not get a recursive listing. You are wrong in the event that there is a directory called, for example, xyz.txt. But mainly you are correct: '`ls -R *.txt`' does not restrict the listed files to those whose names end in '.txt'.
Jonathan Leffler
@AdrianMcCarthy - `ls -R *.txt` works because the `-R` option applies to the entire `ls` command. So the shell expands the wildcard and then `ls` recursively lists **all** the parameters. Try running `ls -R dir1.txt dir2.txt` (or whatever the appropriate names are) to see that.
R Samuel Klatchko
+1  A: 

I wanted to point out another way to turn off wildcard expansion. You can tell your shell to stop expanding wildcards with the noglob option.

With bash, use set -o noglob:

> touch a b c
> echo *
a b c
> set -o noglob
> echo *
*

And with csh, use set noglob:

> echo *
a b c
> set noglob
> echo *
*
R Samuel Klatchko