tags:

views:

103

answers:

2

I'd like to write down a BNF-like formal grammar for describing the command line usage of some GNU/Linux tools. For example I can describe the usage of the cat command as:

(cat-command) : 'cat' (arguments-list)
(arguments-list) : (argument)
(arguments-list) : (arguments-list) (argument)
(argument) : (file)

The problem is I can't write down a precise grammar for some commands such as md5sum. My first attempt at that would be the following:

(md5sum-command) : 'md5sum' (arguments-list)
(arguments-list) : (argument)
(arguments-list) : (arguments-list) (argument)
(argument) : (file)
(argument) : '--check'

But as you can see this grammar allows you to specify the --check argument as many times as you wish, which is incorrect as you should use it at most one time.

How can I fix that? Also, what kind of formal grammars I should study for better treating this kind of problems?

Thanks.

+1  A: 

You could try something like:

(md5sum-command) : 'md5sum' (arguments-list)
(arguments-list) : (file-arguments) | '--check' (file-arguments)
(file-arguments) : (file) (file-arguments)

Assuming you want to be able to specify exactly one --check per command, but not depend on it being the first argument, you could use:

(md5sum-command) : 'md5sum' (arguments-list)
(arguments-list) : (file-arguments) | (file-arguments) '--check' (file-arguments)
(file-arguments) : (file) (file-arguments)

Also note, that the pipe (|) symbol is just a shortcut for an additional rule. The following is equivalent:

(md5sum-command) : 'md5sum' (arguments-list)
(arguments-list) : (file-arguments) 
(arguments-list) : (file-arguments) '--check' (file-arguments)
(file-arguments) : (file) (file-arguments)

I'd be surprised if you couldn't specify most unix commands with a context free grammar like those expressed in BNFs.

Daren Thomas
Your version seems to work, but it is not general enough. For example, you are not required to put the '--check' argument immediately after the 'md5sum' argument: you can have many files specified before and after it.
Francesco Turco
Having read your new version, I managed to write down the following:(arguments-list) : [(files-list)] [(mode-option)] [(files-list)] ['--check'] [(files-list)](arguments-list) : [(files-list)] ['--check'] [(files-list)] [(mode-option)] [(files-list)](mode-option) : '--binary'(mode-option) : '--text'I couldn't abbreviate it further. So adding other arguments such as '--quiet', '--warn' or '--status' would result in an exceedingly long grammar for such a simple command.
Francesco Turco
True. You probably want to make a difference between arguments that are just flags (can appear anywhere in argument list, act globally) and arguments that act on the files following.
Daren Thomas
A: 

I probably found an answer, although it's not the expected one. You can choose to recognise the correctness of a command instead of generate correct commands. Using some hybrid language, you can write the following set of requirements:

argument(0) == "md5sum"
forall i, if i != 0 then argument(i) == "--binary" or
                         argument(i) == "--text" or
                         argument(i) == "--check" or
                         argument(i) == "--status" or
                         argument(i) belongs to <file>
0 <= instances("--binary") + instances("--text") <= 1
0 <= instances("--check") <= 1
if instances("--check") == 1 then 0 <= instances("--status") <= 1

I won't mark this answer as the correct one because I'm still curious to know if there exists a way to generate correct commands.

Francesco Turco