I have a daemon that reads a configuration file in order to know where to write something. In the configuration file, a line like this exists:

output = /tmp/foo/%d/%s/output

Or, it may look like this:

output = /tmp/foo/%s/output/%d

... or simply like this:

output = /tmp/foo/%s/output

... or finally:

output = /tmp/output

I have that line as cfg->pathfmt within my program. What I am trying to do now is to come up with some clever way of using it.

A little more explanation: the path can contain up to two components to be formatted. %d will be expanded as a job ID (int), %s as a job name (string). The user may want to use one, both or neither in the configuration file. I need to know which ones they used, and in what order, before I finally pass the format to snprintf(). I can kind of narrow it down, but I keep wanting to reach for strtok(), and that seems ugly.

I want to give users this kind of flexibility; however, I'm getting lost looking for a sensible, portable way to implement it. I'm also at a complete and total loss for how to begin searching for this.

I'd be very happy if:

  • Someone could help me narrow down the search phrase to find good examples
  • Someone could post a link to some OSS project implementing this
  • Someone could post some pseudocode

I don't want the code written for me; I'm just really stuck on what (I think) should be something very simple, and I need some help taking the first bite. I really feel like I'm overthinking this and overlooking the obvious.

The end result should be a boolean function like this:

bool output_sugar(const char *fmt, int jobid, const char *jobname, struct job *j);

It would then call snprintf() (sensibly) on j->outpath, returning false if some kind of garbage (i.e. % followed by something other than s, d or %) is in the config line (or if it's NULL). The sanity checks are easy; I'm just having a bit of a time getting the number (and order) of arguments to format correct.

Thanks in advance. Also, feel free to edit this title if you have the reputation to do so; as I said, I'm not quite sure how to ask the question in a single line.

+4  A: 

Using strtok is error-prone. You can treat your variables as a mini-language using (f)lex and yacc. There is a simple tutorial here.

%{
#include <stdio.h>
/* jobid, dirname and stripspaces() are assumed to be defined elsewhere */
%}

%%
"%d"                    printf("%04d", jobid);
"%s"                    printf("%s", stripspaces(dirname));
%%

I made an ODBC wrapper that would let you do stuff like dbprintf("insert into blah values %s %D %T %Y", stuff here...); But it was many years ago and I bit it and parsed the format string using strtok.

ojblass
I am really, really hoping to not bring in a lexical parser to deal with just one string :)
Tim Post
Well I know AFTER I left that job they got fed up with my strtok and replaced it with the above. John Levine hates handwritten parsers for a reason.
ojblass
Yeah, and if the string ever has to expand, ouch. Ok .. off to a parser I go. Thank you for the link :) At the least, I can justify using one to make the configuration file smarter (sort of like BIND).
Tim Post
A: 

If the number of options is small and you don't otherwise want/need the extra flexibility and complexity of a parser, you could simply search for each potential replacement substring using strstr().

If you have only the two options, you could tolerably create a four-branched if/else structure (only A, only B, both with A before B, both with B before A) in which to call sprintf() with the correctly ordered arguments. Otherwise, make multiple sprintf() calls, each of which replaces only the first replacement-marker in the format string. (This implies building a list of which replacements are needed and sorting them in appearance-order...)
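The four-branch idea can be sketched roughly as follows. This is only an illustration: expand_path() is a hypothetical name, and the sketch assumes the format has already been validated to contain at most one %d and one %s and no other conversions, because the string is handed to snprintf() as a format. Note that strstr() cannot tell a real %d from the tail of "%%d", so a format containing "%%" needs extra care.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: expands at most one %d and one %s, in either order.
 * ASSUMPTION: fmt was validated beforehand (no other conversions, no "%%d"
 * or "%%s" sequences), since it is used directly as a printf-style format. */
static bool expand_path(const char *fmt, int jobid, const char *jobname,
                        char *out, size_t outlen)
{
    const char *d, *s;
    int n;

    assert(fmt != NULL && jobname != NULL && out != NULL && outlen > 0);
    d = strstr(fmt, "%d");
    s = strstr(fmt, "%s");

    if (d != NULL && s != NULL && d < s)        /* both, %d first */
        n = snprintf(out, outlen, fmt, jobid, jobname);
    else if (d != NULL && s != NULL)            /* both, %s first */
        n = snprintf(out, outlen, fmt, jobname, jobid);
    else if (s != NULL)                         /* only %s */
        n = snprintf(out, outlen, fmt, jobname);
    else                                        /* only %d, or neither:     */
        n = snprintf(out, outlen, fmt, jobid);  /* excess args are ignored  */

    /* snprintf() returns the length it wanted; >= outlen means truncation. */
    return n >= 0 && (size_t)n < outlen;
}
```

Calling expand_path("/tmp/foo/%s/output/%d", 42, "nightly", buf, sizeof buf) would leave "/tmp/foo/nightly/output/42" in buf. With more than two replacement markers the number of orderings grows quickly, which is where the one-marker-at-a-time approach becomes more practical.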

Jeff Shannon
Why is everyone in love with creating and parsing their own strings?
ojblass
+4  A: 

Yes, you need a parser of some sort. It need not be complex, though:

void format_filename(const char *fmt, int jobid, const char *jobname,
                     char *buffer, size_t buflen)
{
    char *end = buffer + buflen - 1;
    const char *src = fmt;
    char *dst = buffer;
    char c;
    assert(buffer != 0 && fmt != 0 && buflen != 0 && jobname != 0);
    while ((c = *src++) != '\0')
    {
        if (dst >= end)
            err_exit("buffer overflow in %s(): format = %s\n",
                     __func__, fmt);
        else if (c != '%')
            *dst++ = c;
        else if ((c = *src++) == '\0' || c == '%')
        {
            *dst++ = '%';
            if (c == '\0')
                break;
        }
        else if (c == 's')
        {
            size_t len = strlen(jobname);
            if (len > end - dst)
                err_exit("buffer overflow on jobname in %s(): format = %s\n",
                         __func__, fmt);
            else
            {
                strcpy(dst, jobname);
                dst += len;
            }
        }
        else if (c == 'd')
        {
            int nchars = snprintf(dst, end - dst, "%d", jobid);
            if (nchars < 0 || nchars >= end - dst)
                err_exit("format error on jobid in %s(); format = %s\n",
                         __func__, fmt);
            dst += nchars;
        }
        else
            err_exit("invalid format character %d in %s(): format = %s\n",
                     c, __func__, fmt);
    }
    *dst = '\0';
}

Now tested code. Note that it supports the '%%' notation to allow the user to embed a single '%' in the output. Also, it treats a single '%' at the end of the string as valid and equivalent to '%%'. It calls err_exit() on error; you can choose alternative error strategies as suits your system. I simply assume you have included <assert.h>, <stdio.h> and <string.h> and the header for the err_exit() (variadic) function.
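One alternative error strategy, closer to the bool prototype in the question, is to return false instead of calling err_exit(). The following is a minimal sketch, not a drop-in replacement: struct job here is a hypothetical definition with a fixed 256-byte outpath (the question does not show the real one), and unlike format_filename() above it treats a trailing '%' as an error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* HYPOTHETICAL layout: the question only says that j->outpath exists. */
struct job {
    char outpath[256];
};

/* Same scan as format_filename(), but reports failure to the caller.
 * On failure the contents of j->outpath are unspecified. */
bool output_sugar(const char *fmt, int jobid, const char *jobname,
                  struct job *j)
{
    char *dst, *end;

    if (fmt == NULL || jobname == NULL || j == NULL)
        return false;
    dst = j->outpath;
    end = j->outpath + sizeof(j->outpath) - 1;  /* last byte is for '\0' */

    while (*fmt != '\0') {
        size_t room = (size_t)(end - dst);      /* bytes left before '\0' */
        char c = *fmt++;

        if (c != '%') {                         /* ordinary character */
            if (room < 1)
                return false;
            *dst++ = c;
            continue;
        }
        switch (*fmt) {                         /* character after '%' */
        case '%':
            if (room < 1)
                return false;
            *dst++ = '%';
            break;
        case 's': {
            size_t len = strlen(jobname);
            if (len > room)
                return false;
            memcpy(dst, jobname, len);
            dst += len;
            break;
        }
        case 'd': {
            int n = snprintf(dst, room + 1, "%d", jobid);
            if (n < 0 || (size_t)n > room)
                return false;
            dst += n;
            break;
        }
        default:                                /* garbage or trailing '%' */
            return false;
        }
        fmt++;                                  /* consume the specifier */
    }
    *dst = '\0';
    return true;
}
```

Returning false leaves the decision about logging or falling back to a default path with the daemon, which is usually preferable to exiting from inside a helper.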


Test code...

#include <stdio.h>
#include <stdlib.h>   /* exit() */
#include <string.h>
#include <stdarg.h>
#include <assert.h>

static void err_exit(const char *fmt, ...)
{
    va_list args;
    va_start(args, fmt);
    vfprintf(stderr, fmt, args);
    va_end(args);
    exit(1);
}

... then format_filename() as above, then ...

#define DIM(x) (sizeof(x)/sizeof(*(x)))

static const char *format[] =
{
    "/tmp/%d/name/%s",
    "/tmp/%s/number/%d",
    "/tmp/%s.%d%%",
    "/tmp/%",
};

int main(void)
{
    char buffer[64];
    size_t i;

    for (i = 0; i < DIM(format); i++)
    {
        format_filename(format[i], 1234, "job-name", buffer, sizeof(buffer));
        printf("fmt = %-20s; name = %s\n", format[i], buffer);
    }

    return(0);
}
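
If more specifiers ever had to be supported, one way to keep the scanning loop untouched is to look each specifier up in a table. The sketch below is only an illustration of that idea under stated assumptions: struct fmt_ctx, expand_format(), the put_*() helpers and the %U (user name) specifier are all hypothetical names, not part of the code above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* HYPOTHETICAL context holding every value a specifier may expand to. */
struct fmt_ctx {
    int         jobid;
    const char *jobname;
    const char *username;
};

typedef int (*expand_fn)(char *dst, size_t size, const struct fmt_ctx *ctx);

static int put_jobid(char *dst, size_t size, const struct fmt_ctx *ctx)
{ return snprintf(dst, size, "%d", ctx->jobid); }

static int put_jobname(char *dst, size_t size, const struct fmt_ctx *ctx)
{ return snprintf(dst, size, "%s", ctx->jobname); }

static int put_username(char *dst, size_t size, const struct fmt_ctx *ctx)
{ return snprintf(dst, size, "%s", ctx->username); }

static int put_percent(char *dst, size_t size, const struct fmt_ctx *ctx)
{ (void)ctx; return snprintf(dst, size, "%%"); }

/* Adding a specifier means adding one row here plus one small helper. */
static const struct { char key; expand_fn fn; } spec_table[] = {
    { 'd', put_jobid },
    { 's', put_jobname },
    { 'U', put_username },
    { '%', put_percent },
};

/* Returns 0 on success, -1 on an unknown specifier or if out is too small. */
int expand_format(const char *fmt, const struct fmt_ctx *ctx,
                  char *out, size_t outlen)
{
    size_t used = 0;

    assert(fmt != NULL && ctx != NULL && out != NULL && outlen > 0);
    while (*fmt != '\0') {
        int n = -1;

        if (*fmt != '%') {                      /* ordinary character */
            n = snprintf(out + used, outlen - used, "%c", *fmt);
            fmt++;
        } else {                                /* look up fmt[1] */
            size_t i;
            for (i = 0; i < sizeof spec_table / sizeof spec_table[0]; i++) {
                if (spec_table[i].key == fmt[1]) {
                    n = spec_table[i].fn(out + used, outlen - used, ctx);
                    break;
                }
            }
            if (n >= 0)
                fmt += 2;                       /* skip '%' and the key */
        }
        if (n < 0 || (size_t)n >= outlen - used)
            return -1;                          /* unknown key or overflow */
        used += (size_t)n;
    }
    out[used] = '\0';
    return 0;
}
```

The trade-off is one function-pointer call per specifier and a little more scaffolding, in exchange for a loop that never needs editing when the notation grows.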
Jonathan Leffler
Jonathan, thanks, you rather spoiled me with this one :) My pointer arithmetic is not the best, this was a very good example to build my own function. Changed this to be the accepted answer, as it is the most informative.
Tim Post
Alright, now suppose you want to add %U for a username... congratulations, you have just made me visit code like *dst++ = '%'; and I will curse you until the end of days.
ojblass
The price of flexibility and using a home-brew notation is that you have to make changes to code to support changes in notation. Since there is nothing that supports what the requestor requested 'out of the box' (that I know of), a custom and hence somewhat ad hoc solution is inevitable.
Jonathan Leffler
I disagree but acknowledge that I do not know everything.
ojblass
For this particular use, the number of tokens in the string (and type) will likely never change. It doesn't have to scale. The idea of bringing in a parser just to handle ONE string just seemed like shooting a fish, in a barrel, with an elephant gun that had a laser sight .. at point blank range.
Tim Post