Hello all, how can I use regular expressions in C programming? For example, suppose I want to find this line in a file:

DAEMONS=(sysklogd network sshd !netfs !crond)

and then print each daemon on a separate line, like this:

sysklogd 
network 
sshd 
!netfs 
!crond

Here is what I have done so far:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <regex.h>
#define tofind    "[a-z A-Z] $"
int main(){
 FILE *fp;
 char line[1024];
 int retval = 0;
 char address[256];
 regex_t re;

 if(regcomp(&re, tofind, REG_EXTENDED) != 0)
  return;

 fp = fopen("/etc/rc.conf","r");//this file has this line "DAEMONS=(sysklogd network sshd !netfs !crond)"
 while((fgets(line, 1024, fp)) != NULL) {
     if((retval = regexec(&re, address, 0, NULL, 0)) == 0)
      printf("%s\n", address);
 } 
}

Any help would be much appreciated.

+3  A: 

You read the line into line, so you should pass line to regexec(). You also need to think about whether the newline at the end of the line affects the patterns. (It was correct to use fgets(), but remember it keeps the newline at the end.)
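A minimal sketch of one way to deal with that trailing newline follows. The helper name `strip_newline` is made up for illustration; it relies only on the standard `strcspn()`. Unlike overwriting the last character unconditionally, it leaves a line that has no newline (for instance the last line of a file) untouched.

```c
#include <string.h>

/* Hypothetical helper: remove the newline that fgets() keeps, if present.
 * strcspn() returns the index of the first '\n', or the string length
 * when there is none, so empty strings and unterminated last lines are safe. */
static void strip_newline(char *line)
{
    line[strcspn(line, "\n")] = '\0';
}
```

Call it right after each successful `fgets()`, before handing the buffer to `regexec()`.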

You should also do return -1; (or any other value that is not 0 modulo 256) rather than a plain return with no value. Also, you should check that the file was opened; I had to use an alternative name because there is no such file as /etc/rc.conf on my machine - MacOS X.

This works for me:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <sys/types.h>
#include <regex.h>

#define tofind    "[a-z A-Z] $"

int main(int argc, char **argv)
{
    FILE *fp;
    char line[1024];
    int retval = 0;
    regex_t re;
    //this file has this line "DAEMONS=(sysklogd network sshd !netfs !crond)"
    const char *filename = "/etc/rc.conf";

    if (argc > 1)
        filename = argv[1];

    if (regcomp(&re, tofind, REG_EXTENDED) != 0)
    {
        fprintf(stderr, "Failed to compile regex '%s'\n", tofind);
        return EXIT_FAILURE;
    }

    fp = fopen(filename, "r");
    if (fp == 0)
    {
        fprintf(stderr, "Failed to open file %s (%d: %s)\n",
                filename, errno, strerror(errno));
        return EXIT_FAILURE;
    }

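    while (fgets(line, sizeof(line), fp) != NULL)
    {
        /* fgets() keeps the newline; strip it without assuming it is there */
        line[strcspn(line, "\n")] = '\0';
        if ((retval = regexec(&re, line, 0, NULL, 0)) == 0)
            printf("<<%s>>\n", line);
    }
    regfree(&re);
    fclose(fp);
    return EXIT_SUCCESS;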
}

If you need help writing regular expressions instead of help writing C code that uses them, then we need to design the regex to match the line you show.

^DAEMONS=\([^)]*\) *$

This will match the line as long as it is written as shown. If you can have spaces between the 'S' and the '=' or between the '=' and the '(', then you need appropriate modifications. I've allowed for trailing blanks - people are often sloppy; but if they use trailing tabs, then the line won't be selected.
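If blanks or tabs around the '=' and after the closing bracket should be tolerated, the pattern could be loosened along the following lines. This is only a sketch: the macro `DAEMONS_LOOSE` and the helper `is_daemons_line` are names invented here, and the choice of `[ \t]*` in each position is an assumption about what "sloppy" input looks like.

```c
#include <regex.h>
#include <stddef.h>

/* Assumed variant of the pattern: optional blanks or tabs around '='
 * and after the closing bracket. Written as a C string, so each regex
 * backslash is doubled and \t is a real tab inside the bracket expression. */
#define DAEMONS_LOOSE "^DAEMONS[ \t]*=[ \t]*\\(([^)]*)\\)[ \t]*$"

/* Hypothetical helper: 1 if the line matches, 0 if not, -1 if regcomp() fails.
 * The line is expected to have had its trailing newline removed already. */
static int is_daemons_line(const char *line)
{
    regex_t re;
    int rc;

    if (regcomp(&re, DAEMONS_LOOSE, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, line, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

Because the pattern is anchored at both ends, a line with anything before `DAEMONS` or after the closing bracket (other than blanks and tabs) is still rejected.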

Once you've found the line, you have to split it into pieces. You might elect to use the 'capturing' brackets facility, or simply use strchr() to find the open bracket, and then a suitable technique for separating the daemon names - I'd avoid strtok() and probably use strspn() or strcspn() to find the words.


#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <sys/types.h>
#include <regex.h>

#define tofind    "^DAEMONS=\\(([^)]*)\\)[ \t]*$"

int main(int argc, char **argv)
{
    FILE *fp;
    char line[1024];
    int retval = 0;
    regex_t re;
    regmatch_t rm[2];
    //this file has this line "DAEMONS=(sysklogd network sshd !netfs !crond)"
    const char *filename = "/etc/rc.conf";

    if (argc > 1)
        filename = argv[1];

    if (regcomp(&re, tofind, REG_EXTENDED) != 0)
    {
        fprintf(stderr, "Failed to compile regex '%s'\n", tofind);
        return EXIT_FAILURE;
    }

    fp = fopen(filename, "r");
    if (fp == 0)
    {
        fprintf(stderr, "Failed to open file %s (%d: %s)\n", filename, errno, strerror(errno));
        return EXIT_FAILURE;
    }

    while (fgets(line, sizeof(line), fp) != NULL)
    {
        /* fgets() keeps the newline; strip it without assuming it is there */
        line[strcspn(line, "\n")] = '\0';
        if ((retval = regexec(&re, line, 2, rm, 0)) == 0)
        {
            printf("<<%s>>\n", line);
            printf("Line: <<%.*s>>\n", (int)(rm[0].rm_eo - rm[0].rm_so), line + rm[0].rm_so);
            printf("Text: <<%.*s>>\n", (int)(rm[1].rm_eo - rm[1].rm_so), line + rm[1].rm_so);
            char *src = line + rm[1].rm_so;
            char *end = line + rm[1].rm_eo;
            while (src < end)
            {
                size_t len = strcspn(src, " ");
                if (src + len > end)
                    len = end - src;
                printf("Name: <<%.*s>>\n", (int)len, src);
                src += len;
                src += strspn(src, " ");
            }
        }
    }
    regfree(&re);
    fclose(fp);
    return EXIT_SUCCESS;
}

There is a good deal of debugging code in there, but it won't take you long to adapt it to produce the output you requested. I get:

<<DAEMONS=(sysklogd network sshd !netfs !crond)>>
Line: <<DAEMONS=(sysklogd network sshd !netfs !crond)>>
Text: <<sysklogd network sshd !netfs !crond>>
Name: <<sysklogd>>
Name: <<network>>
Name: <<sshd>>
Name: <<!netfs>>
Name: <<!crond>>
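With the debugging output stripped away, the same capture-and-split idea could be packaged as a function that hands back the names. This is a sketch under stated assumptions: `split_daemons` and `DAEMON_NAME_LEN` are hypothetical names, the 64-byte limit per name is arbitrary, and over-long names are silently truncated.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

#define DAEMON_NAME_LEN 64   /* assumed limit for one daemon name */

/* Hypothetical helper: copy up to max names from a DAEMONS=(...) line into
 * names[]. Returns the number stored, or -1 if the line does not match
 * or the regex fails to compile. */
static int split_daemons(const char *line, char names[][DAEMON_NAME_LEN], int max)
{
    regex_t re;
    regmatch_t rm[2];
    const char *src;
    const char *end;
    int count = 0;
    int rc;

    if (regcomp(&re, "^DAEMONS=\\(([^)]*)\\)[ \t]*$", REG_EXTENDED) != 0)
        return -1;
    rc = regexec(&re, line, 2, rm, 0);
    regfree(&re);
    if (rc != 0)
        return -1;

    src = line + rm[1].rm_so;              /* first character inside the brackets */
    end = line + rm[1].rm_eo;              /* points at the closing ')' */
    src += strspn(src, " ");               /* skip leading blanks */
    while (src < end && count < max)
    {
        size_t len = strcspn(src, " )");   /* a name ends at a blank or ')' */
        size_t copy = len < DAEMON_NAME_LEN ? len : DAEMON_NAME_LEN - 1;

        memcpy(names[count], src, copy);
        names[count][copy] = '\0';
        count++;
        src += len;
        src += strspn(src, " ");           /* skip blanks between names */
    }
    return count;
}
```

To get the output asked for in the question, loop over the returned count and call `puts()` on each entry of `names`.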

Beware: when you want a backslash in a regex, you have to write two backslashes in the C source code.
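The effect of the doubled backslash can be checked with a small experiment. The helper `re_match` below is a made-up convenience wrapper around `regcomp()` and `regexec()`, not part of any library; the patterns are deliberately tiny so the difference is easy to see.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: 1 if text matches the extended regex in pattern,
 * 0 if it does not, -1 if the pattern fails to compile. */
static int re_match(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/* C source "f\\(x\\)" reaches regcomp() as f\(x\): literal brackets.
 * C source "f(x)"     reaches regcomp() as f(x):   a group around x,
 * so it looks for the two characters "fx" and ignores the brackets. */
```

With this helper, `re_match("f\\(x\\)", "f(x)")` reports a match, while `re_match("f(x)", "f(x)")` does not, because the unescaped brackets are treated as grouping rather than as characters to find.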

Jonathan Leffler
Just wanted to add that the regular expression `tofind` is flawed. Nothing will match anyway.
Jeff M
@Jeff: are you sure? As shown, the RE finds either a single alphabetic followed by a blank and end-of-line or two blanks and end of line. Whether that is sensible for parsing /etc/rc.conf is one issue - however, the RE finds what it says it wants to find.
Jonathan Leffler
Still, I am not getting any output. The /etc/rc.conf content is `bla bla blaDAEMONS=(sysklogd network sshd !netfs !crond)bla bla bla`
Face
@Face: OK - then the regex is not what you want. You need to show in the question an example line (indented as code) and under it, the section you want the regex to find for you. As I've explained in a previous comment, the regex you have entered looks for an alphabetic character or space followed by a space at the end of the line. If that isn't what you are looking for, then explain what you are. That shapes the actual regex you need.
Jonathan Leffler
@Jonathan: sorry if I was not clear enough. I am trying to separate each word inside "DAEMONS=(.....)" and print each word on a new line.
Face
@Jonathan: thanks a lot, I really appreciate the explanation.
Face