C doesn't have built-in regular expressions, though libraries are available; http://www.arglist.com/regex/ and http://www.pcre.org/ are the two I see most often.
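For comparison, here is a rough sketch of what the regex route could look like. It assumes a POSIX system and uses the standard <regex.h> interface (regcomp(), regexec(), regfree()) rather than either of the two libraries linked above. match_get_line() is a hypothetical helper name, and unlike a plain prefix check it captures only the request target, without the trailing HTTP version.

```c
#include <regex.h>
#include <string.h>

/* Hypothetical helper (not from any library): if line looks like
   "GET /some/path HTTP/1.1", copy the request target into out.
   Returns 1 on a match that fits in out, 0 otherwise. */
int match_get_line(const char *line, char *out, size_t outsize)
{
    regex_t re;
    regmatch_t m[2];    /* m[0]: whole match, m[1]: first capture group */
    int found = 0;

    /* POSIX extended syntax: "GET ", then one or more non-space chars */
    if (regcomp(&re, "^GET ([^ ]+)", REG_EXTENDED) != 0)
        return 0;

    if (regexec(&re, line, 2, m, 0) == 0) {
        size_t n = (size_t)(m[1].rm_eo - m[1].rm_so);
        if (n < outsize) {
            memcpy(out, line + m[1].rm_so, n);
            out[n] = '\0';
            found = 1;
        }
    }

    regfree(&re);
    return found;
}
```

Compiling the pattern on every call is wasteful; real code would compile it once and reuse the regex_t.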
For a task this simple, you can easily get away without using regexes though. Provided the lines are all less than some maximum length MAXLEN, just process them one line at a time:
    /* Needs <stdio.h> and <string.h>. */
    char buf[MAXLEN];
    char url[MAXLEN];
    char host[MAXLEN];
    int state = 0;    /* 0: haven't seen GET yet; 1: seen GET, haven't seen Host yet */

    FILE *f = fopen("my_input_file", "rb");
    if (!f) {
        report_error_somehow();    /* Must not fall through: f is NULL here */
    }

    while (fgets(buf, sizeof buf, f)) {
        /* Strip trailing \r\n */
        size_t len = strlen(buf);
        if (len >= 2 && buf[len - 1] == '\n' && buf[len - 2] == '\r') {
            buf[len - 2] = 0;
        } else {
            if (feof(f)) {
                /* Last line was not \r\n-terminated: probably OK to ignore */
            } else {
                /* Either the line was too long, or ends with \n but not \r\n. */
                report_error_somehow();
            }
        }

        /* strncmp() stops at the terminating NUL, so short lines are safe */
        if (state == 0 && !strncmp(buf, "GET ", 4)) {
            strcpy(url, buf + 4);     /* We know url[] is big enough */
            ++state;
        } else if (state == 1 && !strncmp(buf, "Host: ", 6)) {
            strcpy(host, buf + 6);    /* We know host[] is big enough */
            break;
        }
    }

    fclose(f);
This solution doesn't require buffering the entire file in memory as KennyTM's answer does (though that is fine, by the way, if you know the files are small). Notice that we use fgets() instead of the unsafe gets(), which has no way to limit how much it writes and so overflows the buffer on long lines.
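To make the fgets() point concrete, here is a minimal sketch of how an over-long line can be detected instead of silently overflowing: fgets() never stores more than size - 1 characters, so a buffer that comes back without a trailing '\n' before end of file means the line was cut short. read_line() is a hypothetical helper name of mine, not a standard function, and discarding the rest of an over-long line is just one possible policy.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line into buf (at most size - 1 chars).
   Returns  1 if a whole line was read (including a final line that has
              no trailing newline),
            0 at end of file or on a read error,
           -1 if the line did not fit; the rest of it is then discarded. */
int read_line(char *buf, size_t size, FILE *f)
{
    size_t len;
    int c;

    if (!fgets(buf, (int)size, f))
        return 0;

    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n')
        return 1;                 /* the whole line fit */

    /* No newline stored: either we hit end of file or the line was cut */
    c = getc(f);
    if (c == EOF)
        return 1;                 /* last line, simply unterminated */

    while (c != '\n' && c != EOF) /* skip the remainder of the long line */
        c = getc(f);
    return -1;
}
```

With this, the loop above could treat a -1 result as the "line was too long" error case rather than inferring it from the missing \r\n.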