views:

4567

answers:

7

I have a URL like this:

http://192.168.0.1:8080/servlet/rece

I want to parse the URL to get the values:

IP: 192.168.0.1
Port: 8080
Page: /servlet/rece

How do I do that?

+6  A: 

Write a custom parser or use one of the string replace functions to replace the separator ':' and then use sscanf().
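As a rough illustration of the custom-parser route, here is a minimal sketch in C. The helper name split_url and its signature are made up for this example. It assumes a plain "http://host[:port][/path]" URL and does not handle https, user info, or bracketed IPv6 literals, which are some of the traps mentioned in the comments.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not from any library: splits "http://host[:port][/path]".
   Returns 0 on success, -1 on malformed input or undersized buffers. */
static int split_url(const char *url, char *host, size_t hostsz,
                     int *port, char *path, size_t pathsz)
{
    static const char scheme[] = "http://";
    const char *p, *host_end, *colon, *slash;
    size_t n;

    if (strncmp(url, scheme, sizeof scheme - 1) != 0)
        return -1;
    p = url + sizeof scheme - 1;

    /* The host[:port] part ends at the first '/' or at the end of the string. */
    slash = strchr(p, '/');
    host_end = slash ? slash : p + strlen(p);

    /* Look for ':' inside the host[:port] part only. */
    colon = memchr(p, ':', (size_t)(host_end - p));
    *port = 80;                          /* default port for http */
    if (colon) {
        char *end;
        long v = strtol(colon + 1, &end, 10);
        /* Reject an empty port, trailing junk, or an out-of-range value. */
        if (end == colon + 1 || end != host_end || v < 1 || v > 65535)
            return -1;
        *port = (int)v;
        host_end = colon;
    }

    n = (size_t)(host_end - p);
    if (n == 0 || n >= hostsz)
        return -1;
    memcpy(host, p, n);
    host[n] = '\0';

    /* A missing path is treated as "/". */
    if (!slash)
        slash = "/";
    if (strlen(slash) >= pathsz)
        return -1;
    strcpy(path, slash);
    return 0;
}
```

Called with the URL from the question and suitably sized buffers, this fills in the host, the port, and the path including its leading slash. When no port is given, the port falls back to 80.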

dirkgently
There are many traps to watch for, so a custom parser seems like a bad idea to me.
bortzmeyer
@bortzmeyer: that doesn't make the suggestion invalid. It's vague reasoning. Also, a custom parser is the most powerful, efficient and dependency-free option. The sscanf approach is easier to get wrong.
dirkgently
+6  A: 

Use a regular expression if you want the easy way. Otherwise use Flex/Bison.

You could also use a URI parsing library.
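For the regular-expression route in C, one option on a POSIX system is regcomp()/regexec() from <regex.h>. The sketch below is only an illustration: the helper names and the pattern are assumptions made for this example, the pattern covers plain http URLs only, over-long fields are silently truncated, and the port range is not validated.

```c
#include <regex.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copies one capture group of s into dst (truncating).
   An unmatched optional group (rm_so == -1) yields an empty string. */
static void copy_group(const char *s, const regmatch_t *m, char *dst, size_t dstsz)
{
    size_t n = (m->rm_so < 0) ? 0 : (size_t)(m->rm_eo - m->rm_so);
    if (n >= dstsz)
        n = dstsz - 1;
    if (n)
        memcpy(dst, s + m->rm_so, n);
    dst[n] = '\0';
}

/* Hypothetical helper: returns 0 on a match, -1 otherwise. */
static int regex_split_url(const char *url, char *host, size_t hostsz,
                           int *port, char *path, size_t pathsz)
{
    /* Group 1: host, group 3: port digits, group 4: path. */
    static const char pattern[] = "^http://([^/:]+)(:([0-9]+))?(/.*)?$";
    regex_t re;
    regmatch_t m[5];
    char digits[8];
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    rc = regexec(&re, url, 5, m, 0);
    regfree(&re);
    if (rc != 0)
        return -1;

    copy_group(url, &m[1], host, hostsz);
    copy_group(url, &m[3], digits, sizeof digits);
    copy_group(url, &m[4], path, pathsz);
    *port = digits[0] ? atoi(digits) : 80;   /* default to 80 when absent */
    if (path[0] == '\0')
        snprintf(path, pathsz, "/");         /* default to "/" when absent */
    return 0;
}
```

The pattern is compiled on every call to keep the sketch short. A program that parses many URLs would compile it once and reuse the regex_t.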

dsm
Indeed, using a library seems the only reasonable thing, since there are many traps (http vs. https, explicit port, encoding in the path, etc).
bortzmeyer
+1  A: 

I wrote some simple code using sscanf. I want a basic way to parse the URL.

cat urlparse.c
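#include <stdio.h>

int main(void)
{
    const char text[] = "http://192.168.0.2:8888/servlet/rece";
    char ip[100];
    int port = 80;
    char page[100];

    /* %99[^:] reads up to 99 characters that are not ':'.
       The literal '/' after the port is consumed by the format,
       so page is stored without its leading slash. */
    if (sscanf(text, "http://%99[^:]:%d/%99[^\n]", ip, &port, page) != 3) {
        fprintf(stderr, "could not parse \"%s\"\n", text);
        return 1;
    }
    printf("ip = \"%s\"\n", ip);
    printf("port = \"%d\"\n", port);
    printf("page = \"%s\"\n", page);
    return 0;
}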

./urlparse
ip = "192.168.0.2"
port = "8888"
page = "servlet/rece"
BianJiang
What platform is this on? I did not know you could put a regexp like [^:] in a sscanf format.
James Dean
My platform, from uname -a: Linux ubuntu 2.6.24-21-generic #1 SMP Tue Oct 21 23:43:45 UTC 2008 i686 GNU/Linux
BianJiang
[^:] is not a regexp in this context, it's merely a special format specifier for sscanf(). It is standard. See for instance this manual page: <http://linux.die.net/man/3/sscanf>.
unwind
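To make the scanset behaviour described above concrete, here is a tiny sketch. The helper name before_colon is made up for this example. It relies only on standard sscanf() behaviour: %[^:] matches a non-empty run of characters other than ':', and the conversion fails if the run is empty.

```c
#include <stdio.h>

/* Hypothetical helper: copies the text before the first ':' into out
   (at most 31 characters plus the terminating NUL).
   Returns 1 if at least one character was matched, 0 otherwise. */
static int before_colon(const char *s, char out[32])
{
    /* The width 31 keeps sscanf from writing past the 32-byte buffer. */
    return sscanf(s, "%31[^:]", out) == 1;
}
```

With the input "192.168.0.1:8080" the buffer receives "192.168.0.1". With an input that starts with ':' nothing is matched and the helper returns 0.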
The parser goes wrong when there is no port number; it doesn't work. How can I fix it?
BianJiang
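One possible way to cope with a missing port while staying with sscanf() is to parse in stages and use %n to find out how much input each stage consumed. This is only a sketch under assumptions: the helper name scan_url is hypothetical, only plain http URLs are handled, and the host buffer must hold at least 100 bytes because of the %99 width.

```c
#include <stdio.h>

/* Hypothetical helper, not part of the original program.
   host must have room for at least 100 bytes.
   Returns 0 on success, -1 if the text does not look like
   http://host[:port][/page] or page does not fit in pagesz bytes. */
static int scan_url(const char *text, char *host, int *port,
                    char *page, size_t pagesz)
{
    const char *rest;
    int used = 0;

    /* Host: everything up to the first ':' or '/'. %n stores the number of
       characters consumed so far and does not count in the return value. */
    if (sscanf(text, "http://%99[^:/]%n", host, &used) != 1)
        return -1;
    rest = text + used;

    *port = 80;                          /* default when no port is given */
    if (*rest == ':') {
        int len = 0;
        if (sscanf(rest, ":%d%n", port, &len) != 1 || *port < 1 || *port > 65535)
            return -1;
        rest += len;
    }

    if (*rest == '\0')
        rest = "/";                      /* no page given */
    else if (*rest != '/')
        return -1;                       /* junk after host or port */
    if (snprintf(page, pagesz, "%s", rest) >= (int)pagesz)
        return -1;
    return 0;
}
```

Unlike the single-format version, this keeps the leading slash of the page and accepts URLs both with and without an explicit port.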
+5  A: 

Personally, I steal the HTParse.c module from the W3C (it is used in the lynx Web browser, for instance). Then, you can do things like:

 strncpy(hostname, HTParse(url, "", PARSE_HOST), size);

The important thing about using a well-established and debugged library is that you do not fall into the typical traps of URL parsing (many regexps fail when the host is an IP address, for instance, especially an IPv6 one).

bortzmeyer
+1  A: 

If you're using the CLR, you may want to consider using the System.Uri class. I don't know C, so here's an example in C#:

using System;

class Program
{
    static void Main(string[] args)
    {
        string url = "http://192.168.0.1:8080/servlet/rece";

        Console.WriteLine("Original URL: {0}", url);
        Console.WriteLine();

        Uri uri = new Uri(url);

        Console.WriteLine("Server: {0}", uri.Host);
        Console.WriteLine("Port: {0}", uri.Port);
        Console.WriteLine("Page: {0}", uri.PathAndQuery);

        Console.ReadLine();
    }
}

It produces this output:

Original URL: http://192.168.0.1:8080/servlet/rece

Server: 192.168.0.1
Port: 8080
Page: /servlet/rece

Aaron Daniels
My platform is MIPS Linux. It's an embedded system, so I can only use C.
BianJiang
+1  A: 

If you're looking for a standards-compliant, performant way to parse a URL or URI, you can have a look at some code from a PHP class provided here:

http://andreas-hahn.com/en/parse-url

It is able to parse URLs as well as parse URIs, URNs and even IRIs according to RFC 3986 and RFC 3987.

Andreas M. Hahn