I'm writing a sitemap generator in Object Pascal and need a good function or library that emulates PHP's parse_url function.

Does anyone know of any good ones?

A: 

The URI RFC (RFC 3986, Appendix B) lists this regular expression for URI parsing:

  ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
   12            3  4          5       6  7        8 9

The numbers under the expression mark where each capture group opens. The groups match as follows:

  $1 = http:
  $2 = http
  $3 = //www.ics.uci.edu
  $4 = www.ics.uci.edu
  $5 = /pub/ietf/uri/
  $6 = <undefined>
  $7 = <undefined>
  $8 = #Related
  $9 = Related

For this URI:

  http://www.ics.uci.edu/pub/ietf/uri/#Related

The regular expression is simple and needs no special features from the regular expression library, so grab any library that is compatible with your Pascal implementation and there you go.
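As a rough illustration of the above, here is a minimal sketch that runs the RFC expression through TRegExpr. It assumes Free Pascal with its bundled RegExpr unit; the program name and the choice of TRegExpr are my own assumptions, and any other regex library with numbered groups should work the same way.

```pascal
program RfcRegexDemo;
{$mode objfpc}{$H+}

uses
  RegExpr; // assumption: the TRegExpr unit shipped with Free Pascal

const
  // The expression from the RFC, copied unchanged.
  UriPattern = '^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?';

var
  Re: TRegExpr;
begin
  Re := TRegExpr.Create;
  try
    Re.Expression := UriPattern;
    if Re.Exec('http://www.ics.uci.edu/pub/ietf/uri/#Related') then
    begin
      // Group numbers follow the table in the answer above.
      WriteLn('scheme:    ', Re.Match[2]);
      WriteLn('authority: ', Re.Match[4]);
      WriteLn('path:      ', Re.Match[5]);
      WriteLn('query:     ', Re.Match[7]);
      WriteLn('fragment:  ', Re.Match[9]);
    end;
  finally
    Re.Free;
  end;
end.
```

Note that group 4 is the whole authority, so a login, password or port arrives there unsplit (for example user:secret@host:8080) and you would have to split it yourself. A partial URL such as /pub/ietf/uri/ still matches; it just leaves the scheme and authority groups empty.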

Armin Ronacher
Does this accept partial URLs and ones that include a login/password?
Gustavo Carreno
+2  A: 

I am not familiar with PHP's parse_url function, but you might try the TIdURI class that is included with Indy (which in turn is included with most recent Delphi releases). I think it has been ported to FreePascal as well.

TIdURI is a TObject descendant that encapsulates a Universal Resource Identifier, as described in the Internet Standards document:

RFC 1630 - Universal Resource Identifiers in WWW

TIdURI provides methods and properties for assembly and disassembly of URIs using the component parts that make up the URI, including: Protocol, Host, Port, Path, Document, and Bookmark.
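A minimal sketch of reading those parts back, assuming Indy is installed and its IdURI unit is on the unit search path. The program name and the sample URI are only for illustration.

```pascal
program IdUriDemo;
{$IFDEF FPC}{$mode delphi}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  IdURI; // part of Indy; assumed to be installed

var
  Uri: TIdURI;
begin
  // The constructor parses the string passed to it.
  Uri := TIdURI.Create('http://www.ics.uci.edu/pub/ietf/uri/#Related');
  try
    WriteLn('Protocol: ', Uri.Protocol);
    WriteLn('Host:     ', Uri.Host);
    WriteLn('Port:     ', Uri.Port);
    WriteLn('Path:     ', Uri.Path);
    WriteLn('Document: ', Uri.Document);
    WriteLn('Bookmark: ', Uri.Bookmark);
  finally
    Uri.Free;
  end;
end.
```

All of these properties are strings, so Port comes back empty when the URI does not specify one.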

If that does not work, please give a specific example of what you are trying to accomplish: what are you trying to parse out of a URL?

Jim McKeeth
I've accepted Loesje's answer because I found that FreePascal's URIParser unit has a ResolveRelativeUri function, which is closer to what I was looking for. I did have a look at the TIdURI class and I quite liked it, but I didn't look into it enough to find something like FreePascal's ResolveRelativeUri().
Gustavo Carreno
A: 

If you're using wininet.dll you can also use their InternetCrackUrl API.

TOndrej
+2  A: 

FreePascal has the unit URIParser with the ParseURI function. An example of how to use it can be found in one of the examples in FreePascal's source. Or there is an old example which is somewhat easier to understand.
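In case those examples are hard to track down, here is a minimal sketch of my own, assuming a reasonably recent Free Pascal. It uses ParseURI and the TURI record from the URIParser unit, plus ResolveRelativeURI from the same unit. The example.com URLs and the program name are made up for illustration.

```pascal
program UriParserDemo;
{$mode objfpc}{$H+}

uses
  URIParser; // ships with Free Pascal

var
  U: TURI;
  BaseUrl, RelUrl, Resolved: string;
begin
  // Split a full URL, including login and password, into its parts.
  U := ParseURI('http://user:secret@www.example.com:8080/docs/index.html?lang=en#top');
  WriteLn('Protocol: ', U.Protocol);
  WriteLn('Username: ', U.Username);
  WriteLn('Password: ', U.Password);
  WriteLn('Host:     ', U.Host);
  WriteLn('Port:     ', U.Port);     // numeric (Word), not a string
  WriteLn('Path:     ', U.Path);
  WriteLn('Document: ', U.Document);
  WriteLn('Params:   ', U.Params);
  WriteLn('Bookmark: ', U.Bookmark);

  // Resolve a relative link against the page it was found on,
  // which is the usual need when crawling for a sitemap.
  BaseUrl := 'http://www.example.com/docs/index.html';
  RelUrl := '../images/logo.png';
  if ResolveRelativeURI(BaseUrl, RelUrl, Resolved) then
    WriteLn('Resolved: ', Resolved)
  else
    WriteLn('Could not resolve ', RelUrl);
end.
```

ResolveRelativeURI returns False when it cannot build an absolute URI, so check the result before using Resolved.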

Loesje
+1  A: 

Be careful with Indy's TIdURI class. It was supposed to be a general-purpose parser, but it has a few bugs and design flaws that prevent it from being a fully compliant parser. I'm currently writing a new class from scratch for Indy 11 to replace TIdURI. It will be a fully compliant URI parser, and it will also support IRI (RFC 3987) parsing.

That sounds pretty good. Do you have a specific link for that, or should I wait for Indy 11?
Gustavo Carreno