ansaurus

Question

Answer 1

+1 A:

You should use `parse_url` to split the URL into its components. Once you have the URL path, you can use `explode` to split it into its segments, `array_slice` to get specific segments, and `pathinfo` to get the extension.
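
A minimal sketch of how those four functions could fit together. The URL and its layout (language code as the second path segment, followed by a three-segment section) are assumptions made up for illustration, since the original URL is not shown here.

```php
<?php
// Hypothetical URL: the layout (language code as the second path segment,
// followed by a three-segment section) is an assumption for illustration.
$url = 'http://www.example.com/site/en/a/b/c/files/12345.html';

// parse_url() with PHP_URL_PATH returns only the path component.
$path = parse_url($url, PHP_URL_PATH);

// Drop the leading slash so explode() does not yield an empty first segment.
$segments = explode('/', trim($path, '/'));

$language = $segments[1];                               // second segment
$section  = implode('/', array_slice($segments, 2, 3)); // next three segments

// pathinfo() splits the last path element into file name and extension.
$info      = pathinfo($path);
$filename  = $info['filename'];
$extension = $info['extension'];

printf("%s | %s | %s | %s\n", $language, $section, $filename, $extension);
// prints: en | a/b/c | 12345 | html
```

The segment offsets (`1` and `2, 3`) are the only parts tied to the assumed layout, so they are what you would adjust for a different URL structure.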

Gumbo 2010-07-24 12:51:52

Indeed, with the possible addition of an `explode('/',$pathstring)` to easily get to the right path-segments.

Wrikken 2010-07-24 12:54:23

Would this be less resource intensive than regular expressions?

Rob 2010-07-24 13:11:23

@Rob: I don’t have any information about that. But it is probably more comprehensive, less error-prone and more flexible than using a regular expression.

Gumbo 2010-07-24 13:25:29

Answer 2

A:

PHP has the parse_url function.

This method is highly recommended, especially as opposed to using regular expressions.
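
For reference, a small sketch of what `parse_url` returns when called without a component constant. The URL is a made-up example, not one from the question.

```php
<?php
// Hypothetical URL used only to show the returned components.
$url = 'http://www.example.com/site/en/page.html?id=7#top';

// Without a second argument, parse_url() returns an associative array
// containing only the components that are present in the URL.
$parts = parse_url($url);

echo $parts['scheme'], "\n";   // http
echo $parts['host'], "\n";     // www.example.com
echo $parts['path'], "\n";     // /site/en/page.html
echo $parts['query'], "\n";    // id=7
echo $parts['fragment'], "\n"; // top
```

The query string and fragment come back as separate keys, so they never end up mixed into the path.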

injekt 2010-07-24 12:53:25

Answer 3

A:

The expression below is, hopefully, programming-language agnostic.

^.*?\.[^/]+/[^/]+/([^/]+)/([^/]+/[^/]+/[^/]+)/.*?(\d+)\.(\w+).*$

Let me explain what this does.

I consume the whole line (anchored by ^ and $) and work initially toward a '.' character in the domain. From there I consume the rest of the domain and the first path element, together with the '/' separator that follows each. Then I use capturing groups to grab the language field and the next three segments of the path. After that I discard everything up to the start of the file name and use two more groups to capture the file name (a run of digits) and the extension, discarding anything that follows, such as whitespace, up to the end of the line.

A word of caution: I have done minimal testing of the expression above, but I believe it can handle most URLs composed of characters in the ASCII range. It is also very specific to the structure of the URL and won't handle URLs that span more than one line.
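
As a rough illustration, here is the expression run through PHP's `preg_match`. The URL is a hypothetical one chosen to fit the structure described above. The pattern is written with single backslashes and with a lazy `.*?` before the digit group: with a greedy `.*` in that position, the group would capture only the last digit of the file name.

```php
<?php
// Hypothetical URL matching the assumed structure:
// domain / first element / language / three segments / ... / digits.extension
$url = 'http://www.example.com/site/en/a/b/c/files/12345.html';

// '~' is used as the delimiter so the slashes need no escaping.
// Single-quoted string: the backslashes reach the regex engine unchanged.
$pattern = '~^.*?\.[^/]+/[^/]+/([^/]+)/([^/]+/[^/]+/[^/]+)/.*?(\d+)\.(\w+).*$~';

if (preg_match($pattern, $url, $m)) {
    $language  = $m[1]; // language field
    $section   = $m[2]; // next three path segments
    $filename  = $m[3]; // numeric file name
    $extension = $m[4]; // file extension
    printf("%s | %s | %s | %s\n", $language, $section, $filename, $extension);
    // prints: en | a/b/c | 12345 | html
} else {
    echo "no match\n";
}
```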

Don Mackenzie 2010-07-24 17:44:12


Regular Expressions - Parsing a url
