ansaurus

Question

Answer 1

+1 A:

// Skip four slash-terminated groups, then capture everything up to the next slash.
var myregexp = /^(?:[^\/]*\/){4}([^\/]+)/;
var match = myregexp.exec(subject);
var result;
if (match !== null) {
    result = match[1];
} else {
    result = "";
}

This matches whatever lies between the fourth and fifth slashes and stores it in the variable result. If there is no match, result is set to the empty string.
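As a quick check, the same pattern can be wrapped in a small helper and run against the sample URL quoted elsewhere in this thread. This is only a sketch: the function name segmentAfterFourthSlash is made up for illustration. Note that the two slashes in "http://" count toward the four.

```javascript
// Hypothetical helper (the name is illustrative, not from the answer itself).
function segmentAfterFourthSlash(subject) {
    // Skip four "anything-then-slash" groups, then capture up to the next slash.
    var match = /^(?:[^\/]*\/){4}([^\/]+)/.exec(subject);
    return match !== null ? match[1] : "";
}

var url = "http://www.somesite.com.domain.webdev.domain.com/en/the-game/another-one/another-one/another-one/";
console.log(segmentAfterFourthSlash(url));          // the-game
console.log(segmentAfterFourthSlash("http://a/b")); // empty string: fewer than four slashes
```

Returning "" on a failed match mirrors the else branch of the original snippet.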

Tim Pietzcker 2010-04-22 20:58:33

cute... I was thinking that, but I didn't write it as an answer

xyld 2010-04-22 22:23:03

Reading from left side I am just looking for whatever text between 4th and 5th slash (/).

2010-04-23 05:25:49

Ah, you beat me on the update! Amazing how far a little clarification of requirements goes :)

BenV 2010-04-23 14:22:32

Answer 2

+1 A:

Which parts of the URL can vary and which parts are constant? The following regex will always match the segment between the slashes that follow "/en/", which is the-game in your example.

(?<=/en/).*?(?=/)

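This one will match the contents of the second set of slashes of any URL containing "webdev", assuming the first set of slashes contains a two- or three-character language code. Note that the lookbehind here has variable length, which only some regex engines (.NET, for example) accept.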

(?<=.*?webdev.*?/.{2,3}/).*?(?=/)

Hopefully you can tweak these examples to accomplish what you're looking for.
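To try the two patterns out, here is a minimal sketch in JavaScript. It assumes an engine with lookbehind support (ES2018 or later); older engines reject these patterns as syntax errors. The sample URL is the one quoted elsewhere in this thread, and the helper name firstMatch is made up for illustration.

```javascript
// Sample URL taken from elsewhere in this thread.
var sampleUrl = "http://www.somesite.com.domain.webdev.domain.com/en/the-game/another-one/another-one/another-one/";

// Segment right after "/en/": the lookbehind fixes the start,
// the lookahead stops the lazy match before the next slash.
var afterEn = /(?<=\/en\/).*?(?=\/)/;

// Same idea, but requires "webdev" earlier in the URL
// followed by a 2-3 character language code between slashes.
var afterLang = /(?<=.*?webdev.*?\/.{2,3}\/).*?(?=\/)/;

// Hypothetical helper: return the first match, or "" if there is none.
function firstMatch(pattern, text) {
    var m = pattern.exec(text);
    return m ? m[0] : "";
}

console.log(firstMatch(afterEn, sampleUrl));   // the-game
console.log(firstMatch(afterLang, sampleUrl)); // the-game
```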

BenV 2010-04-22 22:01:54

Reading from left side I am just looking for whatever text between 4th and 5th slash (/).

2010-04-23 05:24:44

Answer 3

A:

You should probably use a URL-parsing library rather than resorting to a regex.

In Python:

from urllib.parse import urlparse  # Python 3; in Python 2 the module was named urlparse

url = urlparse('http://www.somesite.com.domain.webdev.domain.com/en/the-game/another-one/another-one/another-one/')
print(url.path)

Which would yield:

/en/the-game/another-one/another-one/another-one/

From there, you can do simple things like stripping /en/ from the beginning of the path. Otherwise, you're bound to do something wrong with a regular expression. Don't reinvent the wheel!
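For comparison, the same parse-then-split approach can be sketched in JavaScript, assuming a runtime that provides the standard URL class (modern browsers and Node.js do). The helper name pathSegment and its zero-based index are illustrative choices, not part of any library.

```javascript
// Hypothetical helper; the name and the zero-based index convention are illustrative.
function pathSegment(urlString, index) {
    // Let the URL parser isolate the path, then split it on slashes
    // and drop the empty pieces produced by leading/trailing slashes.
    var path = new URL(urlString).pathname;
    var segments = path.split("/").filter(function (s) { return s.length > 0; });
    return index < segments.length ? segments[index] : "";
}

var pageUrl = "http://www.somesite.com.domain.webdev.domain.com/en/the-game/another-one/another-one/another-one/";
console.log(pathSegment(pageUrl, 0)); // en
console.log(pathSegment(pageUrl, 1)); // the-game
```

Because the parser has already removed the scheme and host, the index counts path segments only, so there is no need to count the slashes in "http://".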

xyld 2010-04-22 22:27:54

Regex: Getting content from URL
