ansaurus

Question

How to write a regex for this expression?

Answer 1

+4 A:

Why not just use urlparse instead?

Amber 2010-09-14 07:30:45
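
As a rough illustration of Amber's suggestion, the sketch below splits a URL with the standard library instead of a regex. It assumes Python 3, where the `urlparse` module was renamed to `urllib.parse`; the helper name `split_url` and the sample URLs (borrowed from Answer 3) are illustrative only, not part of the original question.

```python
# Python 3: the old `urlparse` module now lives in urllib.parse.
from urllib.parse import urlparse


def split_url(url):
    """Return (scheme, host, path) for a URL. Hypothetical helper for illustration."""
    parts = urlparse(url)
    return parts.scheme, parts.netloc, parts.path


# A doubled slash after the host simply ends up in the path component.
print(split_url("http://xyz.com/abc"))
print(split_url("http://xyz.com//abc"))
```

With the components separated like this, any check on the path (for example, tolerating an extra leading slash) can be done on `path` alone.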

Agreed; regular expressions aren't good for URIs, email addresses or markup.

Delan Azabani 2010-09-14 07:32:26

@Delan: I'm pretty sure using regular expressions for URIs is totally fine. RFC 3986 even gives you one for parsing a URI.

Felix Kling 2010-09-14 07:36:13
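
For reference, the expression Felix mentions appears in Appendix B of RFC 3986. The sketch below wraps it in a small Python helper; the names `URI_RE` and `split_uri` are made up for this example. Note that the RFC offers this expression for splitting a URI reference into components, not for validating it, so it matches any input string.

```python
import re

# Regular expression from RFC 3986, Appendix B.
# Groups: 2 = scheme, 4 = authority, 5 = path, 7 = query, 9 = fragment.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split_uri(uri):
    """Split a URI reference into its five components (hypothetical helper)."""
    m = URI_RE.match(uri)  # every part is optional, so this always matches
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


print(split_uri("http://xyz.com//abc"))
```

Components that are absent from the input come back as `None`.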

Though most URIs are simple, there are some quirks and complexities, just like with email addresses, that lead to false positives and false negatives. I can't remember who, but someone wrote a regular expression that validates email addresses exactly to the spec as a proof of concept, and it filled over a page.

Delan Azabani 2010-09-14 07:38:18

@Delan: True, but nevertheless, I am sure that under the hood, `urlparse` also uses a regular expression. It might be complex, but that does not necessarily mean it is bad. Of course you don't want to write such an expression on your own every time ;) I once wrote a URI parser that was meant to validate against the RFC, and it was not too complex (it used several regular expressions rather than just one, which might indeed be too complex).

Felix Kling 2010-09-14 07:41:32

@Felix Kling, there's no need to guess about these things. Just have a look at urlparse.py and you'll see there is not a single regular expression in it: urlparse.py doesn't `import re`; in fact, it doesn't import anything. What it does contain is a lot of complex domain knowledge about which features the different schemes support.

Duncan 2010-09-14 07:51:19

Answer 2

A:

http://\w+\.\w+//?\w+

splash 2010-09-14 07:30:57

Answer 3

A:

The answer depends on whether you want to parse urls in general or whether you just wonder how to handle the optional slash.

In the first case, I agree with Amber that you should use urlparse.

In the second case, use a ? after the slash in your expression:

http://xyz.com//?abc

A ? in a regular expression means that the previous element is optional (i.e. may appear zero times or once).

Jonas Wagner 2010-09-14 08:00:47
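
The optional-slash pattern from this answer can be tried out with a short Python sketch. The names `PATTERN` and `matches` are made up for illustration, and the example makes two assumptions beyond the answer: the dot is escaped as `\.` so it matches only a literal period, and `fullmatch` is used so the whole string must fit the pattern.

```python
import re

# "/?" makes the second slash optional (zero or one occurrence).
# The dot is escaped so it matches a literal "." rather than any character.
PATTERN = re.compile(r"http://xyz\.com//?abc")


def matches(url):
    """Return True if the whole string fits the pattern (hypothetical helper)."""
    return PATTERN.fullmatch(url) is not None


print(matches("http://xyz.com/abc"))    # one slash
print(matches("http://xyz.com//abc"))   # two slashes
print(matches("http://xyz.com///abc"))  # three slashes: "?" allows at most one extra
```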
