I'd like to automatically highlight and extract URLs from a QWebView or QTextEdit. I've found a lot of RegEx examples on the web that claim to do just that, but most of them seem overly complicated, don't work properly, or are not compatible with Qt4's RegExp implementation.

So I'm asking here for a Qt4-specific RegExp pattern which allows for reliable URL highlighting, possibly in the context of surrounding text (documentation, chat, etc.). It should be able to highlight mailto: links, and protocols other than http://, such as https:// or ftp://.

+1  A: 

I don't have access to Qt4, so I can't test whether this works. Jan Goyvaerts has written a very good blog post about how to match a URL in a block of text. There is a balance to be struck between a simple but incomplete regular expression and a complicated but robust one.

As far as I can gather from the Qt4 regex documentation I found online, the following regex should work because it doesn't use any features that QRegExp doesn't have:

\b(?:(?:(?:https?|ftp|file)://|www\.|ftp\.)[-A-Z0-9+&@#/%?=~_|$!:,.;]*[-A-Z0-9+&@#/%=~_|$]|((?:mailto:)?[A-Z0-9._%+-]+@[A-Z0-9._%-]+\.[A-Z]{2,6})\b)|"(?:(?:https?|ftp|file)://|www\.|ftp\.)[^"\r\n]+"|'(?:(?:https?|ftp|file)://|www\.|ftp\.)[^'\r\n]+'

But it sure isn't simple.

The same regex in free-spacing mode:

\b(?:(?:(?:https?|ftp|file)://|www\.|ftp\.)[-A-Z0-9+&@#/%?=~_|$!:,.;]*[-A-Z0-9+&@#/%=~_|$]
   | ((?:mailto:)?[A-Z0-9._%+-]+@[A-Z0-9._%-]+\.[A-Z]{2,6})\b)
|"(?:(?:https?|ftp|file)://|www\.|ftp\.)[^"\r\n]+"
|'(?:(?:https?|ftp|file)://|www\.|ftp\.)[^'\r\n]+'

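is a bit easier to read. It is meant for reading only: as far as I can tell, QRegExp has no free-spacing option, so the single-line version is the one to pass to it.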
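For what it's worth, two things are easy to trip over when pasting this pattern into C++ source. In an ordinary string literal, `"\b"` is a backspace character, so every backslash has to be doubled (or a raw string literal used). And the character classes list only `A-Z`, so the pattern has to be applied case-insensitively; otherwise only the quoted alternatives can match a lowercase URL. Below is a minimal sketch of both points. It uses `std::regex` purely as a stand-in because that needs no Qt installation; `kUrlPattern` and `extractUrls` are made-up names, and behaviour under QRegExp itself is untested here.

```cpp
#include <regex>
#include <string>
#include <vector>

// The single-line pattern from above, kept verbatim. The raw string literal
// avoids doubling every backslash; in an ordinary literal "\b" is a backspace.
static const char* const kUrlPattern =
    R"re(\b(?:(?:(?:https?|ftp|file)://|www\.|ftp\.)[-A-Z0-9+&@#/%?=~_|$!:,.;]*[-A-Z0-9+&@#/%=~_|$]|((?:mailto:)?[A-Z0-9._%+-]+@[A-Z0-9._%-]+\.[A-Z]{2,6})\b)|"(?:(?:https?|ftp|file)://|www\.|ftp\.)[^"\r\n]+"|'(?:(?:https?|ftp|file)://|www\.|ftp\.)[^'\r\n]+')re";

// Hypothetical helper: returns every match in `text`, in order of appearance.
std::vector<std::string> extractUrls(const std::string& text) {
    // icase is required because the character classes only list A-Z.
    static const std::regex re(kUrlPattern,
                               std::regex::ECMAScript | std::regex::icase);
    std::vector<std::string> found;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end;
         it != end; ++it) {
        found.push_back(it->str());
    }
    return found;
}
```

With QRegExp the equivalent would presumably be to pass `Qt::CaseInsensitive` as the second constructor argument and loop with `indexIn()` and `matchedLength()`, but I have not run that.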

Tim Pietzcker
Thanks. I'll have to check that as soon as I get home. :)
BastiBense
It seems that `\b` is not supported by QRegExp. Also, I could only get this RegExp to match a URL that is surrounded by quotes.
BastiBense