
Question

Answer 1

+2 A:

You can use \p{L} to match any Unicode letter. It covers letters from every script (åäöÅÄÖ included), as suggested in this SO question.

Or, you can simply replace \w* with [^<]*, to match all characters that are not the opening of an HTML tag.
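Here is a minimal sketch of the two suggestions, written as C# 9 top-level statements. The link text `Åsa Öberg` is a hypothetical sample, not from the question. `\p{L}+` matches å, ä and ö but stops at the space. `[^<]*` takes everything up to the next `<`.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

Console.OutputEncoding = Encoding.UTF8;

// Hypothetical anchor; the space in the link text is deliberate.
const string html = "<a href=\"#\">Åsa Öberg</a>";

// \p{L}+ : a run of letters from any script, so Å and Ö are included,
// but the match ends at the first non-letter (the space).
string letters = Regex.Match(html, @">(\p{L}+)").Groups[1].Value;

// [^<]* : every character that is not '<', so the whole link text is taken.
string untilTag = Regex.Match(html, @">([^<]*)</a>").Groups[1].Value;

Console.WriteLine(letters);  // Åsa
Console.WriteLine(untilTag); // Åsa Öberg
```

Choose `[^<]*` when the text between the tags may contain spaces or punctuation. Choose `\p{L}` when you really want letters only.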

But, as others have said, parsing HTML with regex is the first step towards insanity...

Wookai 2009-11-23 21:41:40

Answer 2

+1 A:

Firstly: DON'T USE REGULAR EXPRESSIONS TO PARSE HTML. USE AN HTML PARSER.

Secondly: if you really want to do this (and you don't), then instead of \w you could match any character other than '<':

<a href="/userinfo/userinfo\.aspx\?ID=\d*" target="helgonmain">[^<]*</a> \w\d\d
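The pattern above can be applied from C# roughly as follows. This is a sketch using top-level statements. The input line (ID, name and trailing `K23`) is made up for illustration. One change was made to the original pattern: the link text is wrapped in a named group `name` so it can be read back.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

Console.OutputEncoding = Encoding.UTF8;

// Hypothetical sample line; every value in it is invented.
const string html =
    "<a href=\"/userinfo/userinfo.aspx?ID=12345\" target=\"helgonmain\">Åsa Öberg</a> K23";

// The answer's pattern in a verbatim string ("" is an escaped quote),
// with [^<]* captured as the named group "name".
const string pattern =
    @"<a href=""/userinfo/userinfo\.aspx\?ID=\d*"" target=""helgonmain"">(?<name>[^<]*)</a> \w\d\d";

Match match = Regex.Match(html, pattern);

// Groups["name"].Value is an empty string when there is no match.
string name = match.Groups["name"].Value;

Console.WriteLine(match.Success); // True
Console.WriteLine(name);          // Åsa Öberg
```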

Mark Byers 2009-11-23 21:42:28

Answer 3

+1 A:

You can use a character class that explicitly includes those letters:

[\wåäöÅÄÖ]*
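A small check of the explicit class, sketched as top-level statements with a hypothetical test word. It assumes .NET's default regex mode, where \w is already Unicode-aware. Under `RegexOptions.ECMAScript`, \w shrinks to `[a-zA-Z_0-9]`, and listing åäöÅÄÖ explicitly starts to matter.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical test word containing ö.
const string word = "Göteborg";

// The explicit class, anchored so the whole word must match.
bool explicitClass = Regex.IsMatch(word, @"^[\wåäöÅÄÖ]*$");

// Default .NET \w covers Unicode letters, so ö matches on its own.
bool plainWord = Regex.IsMatch(word, @"^\w*$");

// ECMAScript mode limits \w to [a-zA-Z_0-9]; only the explicit class accepts ö.
bool ecmaWord = Regex.IsMatch(word, @"^\w*$", RegexOptions.ECMAScript);
bool ecmaExplicit = Regex.IsMatch(word, @"^[\wåäöÅÄÖ]*$", RegexOptions.ECMAScript);

Console.WriteLine($"{explicitClass} {plainWord} {ecmaWord} {ecmaExplicit}");
// True True False True
```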

Or you can use the Unicode character class for letters:

\p{L}

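or, for a specific Unicode block (.NET spells block names with an "Is" prefix, and åäöÅÄÖ belong to the Latin-1 Supplement block, not Basic Latin):

\p{IsLatin-1Supplement}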
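Here is a quick sketch of Unicode block syntax in .NET, as top-level statements. .NET writes block names with an `Is` prefix; `In` is the Java spelling. The letters åäöÅÄÖ sit in Latin-1 Supplement (U+0080 to U+00FF), not Basic Latin (U+0000 to U+007F). The sample character is arbitrary.

```csharp
using System;
using System.Text.RegularExpressions;

// å is U+00E5, so it is outside Basic Latin but inside Latin-1 Supplement.
bool inBasicLatin = Regex.IsMatch("å", @"^\p{IsBasicLatin}$");
bool inLatin1Supplement = Regex.IsMatch("å", @"^\p{IsLatin-1Supplement}$");

// \p{L} is a category, not a block, so it matches letters wherever they live.
bool allLetters = Regex.IsMatch("åäöÅÄÖ", @"^\p{L}+$");

// The Java-style "In" name is not recognised by .NET and throws when parsed.
bool javaNameRejected;
try { _ = new Regex(@"\p{InBasicLatin}"); javaNameRejected = false; }
catch (ArgumentException) { javaNameRejected = true; }

Console.WriteLine($"{inBasicLatin} {inLatin1Supplement} {allLetters} {javaNameRejected}");
// False True True True
```

Note that a block also contains non-letters (Latin-1 Supplement includes symbols such as ×). For "letters only", \p{L} is the tighter choice.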

Joey 2009-11-23 21:42:31

C# Regex - How to parse string for Swedish letters åäöÅÄÖ?