ansaurus

Question

Advanced Regex: Smart auto detect and replace URLs with anchor tags

Answer 1

+1 A:

You would need the Regex.Replace overload that takes a MatchEvaluator: a delegate that constructs the replacement text for each match.

See here: http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.matchevaluator.aspx

Technically, it is possible with regexes alone, by doing what Kobi suggests. However, I'm not sure I'd want to ask anybody (including yourself, a few months from now) to maintain that regex.

Thorarin 2010-05-05 06:35:33
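Python has no MatchEvaluator type, but `re.sub` accepts a callable as its replacement, and that callable plays the same role: it is invoked once per match and returns the replacement text. The sketch below is only an illustration under stated assumptions. The URL pattern is a deliberately naive, hypothetical one (not the regex from the question), and the 20-character display limit and trailing "..." are arbitrary choices.

```python
import re
from html import escape

# Hypothetical, deliberately naive URL pattern (illustration only).
URL_PATTERN = re.compile(r'https?://[^\s<>"]+')
DISPLAY_LIMIT = 20  # arbitrary limit chosen for this sketch


def to_anchor(match: re.Match) -> str:
    """Build the replacement for one match, as a .NET MatchEvaluator would."""
    url = match.group(0)
    # Shorten only the visible text; the href keeps the full URL.
    shown = url if len(url) <= DISPLAY_LIMIT else url[:DISPLAY_LIMIT] + "..."
    # Escape after truncating so an entity such as &amp; is never cut in half.
    return f'<a href="{escape(url)}">{escape(shown)}</a>'


def linkify(text: str) -> str:
    # re.sub calls to_anchor for every match and splices in its return value.
    return URL_PATTERN.sub(to_anchor, text)


print(linkify("see http://example.com/a/very/long/path here"))
# see <a href="http://example.com/a/very/long/path">http://example.com/a...</a> here
```

Because the callback is ordinary code, things that are awkward in a replacement string, such as HTML-escaping or adding an ellipsis only when the URL was actually shortened, become one-liners.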

Answer 2

+1 A:

You can split ${url} into two capturing groups: a head (still called url in the pattern below) with the number of characters you want to display, and urltail with the rest. Here's an example with 10 characters. It is somewhat simplified to remove the condition; the final (?<ending>(?(outer)(?=\)))) should take care of that, because it forces the pattern to backtrack so that the closing ) is left out of the URL when needed:

(?<outer>(?<=\())?
(?<scheme>http(?<secure>s)?://)?
(?<url>
    (?(scheme)
        (?:www\.)?
        |
        www\.
    )
    [a-z0-9]
    [-a-z0-9/+&@#/%?=~_()|!:,.;čšžćđ]{1,10}
)
(?<urltail>[-a-z0-9/+&@#/%?=~_()|!:,.;čšžćđ]*)
(?<ending>(?(outer)(?=\))))

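Note that I've also changed outer and ending to be lookarounds, so the parentheses they check for are not consumed and therefore not replaced. Because the pattern is laid out over several lines, it must be compiled with RegexOptions.IgnorePatternWhitespace. The replacement string in this case looks like: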

<a href=\"http${secure}://${url}${urltail}\">http${secure}://${url}</a>

Kobi 2010-05-05 06:37:20
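As a rough cross-check outside .NET, the pattern and replacement string above can be transcribed into Python's `re`. This is a sketch under assumptions, not the original code: named groups become `(?P<name>...)`, `${name}` becomes `\g<name>`, `re.VERBOSE` stands in for `RegexOptions.IgnorePatternWhitespace`, `re.IGNORECASE` is assumed, and the tail uses `*` so that URLs shorter than the limit are not clipped. The function name `linkify` is hypothetical.

```python
import re

# Transcription of the .NET pattern into Python's re syntax (a sketch):
# (?<name>...) -> (?P<name>...), re.VERBOSE ~ IgnorePatternWhitespace.
URL_PATTERN = re.compile(r"""
    (?P<outer>(?<=\())?                 # empty group, set only when "(" precedes the URL
    (?P<scheme>http(?P<secure>s)?://)?
    (?P<url>                            # the head: the part that is displayed
        (?(scheme)
            (?:www\.)?                  # with a scheme, "www." is optional
        |
            www\.                       # without a scheme, "www." is required
        )
        [a-z0-9]
        [-a-z0-9/+&@#%?=~_()|!:,.;čšžćđ]{1,10}
    )
    (?P<urltail>[-a-z0-9/+&@#%?=~_()|!:,.;čšžćđ]*)   # the rest, possibly empty
    (?P<ending>(?(outer)(?=\))))        # after "(", force the tail to give back ")"
""", re.VERBOSE | re.IGNORECASE)  # IGNORECASE is an assumption

# \g<name> is Python's ${name}; since Python 3.5 an unmatched group
# (e.g. secure) is replaced with an empty string.
REPLACEMENT = r'<a href="http\g<secure>://\g<url>\g<urltail>">http\g<secure>://\g<url></a>'


def linkify(text: str) -> str:
    return URL_PATTERN.sub(REPLACEMENT, text)


print(linkify("(http://example.com/a)"))
# (<a href="http://example.com/a">http://example.com</a>)
```

The href always receives url plus urltail, while the link text receives only url, which is exactly the split described above.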

That would work, but imo it's obfuscated enough as it is :)

Thorarin 2010-05-05 06:38:57

Actually, looking at it again, it isn't so scary; you just have to add the `?(outer)` part again. Looks like it's well documented, too.

Kobi 2010-05-05 07:05:19

@Kobi: You're right, but the regex would probably become really complicated, because it would need lots of conditions. It should, however, capture all cases: shorter URLs, longer ones, each enclosed in parentheses or not, etc. I will think about it, but I'm not really sure it's an optimal solution.

Robert Koritnik 2010-05-05 07:18:07

@Kobi: What do you mean by "you just have to add the `?(outer)` part again"? Where to?

Robert Koritnik 2010-05-05 07:24:20

@Robert - I'm having a little trouble with edge cases, so I can't post the full solution, but the idea is to limit `?<url>` to a fixed maximum number of characters. A simplified example: instead of `(.+)`, use `(.{1,10})(.*)`; this allows an easy replace that shows only the first 10 characters.

Kobi 2010-05-05 07:50:21
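The simplified `(.{1,10})(.*)` idea can be tried quickly in Python's `re`, used here only as a stand-in for .NET. The sketch assumes the input is a single URL with no newlines, and `shorten_link` is a hypothetical name.

```python
import re


def shorten_link(url: str) -> str:
    # Group 1 takes at most 10 characters (the visible text);
    # group 2 takes whatever is left, possibly nothing.
    # The href gets both groups back, the link text only group 1.
    return re.sub(r"(.{1,10})(.*)", r'<a href="\1\2">\1</a>', url)


print(shorten_link("http://example.com/x"))
# <a href="http://example.com/x">http://exa</a>
```

Since group 2 may be empty, a URL of 10 characters or fewer is shown in full.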

@Kobi: I think a better way would be to use a lookahead before capturing the "url". A lookahead could determine the url length, and its result could be used in an if-style conditional that captures either the url alone or the url plus the remainder. Maybe that would be the way to go.

Robert Koritnik 2010-05-05 14:18:00
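A minimal sketch of that lookahead-plus-conditional idea, again in Python's `re` rather than .NET and assuming the input is one URL with no whitespace: a lookahead inside an otherwise empty alternation sets the group `long` only when more than 10 characters follow, and the conditional `(?(long)...|...)` then decides how much of the URL goes into `url`. The names `PATTERN`, `make_link`, `long` and `rest` are hypothetical.

```python
import re

PATTERN = re.compile(
    r"(?:(?=(?P<long>\S{11}))|)"     # "long" is set only if at least 11 chars follow
    r"(?P<url>(?(long)\S{10}|\S+))"  # long URL: first 10 chars; short URL: all of it
    r"(?P<rest>\S*)"                 # remainder (empty for short URLs)
)


def make_link(url: str) -> str:
    m = PATTERN.fullmatch(url)
    if m is None:
        raise ValueError("expected a single URL without whitespace")
    # The href always gets url + rest; the link text gets only url.
    return m.expand(r'<a href="\g<url>\g<rest>">\g<url></a>')


print(make_link("http://example.com/x"))
# <a href="http://example.com/x">http://exa</a>
```

This relies on the group inside the lookahead keeping its capture, which Python's `re` does.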

@Robert - Very possibly, but then, how will you build the full url in the link? Does a lookahead capture groups?

Kobi 2010-05-05 14:46:51

The link would always consist of both parts, even when there is no remainder; if there is one, everything still works. The link text would always display just the url part.

Robert Koritnik 2010-05-05 19:49:56

@Kobi: Lookarounds don't capture anything. They're just used to look around and give you some info for capturing.

Robert Koritnik 2010-05-06 07:13:19
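Kobi's question has a slightly more nuanced answer, at least in Python's `re`: a lookahead consumes no characters, but a capturing group placed inside it still records what it matched (as far as I know, .NET behaves the same way). A quick check:

```python
import re

# The lookahead (?=...) consumes nothing, so \w+ still starts at position 0,
# but the named group inside the lookahead keeps the text it saw.
m = re.match(r"(?=(?P<ahead>\w{3}))\w+", "abcdef")
print(m.group(0))        # abcdef
print(m.group("ahead"))  # abc
```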

@Robert - see the updates: this is handled with a single `replace`, and I simplified your regex.

Kobi 2010-05-08 16:09:30

Hats off, @Kobi. This is actually working as it should! Good work. I haven't put it to use yet, so this will come in handy when I do. Thanks a bunch. I upvoted and accepted your answer, since you delved so deeply into it. I'm sure this joint effort of ours will benefit someone else as well. ;)

Robert Koritnik 2010-05-08 21:57:43
