ansaurus

Question

C# Regex replace url

Answer 1

+4 A:

Regex regex = new Regex(@"http://domain/ViewDocument\.aspx\?id=3D(\d+)&amp;doc=(\w+)");
Match match = regex.Match(link.Href);
if (match.Success)
{
    link.Href = string.Format("javascript:loadDocument('{0}','{1}')", match.Groups[1].Value, match.Groups[2].Value);
}

Anton 2010-05-31 18:11:29

@Mark Oops, fixed.

Anton 2010-05-31 18:16:22

Thanks, but I don't get it completely. Is link.Href my document (in string format), which contains all my links? Don't I need a loop or something when there are multiple links?

Martijn 2010-05-31 18:19:50

Wrap the code in a `foreach` loop. The `link` is whatever your iterator variable is. If you don't have the links as an enumerable collection, run `regex.Matches` against the document string itself and iterate over the matches.

Anton 2010-05-31 23:26:39
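A minimal sketch of the second suggestion above, running the pattern against the whole document string instead of a single link. The class name `LinkRewriter` and its method names are hypothetical, and the sketch assumes the document is already loaded into a string. When every link gets the same rewrite, `Regex.Replace` handles all matches in one call, so no explicit loop is needed; `Regex.Matches` is shown for the case where each match needs separate handling.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class LinkRewriter
{
    // Same pattern as in the answer, with the dot escaped.
    // "=3D" is how "=" appears in the quoted-printable source text.
    private static readonly Regex LinkPattern = new Regex(
        @"http://domain/ViewDocument\.aspx\?id=3D(\d+)&amp;doc=(\w+)");

    // Rewrites every matching URL in the document in a single pass.
    public static string RewriteAll(string document)
    {
        return LinkPattern.Replace(document, "javascript:loadDocument('$1','$2')");
    }

    // Alternative: enumerate the matches when each one needs individual handling.
    public static void PrintMatches(string document)
    {
        foreach (Match match in LinkPattern.Matches(document))
        {
            Console.WriteLine("id={0}, doc={1}",
                match.Groups[1].Value, match.Groups[2].Value);
        }
    }
}
```

Calling `LinkRewriter.RewriteAll(html)` returns a new string; the input string itself is left unchanged, so the result has to be assigned or written out.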

Answer 2

+2 A:

You could use Html Agility Pack to help parse the HTML. Here's how you could do it:

//Regex regex = new Regex(@"^http://domain/ViewDocument\.aspx\?id=3D(\d+)&amp;amp;doc=(\w+)$");
Regex regex = new Regex(@"^http://domain/ViewDocument\.aspx\?id=3D(\d+)&amp;doc=(\w+)$");
HtmlDocument doc = new HtmlDocument();
doc.Load("input.html");
var nodes = doc.DocumentNode
               .Descendants("a")
               // GetAttributeValue returns "" for anchors without an href instead of throwing.
               .Where(node => regex.IsMatch(node.GetAttributeValue("href", "")));

foreach (HtmlNode node in nodes)
{
    var href = node.Attributes["href"];
    href.Value = regex.Replace(href.Value, "javascript:loadDocument('$1','$2')");
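    // Only remove the target attribute when the link actually has one.
    var target = node.Attributes["target"];
    if (target != null) target.Remove();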
}

doc.Save(Console.Out);

Result:

<a href="javascript:loadDocument('1','form')">Document naam 1</a>
<a href="javascript:loadDocument('2','form')">Document naam 2</a>
<a href="javascript:loadDocument('3','form')">Document naam 3</a>

Mark Byers 2010-05-31 18:24:52

+1, wow this looks awesome. I should learn C#.

polygenelubricants 2010-05-31 18:34:00

Answer 3

+1 A:

Polygenelubricants pointed me in the right direction, but has removed his answer :(

He gave me this link. Thanks to him I found my solution:

string replaced = "";

string regex = "<a href=3D\"http://\\S+id=3D(\\d+)&amp;doc=3D(\\w+)\" target=3D\"_parent\">";
Regex regEx = new Regex(regex);

replaced = regEx.Replace(mhtFile, "<a href=3D\"javascript:window.parent.loadDocument('$1','$2')\">");

Response.Write(replaced);

For those who are interested: these links are inside a .mht file. This file uses quoted-printable encoding, which writes every = sign as =3D; that is why 3D follows each = sign in the pattern. The variable mhtFile contains the whole .mht file as plain text.

Martijn 2010-06-01 08:26:41
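To make the role of `=3D` concrete, here is a small sketch that applies the same pattern and replacement to one sample line. The class name `MhtLinkDemo` and the sample anchor are made up for illustration; only the pattern and the replacement string come from the answer above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper around the pattern from the answer.
public static class MhtLinkDemo
{
    public static string Rewrite(string mhtFile)
    {
        // 0x3D is the character code of '=', so quoted-printable text
        // spells every literal "=" as "=3D". The pattern matches that raw form.
        var regEx = new Regex(
            "<a href=3D\"http://\\S+id=3D(\\d+)&amp;doc=3D(\\w+)\" target=3D\"_parent\">");

        // The replacement keeps the "=3D" form so the output stays valid
        // quoted-printable, and drops the target attribute.
        return regEx.Replace(
            mhtFile,
            "<a href=3D\"javascript:window.parent.loadDocument('$1','$2')\">");
    }
}
```

One caveat worth checking against real files: quoted-printable limits line length and breaks long lines with a trailing = sign, so a link that is split across two lines would not be matched by this pattern.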

@polygenelubricants: If you're reading this you may wish to undelete your answer so that Martijn can upvote it and/or accept it.

Mark Byers 2010-06-01 09:00:18

@Mark Thank you for looking up the correct user.

Martijn 2010-06-01 11:38:49
