I have a bunch of links in a document that have to be replaced by a JavaScript call. All the links look the same:

<a href="http://domain/ViewDocument.aspx?id=3D1&amp;doc=form" target="_blank">Document naam 1</a>
<a href="http://domain/ViewDocument.aspx?id=3D2&amp;doc=form" target="_blank">Document naam 2</a>
<a href="http://domain/ViewDocument.aspx?id=3D3&amp;doc=form" target="_blank">Document naam 3</a>

Now I want all of these links to be replaced with:

<a href="javascript:loadDocument('1','form')">Document naam 1</a>
<a href="javascript:loadDocument('2','form')">Document naam 2</a>
<a href="javascript:loadDocument('3','form')">Document naam 3</a>

So the value of the id parameter in the URL (the part after id=3D) becomes the first argument of the function call, and the value of the doc parameter becomes the second.

I want to do this using a regex because I think that is the quickest way. The problem is that my regex knowledge is too limited.

+4  A: 
// The dot in "ViewDocument.aspx" is escaped so that it only matches a literal dot.
Regex regex = new Regex(@"http://domain/ViewDocument\.aspx\?id=3D(\d+)&amp;doc=(\w+)");
Match match = regex.Match(link.Href);
if (match.Success)
{
    link.Href = string.Format("javascript:loadDocument('{0}','{1}')", match.Groups[1].Value, match.Groups[2].Value);
}
Anton
@Mark Oops, fixed.
Anton
Thanks, but I don't get it completely. Is link.Href my document (in string format) which contains all my links? Don't I need a loop or something when there are multiple links?
Martijn
Wrap the code in a `foreach` loop. The `link` is whatever your iterator variable is. If you don't have the links as an enumerable collection, run `regex.Matches` against the document string itself and iterate over the matches.
Anton
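For the case where the links are not available as a collection, the replacement can also be run against the whole document string in one call. The following is a minimal sketch, assuming the document has already been read into a string; the names `LinkRewriter` and `RewriteLinks` are hypothetical, and the pattern is the one from this answer with the dot escaped:

```csharp
using System.Text.RegularExpressions;

public static class LinkRewriter
{
    // Group 1 captures the id, group 2 captures the doc type.
    private static readonly Regex LinkUrl = new Regex(
        @"http://domain/ViewDocument\.aspx\?id=3D(\d+)&amp;doc=(\w+)");

    // Hypothetical helper: rewrites every matching URL found in the document text.
    public static string RewriteLinks(string html)
    {
        // The evaluator runs once per match, so no explicit loop is needed.
        return LinkUrl.Replace(html, match => string.Format(
            "javascript:loadDocument('{0}','{1}')",
            match.Groups[1].Value,
            match.Groups[2].Value));
    }
}
```

Note that this sketch only rewrites the URL itself; the `target="_blank"` attribute is left in place and would need a wider pattern or a separate step to remove.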
+2  A: 

You could use Html Agility Pack to help parse the HTML. Here's how you could do it:

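using System;
using System.Linq;
using System.Text.RegularExpressions;
using HtmlAgilityPack;

// Variant for input in which the ampersand is double-encoded:
//Regex regex = new Regex(@"^http://domain/ViewDocument\.aspx\?id=3D(\d+)&amp;amp;doc=(\w+)$");
Regex regex = new Regex(@"^http://domain/ViewDocument\.aspx\?id=3D(\d+)&amp;doc=(\w+)$");
HtmlDocument doc = new HtmlDocument();
doc.Load("input.html");
// GetAttributeValue returns "" for anchors without an href instead of throwing.
var nodes = doc.DocumentNode
               .Descendants("a")
               .Where(node => regex.IsMatch(node.GetAttributeValue("href", "")));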

foreach (HtmlNode node in nodes)
{
    var href = node.Attributes["href"];
    href.Value = regex.Replace(href.Value, "javascript:loadDocument('$1','$2')");
    node.Attributes["target"].Remove();
}

doc.Save(Console.Out);

Result:

<a href="javascript:loadDocument('1','form')">Document naam 1</a>
<a href="javascript:loadDocument('2','form')">Document naam 2</a>
<a href="javascript:loadDocument('3','form')">Document naam 3</a>
Mark Byers
+1, wow this looks awesome. I should learn C#.
polygenelubricants
+1  A: 

Polygenelubricants pointed me in the right direction, but has since removed his answer :(

He gave me this link. Thanks to him I found my solution:

// Matches each link as it appears in the raw file; $1 is the id, $2 is the doc type.
string pattern = "<a href=3D\"http://\\S+id=3D(\\d+)&amp;doc=3D(\\w+)\" target=3D\"_parent\">";
Regex regEx = new Regex(pattern);

string replaced = regEx.Replace(mhtFile, "<a href=3D\"javascript:window.parent.loadDocument('$1','$2')\">");

Response.Write(replaced);

For those who are interested: these links are inside a .mht file. The file is quoted-printable encoded, and in that encoding a literal = is written as =3D; that is why 3D appears after every = sign. The variable mhtFile contains the whole .mht file as plain text.

Martijn
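To illustrate where the `=3D` comes from: in quoted-printable encoding, `=XX` stands for the byte with hex value XX (so `=3D` is `=`), and a `=` at the end of a line is a soft line break. The following is a rough sketch of a decoder, not a full MIME implementation; the names `QuotedPrintable` and `Decode` are hypothetical, and it assumes every escaped byte is a single-byte character (multi-byte sequences such as UTF-8 are not handled):

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

public static class QuotedPrintable
{
    // Hypothetical helper: decodes quoted-printable text, single-byte characters only.
    public static string Decode(string input)
    {
        // Join lines that were split by a soft line break ("=" at end of line).
        string joined = Regex.Replace(input, @"=\r?\n", "");

        // Turn each "=XX" escape back into the character with hex code XX.
        return Regex.Replace(joined, "=([0-9A-Fa-f]{2})", match =>
            ((char)int.Parse(match.Groups[1].Value, NumberStyles.HexNumber)).ToString());
    }
}
```

Decoding first would allow a pattern written against ordinary HTML (`href="..."`, `id=1`), at the cost of having to re-encode the result before writing it back into the .mht file. Matching the encoded text directly, as the answer above does, avoids that round trip.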
@polygenelubricants: If you're reading this you may wish to undelete your answer so that Martijn can upvote it and/or accept it.
Mark Byers
@Mark Thank you for looking up the correct user.
Martijn