I need to use C# Regex to rewrite links in HTML pages: links enclosed in double quotes (") have to be replaced with my own. For example, I need to replace the following

"slashdot.org/index.rss"

into

"MY_OWN_LINK"

However, the actual link can be of the form

"//slashdot.org/index.rss" or
"/slashdot.org/index.rss"

where other characters, which I don't care about, can come before "slashdot.org/index.rss" but after the opening quote (").

To summarize, as long as the link ends with "slashdot.org/index.rss", I would want to replace the entire link with "MY_OWN_LINK".

How can I use Regex.Replace for the above?

+1  A: 

Try this. It works with no slash, a single slash, or two slashes.

    string pattern = @"[/]{0,2}slashdot\.org[/]{0,2}index\.rss";
    test1 = Regex.Replace(test1, pattern, "MY_OWN_LINK");
Fadrian Sudaman
What is the pattern for "[Anystring]slashdot.org/index.rss" where [Anystring] is to match strings of any length and can be of any value that comes before "slashdot.org/index.rss"?
Lopper
You can use @".*slashdot\.org[/]{0,2}index\.rss" so that .* matches anything. This can be dangerous, because .* is greedy and will also swallow any other text that precedes the link on the same line, so use it carefully. You can optionally add ^ at the beginning and $ at the end to match a full string, like @"^.*slashdot\.org[/]{0,2}index\.rss$"
Fadrian Sudaman
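To illustrate a way around the greedy .* problem when the input is a whole HTML page, here is a minimal sketch. It assumes links are always wrapped in double quotes, as in the question; the class and method names (LinkRewriter, Rewrite) are made up for the example. Instead of .*, it uses [^"]*, which cannot cross a quote, so each match stays inside one quoted link.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LinkRewriter
{
    // Hypothetical helper, not from the thread.
    // Matches one double-quoted value that ends with slashdot.org/index.rss.
    // [^"]* matches any run of non-quote characters, so it never spans two links.
    private static readonly Regex QuotedLink =
        new Regex("\"[^\"]*slashdot\\.org/index\\.rss\"");

    // Replaces every matching quoted link (quotes included) with "MY_OWN_LINK".
    public static string Rewrite(string html)
    {
        return QuotedLink.Replace(html, "\"MY_OWN_LINK\"");
    }
}
```

Calling LinkRewriter.Rewrite on a page should leave quoted values that do not end with slashdot.org/index.rss untouched, even when they sit on the same line as a matching link.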
A: 

edit: updated answer according to comment.

First, you don't have to use a regular expression for this job. Just check whether or not the string ends with "slashdot.org/index.rss", and if it does, replace the entire string.
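A minimal sketch of that check, assuming the link has already been extracted as a bare string without its surrounding quotes. The names LinkCheck and ReplaceIfMatch are made up for illustration.

```csharp
using System;

public static class LinkCheck
{
    // Hypothetical helper, not from the thread.
    // Returns the replacement when the link ends with the target suffix,
    // otherwise returns the link unchanged.
    public static string ReplaceIfMatch(string link, string replacement)
    {
        const string suffix = "slashdot.org/index.rss";
        // Ordinal comparison: exact, case-sensitive, culture-independent.
        return link.EndsWith(suffix, StringComparison.Ordinal) ? replacement : link;
    }
}
```

StringComparison.Ordinal is chosen here so the result does not depend on the current culture; switch to OrdinalIgnoreCase if the links may differ in case.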

If you do use a regular expression, it's better to just test whether or not the string ends with "slashdot.org/index.rss" and act accordingly, like so:

    if (Regex.IsMatch(str, @"slashdot\.org/index\.rss$")) { str = new_str; }

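If you insist on using Regex.Replace, go for

    Regex.Replace(str, @"^.*slashdot\.org/index\.rss$", "MY_OWN_LINK");

where ^ and $ stand for the beginning and end of the string (or line), respectively. The leading .* means "match the start of the URL, whatever it is". Each literal dot is preceded by a backslash, because an unescaped dot means "any character". The pattern is written as a verbatim string (@"...") so that C# passes the backslash through to the regex engine unchanged.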

For additional info, see this cheat sheet of regular expressions in C#.

Elazar Leibovich
What I want is to replace the entire link with "MY_OWN_LINK" so long as it ends with "slashdot.org/index.rss" and not just replace part of it.
Lopper
So even better, test whether the link EndsWith("slashdot.org/index.rss") and, if it does, replace the entire string. http://msdn.microsoft.com/en-us/library/system.string.endswith(VS.71).aspx
Elazar Leibovich
BTW, where can I find the MSDN page that defines what each element of a regular expression string (like '.', '[]', etc.) does?
Lopper
http://msdn.microsoft.com/en-us/library/28hw3sce(VS.80).aspx I googled "regular expression MSDN". This was the third link.
Elazar Leibovich