Hi

I am trying to stop XSS attacks, so I am using the HTML Agility Pack to build my whitelist and the Microsoft Anti-Cross Site Scripting Library to deal with the rest.

Now I am looking at encoding all HTML hrefs. I get a big string of HTML that can contain hrefs. According to the MS library documentation there is a URL encode method, but if you encode the whole URL it can no longer be used. So in their example they encode only the query string:

UrlEncode: untrusted input is used in a URL (such as a value in a querystring). Example: Click Here!

http://msdn.microsoft.com/en-us/library/aa973813.aspx

So my question is: how do I parse an href and find the query string? Is it always just "?" followed by the query string, or can it contain spaces and be written in different ways?

Edit

These URLs will not be written by me but by the users who share them. That's why I need a way to catch every query string, not just the ones in a valid format. If a URL still works in an invalid format, I have to catch those too. Hackers won't care whether the format is valid as long as it still does what they want.

+3  A: 

I believe it is always the part after the ? but you can easily use the Uri class for this:

Uri uri = new Uri("http://foo.com/page.html?query");
string query = uri.Query;

That will include the ? itself. Of course, you can fetch the other bits as well, which could be handy.
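As a rough sketch of how this could be applied to user-supplied links: `Uri.TryCreate` avoids the exception that the `Uri` constructor throws on malformed input. The class and method names (`QueryExtractor`, `TryGetQuery`) are made up for illustration, and treating unparseable input as "no result" is an assumption about what the caller wants.

```csharp
using System;

public static class QueryExtractor
{
    // Hypothetical helper: returns the query part of an absolute URL,
    // including the leading '?', or null when the input cannot be
    // parsed as an absolute URI.
    public static string TryGetQuery(string href)
    {
        Uri uri;
        if (!Uri.TryCreate(href, UriKind.Absolute, out uri))
        {
            // Malformed or relative input: the caller decides how to treat it.
            return null;
        }

        // Empty string when the URL has no query; a "#fragment" is not included.
        return uri.Query;
    }
}
```

A null result only means the string was not an absolute URI, so it is probably safer to reject or strip such hrefs than to pass them through unchanged.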

Jon Skeet
Hmm, I will look into that. So basically: use a regex or whatever to find all the URLs, then loop over them with foreach, put each one into a Uri, and use that.
chobo2
@chobo2: I wouldn't suggest parsing HTML with a regex, to be honest. But however you find the URLs, that's what I'd do with them.
Jon Skeet
A: 

It is always the first "?".

František Žiačik
If a URL is correct, everything after `?` is the query, isn't it?
abatishchev
@abatishchev: yes.
František Žiačik
A: 

What about using an encrypted query string that you decrypt in your code?

Or you can use Request.PathInfo, which means you don't need a ? query string at all.

Space Cracker
I don't really use query strings. This is what a user might put in and share with other users. So they could put bad things in the query string so I want to encode that.
chobo2
A: 

Here's a W3C reference addressing the composition of URIs with querystrings, which says in part:

The question mark ("?", ASCII 3F hex) is used to delimit the boundary between the URI of a queryable object, and a set of words used to express a query on that object.
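For hrefs that are relative or otherwise not parseable as a full URI, the delimiter rule above can be applied by hand. This is only a minimal sketch: `QuerySplitter` and `RawQuery` are hypothetical names, and cutting the string at "#" first relies on the generic URI syntax, in which a fragment follows the query.

```csharp
using System;

public static class QuerySplitter
{
    // Hypothetical helper: returns the text between the first '?' and the
    // fragment (if any), without the '?' itself. Returns an empty string
    // when the href has no query.
    public static string RawQuery(string href)
    {
        // Drop the fragment first so a '?' inside it is not mistaken for a query.
        int hash = href.IndexOf('#');
        string withoutFragment = hash >= 0 ? href.Substring(0, hash) : href;

        int mark = withoutFragment.IndexOf('?');
        return mark >= 0 ? withoutFragment.Substring(mark + 1) : string.Empty;
    }
}
```

This does no validation or decoding; it only locates the raw query text so that it can be encoded afterwards.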

DOK