I've got a whitelist of URLs stored in a HashSet<string>. I'm trying to find out whether a URL starts with any of the items in the whitelist (it has to be that way round).

Edit: The previous example was a bit misleading and had a typo - I already have a base url like yahoo.com, the whitelist is just the path.

HashSet<string> whiteList = new HashSet<string>();

string path = "/sport/baseball/";
bool validUrl = false;

foreach (string item in whiteList)
{
    if (path.StartsWith(item))
    {
        validUrl = true;
        break;
    }
}

Is there a more elegant way of doing this lookup with LINQ (to objects)? The list isn't huge so performance isn't an issue.

+9  A: 
bool validUrl = whiteList.Any(item => path.StartsWith(item));

By the way, hash tables are generally not a good data structure for this kind of problem (where you don't have the key and are matching keys with a function), as you have to enumerate the whole table every time. You can hold the items in a simple List<string> instead and you'll get better performance.
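For reference, here is a minimal self-contained sketch of the `Any` approach. The `WhitelistCheck` class and `IsAllowed` method are hypothetical names used only for illustration, and the choice of `StringComparison.Ordinal` is an assumption (the one-liner above uses the default, culture-sensitive overload of `StartsWith`).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the original question or answer.
public static class WhitelistCheck
{
    // Returns true when `path` begins with at least one whitelist entry.
    // Accepts any IEnumerable<string>, so List<string> and HashSet<string> both work.
    public static bool IsAllowed(IEnumerable<string> whiteList, string path)
    {
        // Assumption: ordinal (byte-wise) comparison is what a URL path check wants.
        return whiteList.Any(item => path.StartsWith(item, StringComparison.Ordinal));
    }
}
```

With a whitelist such as `new List<string> { "/sport/", "/news/" }`, calling `WhitelistCheck.IsAllowed(whiteList, "/sport/baseball/")` would match on the `/sport/` entry. `Any` stops at the first match, much like the `break` in the original loop.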

Mehrdad Afshari
8 lines replaced by one. LINQ is wonderful, isn't it?
Spence
@Spence: Functional programming is a cave of wonders. Wonders are going to be rediscovered as time goes on.
Mehrdad Afshari
Rediscovered is right; given that FP is as old as the caves.
Noon Silk
It's not quite there yet in C#, because in the general case the compiler still writes imperative code. It will be very interesting if the C# compiler can start doing fancy stuff based on the fact that you're asking the computer to perform a given function, not telling it HOW to perform it.
Spence
Spence: That makes no sense at all.
Noon Silk
I'm guessing `Any` is faster than the overhead of a hashtable, though the whitelist is actually just read once from a file path in the constructor.
Chris S
@Chris S: `Any` is basically a `foreach` loop and should be similar to what you've written performance-wise. Nevertheless, the same code is applicable for `List<string>` as `Any` works on *any* object that implements `IEnumerable<T>`. I don't think it makes a meaningful difference for your case as the white list is probably small.
Mehrdad Afshari
+1  A: 

The issue here is with the lookup. Do you have any regularity in the whitelist? I.e., will it always be a domain you're after, not necessarily the pages within it or a specific subdomain?

If so, you could use string.Split to grab the first part of the URL, then use the .Contains() method of your HashSet to look it up. This would replace the StartsWith() call, which runs once for every element in the list and is an expensive string comparison, with a one-off string.Split and an O(1) lookup in your HashSet.

HashSet<string> whiteList = new HashSet<string>();
//add items

// The leading part of the URL, extracted beforehand (e.g. with string.Split)
string urlStartsWith = "http://www.yahoo.com";
bool validURL = whiteList.Contains(urlStartsWith);
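
Adapted to the path-only whitelist from the question, the split-then-lookup idea might look roughly like the sketch below. It assumes every whitelist entry is exactly one leading path segment stored without slashes (e.g. `sport`); the `SegmentWhitelist` class and `IsAllowed` method are hypothetical names. If entries can span several segments, this shortcut does not apply and a `StartsWith` scan is still needed.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper illustrating a one-off split plus a hash lookup.
public static class SegmentWhitelist
{
    // Returns true when the first segment of `path` is in the whitelist.
    public static bool IsAllowed(HashSet<string> whiteList, string path)
    {
        // "/sport/baseball/" splits into ["sport", "baseball"] once empty entries are dropped.
        string[] segments = path.Split(new[] { '/' }, StringSplitOptions.RemoveEmptyEntries);

        // A path with no segments (e.g. "/") can never match.
        return segments.Length > 0 && whiteList.Contains(segments[0]);
    }
}
```

Here `IsAllowed(new HashSet<string> { "sport", "news" }, "/sport/baseball/")` looks up only `sport`, so the cost no longer grows with the size of the whitelist.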
Spence