Hi,

I have a high-performance application which deals with URLs. For every URL it needs to retrieve the appropriate settings from a predefined pool. Every settings object is associated with a URL pattern which indicates which URLs should use those settings. The matching rules are as follows:

  1. "google.com" should match all URLs pointing to the google.com domain (thus, maps.google.com and www.google.com/match are matched).
  2. "*.google.com" should match all URLs pointing to a subdomain of google.com (thus, maps.google.com matches, but google.com and www.google.com don't).
  3. "maps.google.com" should match all URLs pointing to this specific subdomain.

Apart from the above rules, every match rule can contain a path, in which case the path part of the URL must start with the rule's path. So: "*.google.com/maps" matches "maps.google.com/maps" but not "maps.google.com/advanced".

As you can see, the rules above overlap. When two rules match the same URL, the most specific one should apply. The list above is ranked from least specific to most specific.

This seems to be such a standard problem that I was hoping to use a ready-made library rather than write it myself. Googling turns up a couple of options, but no clear way to choose between them. What would you recommend as a good library for this task?

Thanks, Boaz

+1  A: 

I don't think you need a specific library to solve this; the standard Java API has all that you need to write the code without too much work.

Take a look at java.util.regex.Pattern and work out the regular expressions you need to match each of your rules. You might also want to use java.net.URL to parse out the different fields from the URL.
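As a rough sketch of that idea (not a tuned implementation), each rule can be compiled once into a host regex plus a path prefix. The class name `UrlRule` is made up for illustration. I use java.net.URI here instead of java.net.URL for the parsing, prepend "http://" when the input has no scheme, and treat a leading "www." as the bare domain; that last part is only my reading of your rule 2, so adjust it if it is wrong.

```java
import java.net.URI;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical helper: one compiled match rule such as "*.google.com/maps".
final class UrlRule {
    private final Pattern hostPattern;
    private final String pathPrefix;

    UrlRule(String rule) {
        // Split the rule into its host part and an optional path part.
        int slash = rule.indexOf('/');
        String host = (slash < 0 ? rule : rule.substring(0, slash)).toLowerCase(Locale.ROOT);
        this.pathPrefix = slash < 0 ? "" : rule.substring(slash);

        String regex;
        if (host.startsWith("*.")) {
            // "*.google.com": require at least one extra label in front.
            regex = "(?:[^.]+\\.)+" + Pattern.quote(host.substring(2));
        } else {
            // "google.com": the domain itself or any subdomain of it.
            regex = "(?:[^.]+\\.)*" + Pattern.quote(host);
        }
        this.hostPattern = Pattern.compile(regex);
    }

    boolean matches(String url) {
        // Assumption: inputs without a scheme are treated as http URLs.
        // URI.create throws IllegalArgumentException on malformed input.
        URI uri = URI.create(url.contains("://") ? url : "http://" + url);
        String host = uri.getHost();
        if (host == null) {
            return false;
        }
        host = host.toLowerCase(Locale.ROOT);
        // Assumption (from rule 2): "www.google.com" counts as "google.com".
        if (host.startsWith("www.")) {
            host = host.substring(4);
        }
        String path = uri.getPath() == null ? "" : uri.getPath();
        return hostPattern.matcher(host).matches() && path.startsWith(pathPrefix);
    }
}
```

With this, `new UrlRule("*.google.com/maps").matches("maps.google.com/maps")` is true and the same rule rejects "maps.google.com/advanced". Since the patterns are compiled once up front, the per-URL cost is one URI parse plus one regex match per rule.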

You already said you have a priority scheme to handle scenarios where multiple patterns match the URL, so that should be the last piece for this puzzle.
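One possible way to express that priority, again only as a sketch: score each rule string and take the highest-scoring one among the rules that matched. `RuleSpecificity` and its methods are hypothetical names. The host ordering follows the ranking in your list (more concrete host labels win, and a "*." wildcard beats the bare domain with the same labels). Using the longer path as a tie-breaker is my own assumption, since the question does not say how paths rank.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical helper: orders rule strings from least to most specific.
final class RuleSpecificity {
    private RuleSpecificity() {}

    private static String hostOf(String rule) {
        int slash = rule.indexOf('/');
        return slash < 0 ? rule : rule.substring(0, slash);
    }

    // Two points per concrete host label, plus one if the rule is a "*." wildcard,
    // so "google.com" (4) < "*.google.com" (5) < "maps.google.com" (6).
    static int hostScore(String rule) {
        String host = hostOf(rule);
        boolean wildcard = host.startsWith("*.");
        if (wildcard) {
            host = host.substring(2);
        }
        int labels = host.split("\\.").length;
        return 2 * labels + (wildcard ? 1 : 0);
    }

    // Assumption: among equal hosts, a longer path prefix is more specific.
    static int pathLength(String rule) {
        int slash = rule.indexOf('/');
        return slash < 0 ? 0 : rule.length() - slash;
    }

    static final Comparator<String> BY_SPECIFICITY =
            Comparator.<String>comparingInt(RuleSpecificity::hostScore)
                      .thenComparingInt(RuleSpecificity::pathLength);

    // Given the rules that already matched a URL, pick the one to apply.
    static Optional<String> mostSpecific(List<String> matchingRules) {
        return matchingRules.stream().max(BY_SPECIFICITY);
    }
}
```

For example, if "google.com", "*.google.com" and "maps.google.com" all match a URL, `mostSpecific` returns "maps.google.com". If the rule pool is fixed, you could also sort it once with this comparator in reverse and return the first rule that matches.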

It looks like a pretty straightforward task.

Joel Hoff