Hi, I would like to ask whether there is any Java package or library that implements standard URL normalization.

5 Components of URL Representation

http://www.example.com:8040/folder/exist?name=sky#head

  1. scheme: http
  2. authority: www.example.com:8040
  3. path: /folder/exist
  4. query: ?name=sky
  5. fragment: #head

The 3 types of standard URL normalization

Syntax-Based Normalization

  • Case normalization – convert all letters in the scheme and authority components to lowercase
  • Percent-encoding normalization – decode any percent-encoded octet that corresponds to an unreserved character, such as %2D for hyphen and %5F for underscore
  • Path segment normalization – remove dot-segments from the path component, such as ‘.’ and ‘..’
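
To make the three syntax-based steps concrete, here is a minimal sketch built on java.net.URI. The class name SyntaxNormalizer and the helper decodeUnreserved are hypothetical names made up for this illustration. URI.normalize() only removes dot-segments, so the lower-casing and the percent-decoding are done by hand here. The sketch lower-cases the host only, not the user-info part of the authority, and decodes only the path.

```java
import java.net.URI;
import java.util.Locale;

class SyntaxNormalizer {

    // Unreserved characters: letters, digits, '-', '.', '_' and '~'.
    private static boolean isUnreserved(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                || (c >= '0' && c <= '9')
                || c == '-' || c == '.' || c == '_' || c == '~';
    }

    // Decodes %XX only when it encodes an unreserved character;
    // other escapes are kept, with their hex digits upper-cased.
    static String decodeUnreserved(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '%' && i + 2 < s.length()) {
                int hi = Character.digit(s.charAt(i + 1), 16);
                int lo = Character.digit(s.charAt(i + 2), 16);
                if (hi >= 0 && lo >= 0) {
                    char decoded = (char) (hi * 16 + lo);
                    if (isUnreserved(decoded)) {
                        out.append(decoded);
                    } else {
                        out.append(s.substring(i, i + 3).toUpperCase(Locale.ROOT));
                    }
                    i += 3;
                    continue;
                }
            }
            out.append(c);
            i++;
        }
        return out.toString();
    }

    static String normalize(String url) {
        URI uri = URI.create(url);
        StringBuilder sb = new StringBuilder();
        // Case normalization: scheme and host.
        if (uri.getScheme() != null) {
            sb.append(uri.getScheme().toLowerCase(Locale.ROOT)).append(':');
        }
        if (uri.getRawAuthority() != null) {
            sb.append("//");
            if (uri.getHost() == null) {
                // Authority that java.net.URI could not split into host and port: keep as-is.
                sb.append(uri.getRawAuthority());
            } else {
                if (uri.getRawUserInfo() != null) {
                    sb.append(uri.getRawUserInfo()).append('@');
                }
                sb.append(uri.getHost().toLowerCase(Locale.ROOT));
                if (uri.getPort() != -1) {
                    sb.append(':').append(uri.getPort());
                }
            }
        }
        // Percent-encoding normalization, applied to the path before dot-segments are removed.
        if (uri.getRawPath() != null) {
            sb.append(decodeUnreserved(uri.getRawPath()));
        }
        if (uri.getRawQuery() != null) {
            sb.append('?').append(uri.getRawQuery());
        }
        if (uri.getRawFragment() != null) {
            sb.append('#').append(uri.getRawFragment());
        }
        // Path segment normalization: URI.normalize() removes "." and ".." segments.
        return URI.create(sb.toString()).normalize().toString();
    }

    public static void main(String[] args) {
        // prints http://www.example.com:8040/a/c-d_e?name=sky#head
        System.out.println(normalize("HTTP://WWW.Example.COM:8040/a/./b/../c%2Dd%5Fe?name=sky#head"));
    }
}
```

Rebuilding the string by hand avoids the multi-argument URI constructors, which always quote the '%' character and would double-encode existing escapes.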

Scheme-Based Normalization

  • Add a trailing ‘/’ after the authority component when the path is empty
  • Remove the default port number, such as 80 for the http scheme
  • Remove the fragment from the URL
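
As far as I know, java.net.URI has no built-in support for these scheme-based rules, so they would have to be applied by hand. Below is a rough sketch that assumes an absolute URL with a scheme and a host. The class name SchemeNormalizer and the defaultPort table, which only covers http and https, are hypothetical and would need extending for other schemes.

```java
import java.net.URI;

class SchemeNormalizer {

    // Default ports for the schemes this sketch knows about; -1 means "unknown".
    static int defaultPort(String scheme) {
        if ("http".equalsIgnoreCase(scheme)) return 80;
        if ("https".equalsIgnoreCase(scheme)) return 443;
        return -1;
    }

    // Assumes an absolute URL with a scheme and a host, e.g. http://host:port/path.
    static String normalize(String url) {
        URI uri = URI.create(url);
        StringBuilder sb = new StringBuilder();
        sb.append(uri.getScheme()).append("://");
        if (uri.getRawUserInfo() != null) {
            sb.append(uri.getRawUserInfo()).append('@');
        }
        sb.append(uri.getHost());

        // Keep the port only when it differs from the scheme's default.
        int port = uri.getPort();
        if (port != -1 && port != defaultPort(uri.getScheme())) {
            sb.append(':').append(port);
        }

        // An empty path becomes "/".
        String path = uri.getRawPath();
        sb.append(path == null || path.isEmpty() ? "/" : path);

        if (uri.getRawQuery() != null) {
            sb.append('?').append(uri.getRawQuery());
        }
        // The fragment is deliberately not appended.
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints http://www.example.com/
        System.out.println(normalize("http://www.example.com:80"));
        // prints http://www.example.com:8040/folder/exist?name=sky
        System.out.println(normalize("http://www.example.com:8040/folder/exist?name=sky#head"));
    }
}
```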

Protocol-Based Normalization

  • Only appropriate when the results of accessing the resources are equivalent
  • For example, when the origin server redirects example.com/data to example.com/data/
A: 

http://java.sun.com/javase/6/docs/api/java/net/URL.html handles most of what you're looking for.

Tansir1
I'll take a look and see if it suits my needs :D
lockone
+2  A: 
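import java.net.URI;

// Parse the URL once, then read each component through its getter.
URI uri = URI.create("http://www.example.com:8040/folder/exist?name=sky#head");
String scheme = uri.getScheme();       // "http"
String authority = uri.getAuthority(); // "www.example.com:8040"
// ...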

http://java.sun.com/j2se/1.4.2/docs/api/java/net/URI.html

Alain O'Dea
Thanks, this helps a lot.
lockone
@lockone: My pleasure :)
Alain O'Dea