Hi, is there a Java package or library that implements standard URL normalization?
5 Components of URL Representation
http://www[dot]example[dot]com:8040/folder/exist?name=sky#head
- scheme: http
- authority: www.example.com:8040
- path: /folder/exist
- query: name=sky (introduced by ‘?’)
- fragment: head (introduced by ‘#’)
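For reference, the JDK's own java.net.URI class can already split a URL into these five components. Below is a minimal sketch, assuming the example URL above with the dots restored; the class name UrlComponents and the method split are made up for illustration. Note that URI reports the query and fragment without their ‘?’ and ‘#’ delimiters.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of any library.
final class UrlComponents {
    // Splits a URL into the five generic components using java.net.URI.
    static Map<String, String> split(String url) {
        URI uri = URI.create(url); // throws IllegalArgumentException on bad syntax
        Map<String, String> parts = new LinkedHashMap<>();
        parts.put("scheme", uri.getScheme());
        parts.put("authority", uri.getRawAuthority());
        parts.put("path", uri.getRawPath());
        parts.put("query", uri.getRawQuery());       // without the leading '?'
        parts.put("fragment", uri.getRawFragment()); // without the leading '#'
        return parts;
    }

    public static void main(String[] args) {
        split("http://www.example.com:8040/folder/exist?name=sky#head")
            .forEach((name, value) -> System.out.println(name + ": " + value));
    }
}
```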
The 3 types of standard URL normalization
Syntax-Based Normalization
- Case normalization – convert all letters in the scheme and authority (host) components to lowercase
- Percent-encoded normalization – decode any percent-encoded octet that corresponds to an unreserved character, such as %2D for hyphen and %5F for underscore
- Path segment normalization – remove dot-segments from the path component, such as ‘.’ and ‘..’
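To make the three syntax-based steps concrete, here is a rough sketch using only the JDK. As far as I can tell, URI.normalize() handles only the dot-segment step, so the lowercasing and the percent-decoding are done by hand here. The class SyntaxNormalizer and its methods are hypothetical names, and the sketch assumes a hierarchical URL with a server-based authority; it is not a complete RFC 3986 implementation.

```java
import java.net.URI;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
final class SyntaxNormalizer {
    private static final Pattern PCT = Pattern.compile("%([0-9A-Fa-f]{2})");

    // Decodes %XX only when it encodes an unreserved character
    // (letters, digits, '-', '.', '_', '~'); other triplets are left untouched.
    static String decodeUnreserved(String s) {
        if (s == null) return null;
        Matcher m = PCT.matcher(s);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(s, last, m.start());
            char c = (char) Integer.parseInt(m.group(1), 16);
            boolean unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                    || (c >= '0' && c <= '9') || c == '-' || c == '.' || c == '_' || c == '~';
            out.append(unreserved ? String.valueOf(c) : m.group());
            last = m.end();
        }
        out.append(s, last, s.length());
        return out.toString();
    }

    static String normalize(String url) {
        URI u = URI.create(url);
        if (u.isOpaque()) return url; // e.g. mailto: URIs are out of scope for this sketch
        StringBuilder sb = new StringBuilder();
        // Case normalization: scheme and host are case-insensitive.
        if (u.getScheme() != null) sb.append(u.getScheme().toLowerCase(Locale.ROOT)).append(':');
        if (u.getHost() != null) {
            sb.append("//");
            if (u.getRawUserInfo() != null) sb.append(u.getRawUserInfo()).append('@');
            sb.append(u.getHost().toLowerCase(Locale.ROOT));
            if (u.getPort() != -1) sb.append(':').append(u.getPort());
        } else if (u.getRawAuthority() != null) {
            sb.append("//").append(u.getRawAuthority()); // host not parsed; keep as-is
        }
        // Percent-encoded normalization on path, query and fragment.
        sb.append(decodeUnreserved(u.getRawPath()));
        if (u.getRawQuery() != null) sb.append('?').append(decodeUnreserved(u.getRawQuery()));
        if (u.getRawFragment() != null) sb.append('#').append(decodeUnreserved(u.getRawFragment()));
        // Path segment normalization: remove "." and ".." segments.
        return URI.create(sb.toString()).normalize().toString();
    }

    public static void main(String[] args) {
        System.out.println(normalize(
            "HTTP://WWW.Example.COM:8040/a/./b/../c/%7Euser%2Dname?x=%5Fy#Head"));
    }
}
```

Decoding happens before the dot-segment removal so that an encoded segment such as %2E%2E is also collapsed. One caveat, to the best of my knowledge: URI.normalize() leaves a leading ‘..’ in an absolute path in place, so it may differ from RFC 3986 in that corner case.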
Scheme-Based Normalization
- Add a trailing ‘/’ after the authority component when the path is empty
- Remove the default port number, such as 80 for the http scheme
- Remove the fragment from the URL
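The scheme-based rules above could be sketched roughly as follows, again with only java.net.URI. The class SchemeNormalizer and the small default-port table (http, https, ftp) are my own assumptions for illustration; a real library would presumably cover more schemes.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical helper, not part of any library.
final class SchemeNormalizer {
    // Assumed default ports for a few common schemes; -1 means "unknown".
    static int defaultPort(String scheme) {
        switch (scheme.toLowerCase(Locale.ROOT)) {
            case "http":  return 80;
            case "https": return 443;
            case "ftp":   return 21;
            default:      return -1;
        }
    }

    static String normalize(String url) {
        URI u = URI.create(url);
        String scheme = u.getScheme();
        String host = u.getHost();
        if (scheme == null || host == null) return url; // not a server-based URL; leave alone

        StringBuilder sb = new StringBuilder();
        sb.append(scheme).append("://");
        if (u.getRawUserInfo() != null) sb.append(u.getRawUserInfo()).append('@');
        sb.append(host);

        // Keep the port only when it is explicit and not the scheme's default.
        int port = u.getPort();
        if (port != -1 && port != defaultPort(scheme)) sb.append(':').append(port);

        // An empty path becomes "/".
        String path = u.getRawPath();
        sb.append(path == null || path.isEmpty() ? "/" : path);

        if (u.getRawQuery() != null) sb.append('?').append(u.getRawQuery());
        // The fragment is intentionally dropped.
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalize("http://www.example.com:80"));
        System.out.println(normalize("http://www.example.com:8040/folder/exist?name=sky#head"));
    }
}
```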
Protocol-Based Normalization
- Only appropriate when the results of accessing the resources are equivalent
- For example, example.com/data is redirected to example.com/data/ by the origin server