RFC 3986 ( http://www.ietf.org/rfc/rfc3986.txt ) says in Appendix B
The following line is the regular expression for breaking-down a
well-formed URI reference into its components.
**^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?**
12 3 4 5 6 7 8 9
The numbers in the second line above are only to assist readability;
they indicate the reference points for each subexpression (i.e., each
paired parenthesis). We refer to the value matched for subexpression
as $. For example, matching the above expression to
http://www.ics.uci.edu/pub/ietf/uri/#Related
results in the following subexpression matches:
$1 = http:
$2 = http
$3 = //www.ics.uci.edu
$4 = www.ics.uci.edu
$5 = /pub/ietf/uri/
$6 = <undefined>
$7 = <undefined>
$8 = #Related
$9 = Related
where indicates that the component is not present, as is
the case for the query component in the above example. Therefore, we
can determine the value of the five components as
scheme = $2
authority = $4
path = $5
query = $7
fragment = $9