When resolving paths, `.` and `..` (and in most cases, `//` for Unix and `\\` for Windows) are the main things you really need to worry about. RFC 3986 (section 5.2.4) gives the algorithm for removing these dot segments when resolving relative references in URIs. For the most part, it also applies to file system paths.
The algorithm, `remove_dot_segments`:
1. The input buffer is initialized with the now-appended path components and the output buffer is initialized to the empty string.
2. While the input buffer is not empty, loop as follows:
    - A. If the input buffer begins with a prefix of `"../"` or `"./"`, then remove that prefix from the input buffer; otherwise,
    - B. If the input buffer begins with a prefix of `"/./"` or `"/."`, where `"."` is a complete path segment, then replace that prefix with `"/"` in the input buffer; otherwise,
    - C. If the input buffer begins with a prefix of `"/../"` or `"/.."`, where `".."` is a complete path segment, then replace that prefix with `"/"` in the input buffer and remove the last segment and its preceding `"/"` (if any) from the output buffer; otherwise,
    - D. If the input buffer consists only of `"."` or `".."`, then remove that from the input buffer; otherwise,
    - E. Move the first path segment in the input buffer to the end of the output buffer, including the initial `"/"` character (if any) and any subsequent characters up to, but not including, the next `"/"` character or the end of the input buffer.
3. Finally, the output buffer is returned as the result of `remove_dot_segments`.
Example run:

    STEP   OUTPUT BUFFER         INPUT BUFFER

     1 :                         /a/b/c/./../../g
     2E:   /a                    /b/c/./../../g
     2E:   /a/b                  /c/./../../g
     2E:   /a/b/c                /./../../g
     2B:   /a/b/c                /../../g
     2C:   /a/b                  /../g
     2C:   /a                    /g
     2E:   /a/g

    STEP   OUTPUT BUFFER         INPUT BUFFER

     1 :                         mid/content=5/../6
     2E:   mid                   /content=5/../6
     2E:   mid/content=5         /../6
     2C:   mid                   /6
     2E:   mid/6
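The steps above can be sketched in Python. This is a minimal, literal transcription of steps 2A to 2E, not a reference implementation: the function name comes from the RFC, but the variable names (`inp`, `out`, `cut`) and the choice to hold the output buffer as a list of segments are my own assumptions.

```python
def remove_dot_segments(path):
    """Sketch of the remove_dot_segments steps, followed literally."""
    inp = path   # step 1: input buffer
    out = []     # output buffer, one entry per segment (e.g. "/a" or "mid")
    while inp:
        # 2A: drop a leading "../" or "./"
        if inp.startswith("../"):
            inp = inp[3:]
        elif inp.startswith("./"):
            inp = inp[2:]
        # 2B: "/./" (or a trailing "/.") is replaced with "/"
        elif inp.startswith("/./"):
            inp = inp[2:]
        elif inp == "/.":
            inp = "/"
        # 2C: "/../" (or a trailing "/..") is replaced with "/",
        #     and the last output segment, if any, is removed
        elif inp.startswith("/../"):
            inp = inp[3:]
            if out:
                out.pop()
        elif inp == "/..":
            inp = "/"
            if out:
                out.pop()
        # 2D: a lone "." or ".." is discarded
        elif inp in (".", ".."):
            inp = ""
        # 2E: move the first segment (with its leading "/") to the output
        else:
            cut = inp.find("/", 1)
            if cut == -1:
                cut = len(inp)
            out.append(inp[:cut])
            inp = inp[cut:]
    return "".join(out)  # step 3


print(remove_dot_segments("/a/b/c/./../../g"))   # /a/g
print(remove_dot_segments("mid/content=5/../6")) # mid/6
```

Both calls reproduce the example runs shown above. Keeping the output buffer as a list makes "remove the last segment and its preceding `/`" a single `pop()`.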
Don't forget that a path can specify more `..` segments than there are parent directories. So when you resolve a path, you could end up trying to resolve beyond `/` or, in the case of Windows, `C:\`.
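To illustrate, here is a small sketch that detects surplus `..` segments and shows how Python's standard library normalizers treat them. The helper `escapes_root` is hypothetical (not from any library), and it assumes any drive prefix such as `C:` has already been stripped. `posixpath.normpath` and `ntpath.normpath` are real standard-library functions.

```python
import ntpath
import posixpath


def escapes_root(path, sep="/"):
    """Hypothetical helper: True if ".." would climb above the starting point."""
    depth = 0
    for segment in path.split(sep):
        if segment in ("", "."):
            continue  # empty segments (from "//") and "." leave depth unchanged
        if segment == "..":
            depth -= 1
            if depth < 0:
                return True  # more ".." than directories seen so far
        else:
            depth += 1
    return False


print(escapes_root("/a/../../b"))           # True
print(escapes_root("/a/b/../c"))            # False

# The standard library silently clamps at the root for absolute paths,
# but keeps the surplus ".." for relative ones.
print(posixpath.normpath("/a/../../b"))     # /b
print(posixpath.normpath("a/../../b"))      # ../b
print(ntpath.normpath("C:\\a\\..\\..\\b"))  # C:\b
```

Whether to clamp at the root, keep the extra `..`, or reject the path outright is a design decision. The sketch only shows how to tell the cases apart.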