views:

109

answers:

1

This is an interview questions. I tried hard to come up with my desired solution, but no success yet!

An approach of O(n^2) time is pretty obvious with O(1) space, where n is the length of string/ total number of URLs.

An O(n) solution can be achieved which requires additional space. In the case of non-repeating first character in a string, start at the end of the string, and scan towards the front. Use a bit array to keep track of which character values have occurred so far. If a character isn't seen yet (i.e. to the right of current index), set this character as a "probable non-repeating" character. This will be updated as the scanning processed to the LEFT. When reach at first index, the last "not seen before" character is the result. For ASCII char set, this is quite acceptable solution; as it needs only 256 bit array. However, for UNICODE char set the space complexity is higher. In the case of non-repeating URL in a file, a similar approach might be applicable using a Hash Table. Space is big concern here to implement the hashing, like, storing the URLs for probable collisions.

I'm looking for some better solution with O(n) time complexity, and constant or logarithmic space complexity. Please share your idea in programming language like C, C++, or Java. Thanks.

+1  A: 

The general algorithm more or less is obvious (only one pass of the sequence) - pseudocode, sorry :)

set s
for each x in sequence
    if s.contains(x)
       return x
    else
       s.add(x)
end

The only remaining part is what set data structure to choose. if |U| is the size of the domain of the set (e.g. the alphabet), then based on the expected maximum value of |s| / |U| we decide whether to use a bit-vector or a hashtable. (Note that even for a huge alphabet a bit-vector would be better than a hashtable if we expect most of the letters to appear).

Also note that in order to use a bit-vector, it is implied that you must be able to rank the elements, i.e. map them to a number in [0..n). This is straightforward when we speak of characters, but not for the rest of your input types.

Dimitris Andreou