views:

210

answers:

4

Take the following string as an example:

"The quick brown fox"

Right now the q in quick is at index 4 of the string (starting at 0) and the f in fox is at index 16. Now lets say the user enters some more text into this string.

"The very quick dark brown fox"

Now the q is at index 9 and the f is at index 26.

What is the most efficient method of keeping track of the index of the original q in quick and f in fox no matter how many characters are added by the user?

Language doesn't matter to me, this is more of a theory question than anything so use whatever language you want just try to keep it to generally popular and current languages.

The sample string I gave is short but I'm hoping for a way that can efficiently handle any size string. So updating an array with the offset would work with a short string but will bog down with to many characters.

Even though in the example I was looking for the index of unique characters in the string I also want to be able to track the index of the same character in different locations such as the o in brown and the o in fox. So searching is out of the question.

I was hoping for the answer to be both time and memory efficient but if I had to choose just one I care more about performance speed.

+2  A: 

Your question is a little ambiguous - are you looking to keep track of the first instances of every letter? If so, an array of length 26 might be the best option.

Whenever you insert text into a string at a position lower than the index you have, just compute the offset based on the length of the inserted string.

Kyle Cronin
+1  A: 

It would also help if you had a target language in mind as not all data structures and interactions are equally efficient and effective in all languages.

Unsliced
+1  A: 
A: 

The standard trick that usually helps in similar situations is to keep the characters of the string as leaves in a balanced binary tree. Additionally, internal nodes of the tree should keep sets of letters (if the alphabet is small and fixed, they could be bitmaps) that occur in the subtree rooted at a particular node.

Inserting or deleting a letter into this structure only needs O(log(N)) operations (update the bitmaps on the path to root) and finding the first occurence of a letter also takes O(log(N)) operations - you descend from the root, going for the leftmost child whose bitmap contains the interesting letter.

Edit: The internal nodes should also keep number of leaves in the represented subtree, for efficient computation of the letter's index.

Rafał Dowgird