views:

85

answers:

3

I'm looking for the most efficient way to add an element to a comma-separated string while maintaining alphabetical order for the words:

For example:

string = 'Apples, Bananas, Grapes, Oranges'
addition = 'Cherries'
result = 'Apples, Bananas, Cherries, Grapes, Oranges'

Also, a way to do this but while maintaining IDs:

string = '1:Apples, 4:Bananas, 6:Grapes, 23:Oranges'
addition = '62:Cherries'
result = '1:Apples, 4:Bananas, 62:Cherries, 6:Grapes, 23:Oranges'

Sample code is greatly appreciated. Thank you so much.

+3  A: 

Are you sure you should be storing the data as a string?

It probably makes more sense to maintain a set or list (or, in your second case, a dictionary) and generate the string when you need to. If the data don't change very often, cache the string.

With any solution that uses the string as your primary data storage, you'll probably end up generating a temporary list to make it easier to insert the element -- so it makes more sense just to keep the list.

Etaoin
+7  A: 

For the first case:

alist = string.split(', ')
result = ', '.join(sorted(alist + [addition]))

For the second case:

alist = string.split(', ')
result = ', '.join(sorted(alist + [addition],
                          key=lambda s: s.split(':', 1)[1]))

If you have many thousands of items in the list, the first case might show measurable performance improvement if you're willing to go to the much-greater complication of bisect.insort; but that doesn't support a key=, so the extra complication in the second case would be staggering and probably not even buy you any performance.

The kind of optimizations mentioned in the last paragraphs are worth considering only if a profile of your whole application shows that this operation is a crucial bottleneck for it (and if it is, you'd gain much more speed by keeping this data structure as a list of words, ', '-joining it only at need presumably for output purposes, rather than splitting up and rejoining thousands and thousands of times for the kind of extremely long lists where such optimizations might possibly be warranted).

Alex Martelli
Thank you, Alex. This was exactly what I needed. Great for learning too.
ensnare
How could I modify this if I have 3 elements separated by commas, such as: "1:Political Science:Poli_sci,2:Engineering Management:Engineer_Man"?
ensnare
Split and join by `','` rather than by `', '` if you have/want just a comma, not a comma and space.
Alex Martelli
+2  A: 

Here's one way to do what you want:

>>> ", ".join(sorted('Apples, Bananas, Grapes, Oranges'.split(", ") +
...                  ["Cherries"]))
'Apples, Bananas, Cherries, Grapes, Oranges'

and "while maintaining IDs":

>>> ", ".join(sorted('1:Apples, 4:Bananas, 6:Grapes, 23:Oranges'.split(", ") + 
...                  ["62:Cherries"], key=lambda x: x.split(":")[1]))
'1:Apples, 4:Bananas, 62:Cherries, 6:Grapes, 23:Oranges'

I'm intentionally ignoring the part of the question where you asked for the "most efficient" way to do something. Proving that an algorithm is the most efficient possible approach to a particular problem is an unsolved problem of computer science. It may not be possible to do at all, and there are certainly no current techniques for it.

If you are concerned about efficiency, however, you should store intermediary data structures, and not do these kinds of operations on strings; any string-based operation is going to waste a bunch of time copying memory around; you should only convert to and from strings once all of your processing is done.

Glyph
+1 for mentioning that strings is not an efficient/readable way of storing such structures.
EOL