I'm looking for the most efficient way to remove an element from a comma-separated string while keeping the remaining words in alphabetical order:

For example:

string = 'Apples, Bananas, Grapes, Oranges'
subtraction = 'Bananas'
result = 'Apples, Grapes, Oranges'

Also, a way to do this but while maintaining IDs:

string = '1:Apples, 4:Bananas, 6:Grapes, 23:Oranges'
subtraction = '4:Bananas'
result = '1:Apples, 6:Grapes, 23:Oranges'

Sample code is greatly appreciated. Thank you so much.

+4  A: 

Split on ', ', remove the element, and join.

Ignacio Vazquez-Abrams
A: 
>>> import re
>>> re.sub("Bananas, |, Bananas$", "", "Apples, Bananas, Grapes, Oranges")
'Apples, Grapes, Oranges'

or

import re
strng = '1:Apples, 4:Bananas, 6:Grapes, 23:Oranges'
subtraction = '4:Bananas'
result = re.sub(subtraction + ", |, " + subtraction, "", strng)
print(result)

This works on your examples, but would need to be modified if the subtraction strings might contain regular expression metacharacters like [].*?{}\.
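One possible modification, sketched here under the assumption that items are always separated by a comma and a space, is to pass the subtraction string through re.escape before building the pattern. The function name remove_item is hypothetical and only used for illustration:

```python
import re

def remove_item(strng, subtraction):
    # Escape the item so that characters such as . * [ ] are matched literally.
    item = re.escape(subtraction)
    # Match the item with its trailing separator, or, at the very end
    # of the string, with its leading separator.
    pattern = item + ", |, " + item + "$"
    return re.sub(pattern, "", strng)

print(remove_item('1:Apples, 4:Bananas, 6:Grapes, 23:Oranges', '4:Bananas'))
```

This still treats the data as a flat string, so an item that is a suffix of another item (say, 4:Bananas inside 14:Bananas) could match in the wrong place.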

This is, as one commenter noted, a low-level string operation. It might just work, but an approach that takes the structure of your data into account should be more reliable. Whether splitting on a comma/space is enough, or whether you need the robustness of the csv module depends on the possible input strings you're expecting.
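If the input might contain quoted items with embedded commas, a rough sketch with the csv module could look like the following. The function name remove_item_csv is made up for this example, and it assumes the whole list sits on a single line:

```python
import csv

def remove_item_csv(line, item):
    # skipinitialspace=True drops the space that follows each comma,
    # so ', ' separated input parses into clean fields.
    fields = next(csv.reader([line], skipinitialspace=True))
    # Keep every field except the one to remove, then rebuild the string.
    return ", ".join(f for f in fields if f != item)

print(remove_item_csv('Apples, Bananas, Grapes, Oranges', 'Bananas'))
```

Note that the plain join does not re-quote fields, so an item that itself contains a comma would need csv.writer (or manual quoting) on the way back out.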

Tim Pietzcker
I don't think this handles cases where the first/last item is the one to be removed. In other words, it's not treating the input as a data list (as specified), but as a low-level string.
Lee B
It did handle the first case; I have now modified it to also handle the last; but I agree with you that an approach that takes higher-level structure into account might be preferable. In the spec, the input *is* a string, not a list, though.
Tim Pietzcker
+1  A: 

Matthew's comment above is the right approach, but if you're sure that ', ' (a comma followed by a space) occurs only as a separator, then something like this would work:

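def remove(s, element):
    # The parameter is named s so that it does not shadow the built-in str.
    items = s.split(", ")
    items.remove(element)  # raises ValueError if element is not present
    return ", ".join(items)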

I wouldn't recommend that you use strings as lists though. They're designed for a different purpose and following Matthew's advice is the right thing to do.

Noufal Ibrahim
+1  A: 

Ideally, something like:

input_str = '1:Apples, 4:Bananas, 6:Grapes, 23:Oranges'
removal_str = '4:Bananas'
sep = ", "

print(sep.join(input_str.split(sep).remove(removal_str)))

would work. But list.remove() modifies the list in place and returns None rather than the new list, so you can't do it all on one line and need a temporary variable. A similar solution that does work is:

input_str = '1:Apples, 4:Bananas, 6:Grapes, 23:Oranges'
removal_str = '4:Bananas'
sep = ", "

print(sep.join([ i for i in input_str.split(sep) if i != removal_str ]))

However, to be as correct as possible, assuming you've no GUARANTEE that all items are valid, you'd need to verify that each item matches ALL of the specifications given to you, namely that they're of the format number:identifier. The simplest way to do that is to use the re module to search for a specific regular expression format, return all results, and skip results that don't match what you want. Using deliberately compact code, you get a reasonably short solution that does good validation:

import re

def str_to_dictlist(inp_str):
    # Despite the name, this returns a list of (id, name) tuples.
    regexp = r"(?P<id>[0-9]+):(?P<name>[a-zA-Z0-9_]+)"
    return [ x.groups() for x in re.finditer(regexp, inp_str) ]

input_str = '1:Apples, 4:Bananas, 6:Grapes, 23:Oranges'
subtraction_str = "4:Bananas"
sep = ", "

input_items = str_to_dictlist(input_str)
removal_items = str_to_dictlist(subtraction_str)
final_items = [ "%s:%s" % (x,y) for x,y in input_items if (x,y) not in removal_items ]

print(sep.join(final_items))

This also has the advantage of handling multiple removals at the same time. Since the input format and removal formats are so similar, and the input format has multiple items, it makes sense that the removal format might need to support them too -- or at least, that it's useful to have that support.
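As an illustration of the multiple-removal case, here is the same idea wrapped in a function. The names ITEM_RE, parse_items and remove_items are invented for this sketch, and it keeps the number:identifier format assumed above:

```python
import re

# Same number:identifier pattern as in the code above.
ITEM_RE = re.compile(r"(?P<id>[0-9]+):(?P<name>[a-zA-Z0-9_]+)")

def parse_items(text):
    # Return every (id, name) pair found in the text, in order.
    return [m.groups() for m in ITEM_RE.finditer(text)]

def remove_items(input_str, subtraction_str, sep=", "):
    # Parse both strings the same way, then drop every pair that
    # appears in the subtraction string.
    removals = set(parse_items(subtraction_str))
    kept = [pair for pair in parse_items(input_str) if pair not in removals]
    return sep.join("%s:%s" % pair for pair in kept)

print(remove_items('1:Apples, 4:Bananas, 6:Grapes, 23:Oranges',
                   '4:Bananas, 23:Oranges'))
```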

Note that doing it this way (using re to search) would make it difficult to detect items that DON'T validate though; it would just scan for anything that does. As a hack, you could count commas in the input and report a warning that something might have failed to parse:

num_commas = input_str.count(",")
if len(input_items) < (num_commas + 1):
    print("Warning: some items may have failed to parse")

This would warn about commas without spaces as well.
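A different way to catch items that don't validate, if you can assume a comma always separates items, is to split first and check each piece against the whole pattern. This is only a sketch: split_valid_invalid and ITEM_RE are hypothetical names, and re.fullmatch requires Python 3.4 or later:

```python
import re

# Same number:identifier pattern as in the code above.
ITEM_RE = re.compile(r"(?P<id>[0-9]+):(?P<name>[a-zA-Z0-9_]+)")

def split_valid_invalid(input_str):
    valid, invalid = [], []
    for piece in input_str.split(","):
        piece = piece.strip()
        # fullmatch only succeeds if the entire piece fits the pattern.
        match = ITEM_RE.fullmatch(piece)
        if match:
            valid.append(match.groups())
        else:
            invalid.append(piece)
    return valid, invalid

valid, invalid = split_valid_invalid('1:Apples, 4:Bananas, x:Grapes, 23:Oranges')
print(invalid)
```

Anything that ends up in the invalid list can then be reported to the user instead of being silently skipped.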

To parse more complex input strings properly, you need to break it down into individual tokens, track input lines and columns as you parse, print errors for anything unexpected, and maybe even handle stuff like backtracking and graph-building for more complex inputs like source code. For that sort of stuff, look into the pyparsing module (which is a third-party download; it doesn't come with python).

Lee B