Hello, I am curious what would be an efficient way of uniquifying data objects like this:

testdata = [
    ['9034968', 'ETH'], ['14160113', 'ETH'], ['9034968', 'ETH'],
    ['11111', 'NOT'], ['9555269', 'NOT'], ['15724032', 'ETH'],
    ['15481740', 'ETH'], ['15481757', 'ETH'], ['15481724', 'ETH'],
    ['10307528', 'ETH'], ['15481757', 'ETH'], ['15481724', 'ETH'],
    ['15481740', 'ETH'], ['15379365', 'ETH'], ['11111', 'NOT'],
    ['9555269', 'NOT'], ['15379365', 'ETH'],
]

For each data pair, the numeric string on the left plus the type on the right together determine the uniqueness of an element. The result should be a list of lists in the same form as testdata, but containing only the unique elements.

Regards

+2  A: 

You can use a set:

unique_data = [list(x) for x in set(tuple(x) for x in testdata)]

You can also see this page which benchmarks a variety of methods that either preserve or don't preserve order.
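
If the original order matters, one common approach is to track already-seen pairs in a set while building the result. This is only a minimal sketch; the function name `unique_pairs_in_order` and the shortened `sample` list are made up for illustration.

```python
def unique_pairs_in_order(pairs):
    """Return the unique [number, type] pairs, keeping first-seen order."""
    seen = set()
    result = []
    for pair in pairs:
        key = tuple(pair)  # lists are unhashable, so use a tuple as the set key
        if key not in seen:
            seen.add(key)
            result.append(list(pair))
    return result


# Shortened sample in the same shape as testdata (illustrative only).
sample = [['9034968', 'ETH'], ['14160113', 'ETH'], ['9034968', 'ETH'],
          ['11111', 'NOT'], ['11111', 'NOT']]
ordered_unique = unique_pairs_in_order(sample)
print(ordered_unique)
# [['9034968', 'ETH'], ['14160113', 'ETH'], ['11111', 'NOT']]
```

Unlike the plain set version, this keeps the first occurrence of each pair in its original position.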

Mark Byers
Do note that you lose the ordering with this method. If the order is relevant, then you'll have to sort the result afterwards or remove the duplicate items manually.
WoLpH
@Mark: I am getting an error: `TypeError: unhashable type: 'list'`. Python 2.6.2, Ubuntu Jaunty.
Manoj Govindan
@Hellnar: he just updated the code to use a tuple, now you won't get that problem anymore :)
WoLpH
@Manoj Govindan: The problem occurs because lists aren't hashable and only hashable types can be used in a set. I have fixed it by converting to tuples and then converting back to a list afterwards. Probably though the OP should be using a list of tuples.
Mark Byers
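
The `TypeError` discussed in the comments above can be reproduced in isolation. The snippet below is a small sketch (the variable names are illustrative) showing that a list cannot be a set member while the equivalent tuple can.

```python
pair = ['9034968', 'ETH']

# A list is mutable and therefore unhashable: putting it in a set fails.
try:
    set([pair])
    list_is_hashable = True
except TypeError:
    list_is_hashable = False

# The same data as a tuple is hashable, so duplicates collapse as expected.
tuple_set = set([tuple(pair), tuple(pair)])

print(list_is_hashable)  # False
print(len(tuple_set))    # 1
```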
+1  A: 

I tried @Mark's answer and got an error. Converting each element of the list into a tuple made it work. I'm not sure if this is the best way, though.

list(map(list, set(map(lambda i: tuple(i), testdata))))

Of course the same thing can be expressed using a list comprehension instead.

[list(i) for i in set(tuple(i) for i in testdata)]

I am using Python 2.6.2.

Update

@Mark has since changed his answer. His current answer uses tuples and will work. So will mine :)

Update 2

Thanks to @Mark. I have changed my answer to return a list of lists rather than a list of tuples.

Manoj Govindan
@Mark: done. Thanks!
Manoj Govindan
@Manoj Govindan: Here's a little trick: instead of `lambda x: foo(x)` you can just write `foo`.
Mark Byers
@Mark: Where `foo` is a callable. Gotcha.
Manoj Govindan
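
The trick from the comments above can be checked directly: since `tuple` is already a callable, passing it to `map` gives the same result as wrapping it in a lambda. A short sketch, with a made-up `sample` list:

```python
sample = [['11111', 'NOT'], ['9555269', 'NOT'], ['11111', 'NOT']]

# Wrapping tuple in a lambda works, but adds nothing.
with_lambda = set(map(lambda i: tuple(i), sample))

# Passing the callable itself is equivalent and shorter.
without_lambda = set(map(tuple, sample))

same_result = with_lambda == without_lambda

# Sorted here only to make the printed output deterministic.
unique = sorted(map(list, without_lambda))
print(unique)  # [['11111', 'NOT'], ['9555269', 'NOT']]
```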
+1  A: 
# Python 2 only: the sets module is deprecated and was removed in Python 3.
import sets
testdata = [ ['9034968', 'ETH'], ['14160113', 'ETH'], ['9034968', 'ETH'], ['11111', 'NOT'], ['9555269', 'NOT'], ['15724032', 'ETH'], ['15481740', 'ETH'], ['15481757', 'ETH'], ['15481724', 'ETH'], ['10307528', 'ETH'], ['15481757', 'ETH'], ['15481724', 'ETH'], ['15481740', 'ETH'], ['15379365', 'ETH'], ['11111', 'NOT'], ['9555269', 'NOT'], ['15379365', 'ETH']]
# Join each pair into one string, e.g. '9034968ETH', so it can go into a set.
concatData = [x[0] + x[1] for x in testdata]
print concatData
uniqueSet = sets.Set(concatData)
# Split back into [number, type]; this assumes every type is exactly 3 characters.
uniqueList = [[t[0:-3], t[-3:]] for t in uniqueSet]
print uniqueList
pyfunc
The other replies are way cooler!
pyfunc
Also, the sets module is deprecated, use the builtin set-type instead.
Space_C0wb0y
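
Following that last comment, the same concatenation approach can be sketched with the built-in `set` in place of `sets.Set`. The shortened `testdata` and the variable names here are illustrative, and the split still assumes every type code is exactly three characters long.

```python
# Shortened sample in the same shape as the original testdata.
testdata = [['9034968', 'ETH'], ['11111', 'NOT'], ['9034968', 'ETH'],
            ['11111', 'NOT'], ['15379365', 'ETH']]

# Join each pair into a single string such as '9034968ETH'.
concat_data = [number + kind for number, kind in testdata]

# The built-in set is used exactly where sets.Set was used before.
unique_set = set(concat_data)

# Split back into [number, type]; relies on 3-character type codes.
# Sorted only so the output order is deterministic.
unique_list = sorted([t[:-3], t[-3:]] for t in unique_set)
print(unique_list)
# [['11111', 'NOT'], ['15379365', 'ETH'], ['9034968', 'ETH']]
```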