I'm in the process of switching from R to Python (mainly for reasons of general flexibility). With NumPy, matplotlib and IPython, I've been able to cover all my use cases save for merging 'datasets'. I would like to simulate SQL's JOIN clause (inner, outer, full) purely in Python. R handles this with the 'merge' function.
I've tried join_by from numpy.lib.recfunctions, but it has critical issues with duplicates along the 'key':
join_by(key, r1, r2, jointype='inner', r1postfix='1', r2postfix='2', defaults=None, usemask=True, asrecarray=False)
Join arrays r1 and r2 on key key.
The key should be either a string or a sequence of string corresponding to the fields used to join the array. An exception is raised if the key field cannot be found in the two input arrays. Neither r1 nor r2 should have any duplicates along key: the presence of duplicates will make the output quite unreliable. Note that duplicates are not looked for by the algorithm.
source: http://presbrey.mit.edu:1234/numpy.lib.recfunctions.html
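To make the duplicate-key requirement concrete, here is a rough pure-Python sketch of the inner-join behaviour I'm after. The function name inner_join and the list-of-dicts row format are just assumptions for illustration, not anything from NumPy, and the sketch assumes the two inputs share no non-key field names:

```python
from collections import defaultdict

def inner_join(left, right, key):
    """Inner-join two lists of dicts on `key`, keeping duplicate keys.

    Hypothetical helper for illustration only; assumes non-key field
    names do not collide between `left` and `right`.
    """
    # Index the right-hand rows by key value; rows with the same key
    # are collected in one list instead of overwriting each other.
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)

    joined = []
    for lrow in left:
        # Pair each left row with every right row that shares its key,
        # which is what SQL does when keys are duplicated.
        for rrow in index.get(lrow[key], []):
            merged = dict(lrow)
            merged.update((k, v) for k, v in rrow.items() if k != key)
            joined.append(merged)
    return joined

left = [{"id": 1, "x": "a"}, {"id": 1, "x": "b"}, {"id": 2, "x": "c"}]
right = [{"id": 1, "y": 10}, {"id": 1, "y": 20}, {"id": 3, "y": 30}]
result = inner_join(left, right, "id")
```

Here key 1 appears twice on each side, so the result should contain all four combinations for it, while keys 2 and 3 drop out because they have no match on the other side.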
Any pointers or help will be most appreciated!