Hi, I have a set of data similar to this:

No Start Time End Time CallType Info 
1 13:14:37.236 13:14:53.700 Ping1  RTT(Avr):160ms
2 13:14:58.955 13:15:29.984 Ping2  RTT(Avr):40ms
3 13:19:12.754 13:19:14.757 Ping3_1  RTT(Avr):620ms
3 13:19:12.754          Ping3_2  RTT(Avr):210ms
4 13:14:58.955 13:15:29.984 Ping4  RTT(Avr):360ms
5 13:19:12.754 13:19:14.757 Ping1  RTT(Avr):40ms
6 13:19:59.862 13:20:01.522 Ping2  RTT(Avr):163ms
...

When I parse through it, I need to merge the results of Ping3_1 and Ping3_2, take the average of those two rows, and export them as one row. So the end result would look like this:

No Start Time End Time CallType Info 
1 13:14:37.236 13:14:53.700 Ping1  RTT(Avr):160ms
2 13:14:58.955 13:15:29.984 Ping2  RTT(Avr):40ms
3 13:19:12.754 13:19:14.757 Ping3  RTT(Avr):415ms
4 13:14:58.955 13:15:29.984 Ping4  RTT(Avr):360ms
5 13:19:12.754 13:19:14.757 Ping1  RTT(Avr):40ms
6 13:19:59.862 13:20:01.522 Ping2  RTT(Avr):163ms

Currently I am concatenating columns 0 and 1 to make a unique key, finding duplicates there, and then doing the rest of the special treatment for those parallel pings (roughly as sketched below). It is not elegant at all. I just wonder what a better way to do it would be. Thanks!
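Roughly, my current approach looks like this (simplified; lines and special_merge stand in for my real parsing and merging code):

from collections import defaultdict

groups = defaultdict(list)
for line in lines:                      # lines: the raw data rows
    fields = line.split()
    key = fields[0] + fields[1]         # concatenate No and Start Time
    groups[key].append(fields)

result = []
for key, rows in groups.items():
    if len(rows) == 1:
        result.append(rows[0])
    else:
        result.append(special_merge(rows))  # the special treatment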

A: 

Assuming the duplicates are adjacent, you can use a generator like this. I guess you already have some code to average the pings:

def average_pings(ping1, ping2):
    pass  # placeholder -- you said you already have the averaging code

def merge_pings(seq):
    prev_key = prev_item = None
    for item in seq:
        key = item.split()[:2]          # columns 0 and 1: No and Start Time
        if key == prev_key:
            # Duplicate of the buffered row: fold it in instead of emitting
            prev_item = average_pings(prev_item, item)
        else:
            if prev_key is not None:
                yield prev_item
            prev_item = item
        prev_key = key
    if prev_key is not None:
        yield prev_item                 # flush the last buffered row
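For example, assuming the rows come straight from a text file (the filename is just a placeholder):

# seq can be any iterable of raw lines, e.g. an open file
with open("pings.txt") as f:
    for row in merge_pings(f):
        print(row)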
gnibbler
A: 

I'm not sure how your data is structured, so I'll assume a list of dicts for duck-typing purposes.

I'm also assuming the real primary key of your dataset is Start.

for i in range(len(dataset)-1):
  #Detect duplicates, assuming they are sorted properly
  if dataset[i]["Start"] == dataset[i+1]["Start"]:
    #Merge 'em
    dataset[i+1] = merge(dataset[i], dataset[i+1])

    #Deleting items from the array you are iterating over is a bad idea
    dataset[i] = None

dataset = [item for item in dataset if item is not None] #so just delete them later

...where merge would be the function that actually does the merging.
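For illustration, a merge for that dict representation might look something like this (a sketch only: it assumes the key names match the headers above and that "RTT" has already been parsed into a number in milliseconds):

def merge(a, b):
    # A sketch: keep the earlier start and the later end, drop the _1/_2
    # suffix from the call type, and average the two round-trip times.
    merged = dict(a)
    merged["Start"] = min(a["Start"], b["Start"])
    merged["End"] = max(a.get("End", ""), b.get("End", ""))
    merged["CallType"] = a["CallType"].rsplit("_", 1)[0]  # Ping3_1 -> Ping3
    merged["RTT"] = (a["RTT"] + b["RTT"]) / 2.0
    return merged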

The loop itself is not elegant, rather C-ish, but probably better than what you are currently using.

They're not sorted?

dataset.sort(lambda x, y: cmp(x["Start"], y["Start"]))

Now they should be.

badp
Never use cmp to sort; it is slower. Use key: `dataset.sort(key=operator.itemgetter('Start'))`
nosklo
Well I guess I won't have a choice with Py3k at any rate.
badp
A: 

Assuming your duplicates are adjacent (as they're shown in your question), itertools.groupby is the ideal way to identify them as duplicates (with a little help from operator.attrgetter to extract the "key" defining identity). Assuming you have a list of objects (the pings) with attributes such as .start and .end:

import itertools
import operator

def merge(listofpings):
  k = operator.attrgetter('start', 'end')
  for key, grp in itertools.groupby(listofpings, key=k):
    lst = list(grp)
    if len(lst) > 1:
      # Two or more pings share (start, end): merge them into one
      item = mergepings(lst)
    else:
      item = lst[0]
    emitping(key, item)

assuming you already have functions mergepings, to merge a list of two or more "duplicate" pings, and emitping, to emit a single ping (bare or merged) under its key.

If listofpings is not already properly sorted, add listofpings.sort(key=k) just before the for loop (presumably emitting in sorted order is OK, right?).
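For illustration, minimal versions of those two helpers might look something like this (purely a sketch: the .rtt and .calltype attribute names are assumptions, and "emitting" here just prints):

def mergepings(pings):
    # Sketch: reuse the first ping of the group, averaging the RTTs
    # (assumes each ping carries a numeric .rtt in milliseconds).
    merged = pings[0]
    merged.rtt = sum(p.rtt for p in pings) / float(len(pings))
    return merged

def emitping(key, ping):
    # Sketch: "emit" just prints here; substitute your real exporter.
    start, end = key
    print("%s %s %s RTT(Avr):%dms" % (start, end, ping.calltype, ping.rtt))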

Alex Martelli
A: 

Thanks a lot for the suggestions. I need some time to digest them and will then update with the outcome. Thanks again!

Wei Lou