I'm new to R, and it looks like a good language for letting my colleagues do their job quickly, but I have to prepare the data for them and I want to do it "the R way".
The data I'm importing describes numeric measurements taken at various locations at more or less evenly spaced timestamps. Sometimes this "evenly spaced" is not really true and I have to discard some of the values; it doesn't much matter which ones, as long as I end up with one value per timestamp per location.
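Assuming the timestamps have already been snapped onto the step grid, the "one value per timestamp per location" requirement could be expressed with base R's `duplicated()` applied to both columns at once. This is a sketch with made-up readings; the `readings` data frame and its column names are hypothetical.

```r
# Hypothetical readings: two locations, timestamps already rounded to the
# step grid, with a repeated (location, ts) pair at each location.
readings <- data.frame(
  location = c("A", "A", "A", "B", "B"),
  ts = as.POSIXct(c("2009-09-30 10:15:00", "2009-09-30 10:15:00",
                    "2009-09-30 10:30:00", "2009-09-30 10:15:00",
                    "2009-09-30 10:15:00"), tz = "UTC"),
  v = c(-2.08, -2.11, -2.09, -1.50, -1.55)
)

# duplicated() on a data frame flags rows whose combination of values in the
# selected columns has already appeared; negating it keeps the first row of
# each (location, ts) pair.
deduped <- readings[!duplicated(readings[c("location", "ts")]), ]
deduped
```

Which duplicate survives is simply the first one encountered, which fits the requirement that it doesn't matter which value is kept.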
What do I do with the data? I add it to a result data.frame, which has a timestamp column whose values are definitely evenly spaced according to step. I round each imported timestamp up onto that grid and assign the values to the matching rows:
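# round each timestamp up to a multiple of step (in minutes) after epoch
timestamps <- ceiling(as.numeric((timestamps-epoch)*24*60/step))*step*60 + epoch
# write the values into the rows of result whose timestamp matches
result[result$timestamp %in% timestamps, columnName] <- values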
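The rounding step can be checked in isolation. This sketch assumes `step` is in minutes (as the `*24*60/step` and `*step*60` factors suggest) and uses a hypothetical midnight `epoch`; it also asks `difftime()` for explicit units, since `timestamps - epoch` lets R choose the units automatically, and the `*24*60` factor only holds when those units are days.

```r
step  <- 15                                             # minutes (assumed)
epoch <- as.POSIXct("2009-09-30 00:00:00", tz = "UTC")  # hypothetical grid origin
timestamps <- as.POSIXct(c("2009-09-30 10:00:00", "2009-09-30 10:04:18",
                           "2009-09-30 10:07:47", "2009-09-30 10:27:56"),
                         tz = "UTC")

# Minutes elapsed since epoch, with the units fixed explicitly.
mins <- as.numeric(difftime(timestamps, epoch, units = "mins"))

# Round up to the next multiple of `step` and convert back to a time
# (adding a number to a POSIXct shifts it by that many seconds).
rounded <- epoch + ceiling(mins / step) * step * 60
format(rounded, "%H:%M")
# → "10:00" "10:15" "10:15" "10:30"
```

The output already shows the problem: two different readings land on 10:15.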
This does NOT work when several timestamps fall into the same time step, because the rounded timestamps are then no longer unique. Here is an example:
> data.frame(ts=timestamps, v=values)
ts v
1 2009-09-30 10:00:00 -2.081609
2 2009-09-30 10:04:18 -2.079778
3 2009-09-30 10:07:47 -2.113531
4 2009-09-30 10:09:01 -2.124716
5 2009-09-30 10:15:00 -2.102117
6 2009-09-30 10:27:56 -2.093542
7 2009-09-30 10:30:00 -2.092626
8 2009-09-30 10:45:00 -2.086339
9 2009-09-30 11:00:00 -2.080144
> data.frame(ts=ceiling(as.numeric((timestamps-epoch)*24*60/step))*step*60+epoch,
+ v=values)
ts v
1 2009-09-30 10:00:00 -2.081609
2 2009-09-30 10:15:00 -2.079778
3 2009-09-30 10:15:00 -2.113531
4 2009-09-30 10:15:00 -2.124716
5 2009-09-30 10:15:00 -2.102117
6 2009-09-30 10:30:00 -2.093542
7 2009-09-30 10:30:00 -2.092626
8 2009-09-30 10:45:00 -2.086339
9 2009-09-30 11:00:00 -2.080144
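One possible way around the repeated steps is to drop every value whose rounded timestamp has already been seen, using base R's `duplicated()`, and then place the survivors with `match()` instead of relying on `result` rows and `values` being in the same order. This sketch reuses the rounded listing above; the `result` grid and the `columnName` value are hypothetical.

```r
values <- c(-2.081609, -2.079778, -2.113531, -2.124716, -2.102117,
            -2.093542, -2.092626, -2.086339, -2.080144)
# Rounded timestamps copied from the second listing.
ts <- as.POSIXct(paste("2009-09-30",
                       c("10:00:00", "10:15:00", "10:15:00", "10:15:00",
                         "10:15:00", "10:30:00", "10:30:00", "10:45:00",
                         "11:00:00")), tz = "UTC")

keep <- !duplicated(ts)   # TRUE only for the first value in each step

# Hypothetical result frame on an evenly spaced 15-minute grid.
result <- data.frame(timestamp = seq(as.POSIXct("2009-09-30 10:00:00", tz = "UTC"),
                                     by = "15 min", length.out = 5))
columnName <- "site1"     # hypothetical column name
result[[columnName]] <- NA_real_

# match() gives, for each kept timestamp, its row in result.
result[match(ts[keep], result$timestamp), columnName] <- values[keep]
result
```

Grid rows with no reading simply stay `NA`.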
In Python I would (mis)use a dictionary to achieve what I need:
dict(zip(timestamps, values)).items()
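returns the (timestamp, value) pairs with each timestamp appearing only once; later pairs overwrite earlier ones, so the last value for each timestamp is kept.
In R, I don't know how to do this in a compact and efficient way.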
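A rough R counterpart of the dictionary trick, assuming the same "last value wins" behaviour is wanted, is `duplicated(..., fromLast = TRUE)` combined with a named vector. The keys and values below are hypothetical placeholders.

```r
timestamps <- c("10:00", "10:15", "10:15", "10:30")   # hypothetical keys
values     <- c(1, 2, 3, 4)

# Like dict(zip(timestamps, values)): later pairs overwrite earlier ones,
# so keep only the last occurrence of each key.
keep  <- !duplicated(timestamps, fromLast = TRUE)
pairs <- setNames(values[keep], timestamps[keep])
pairs
```

Dropping `fromLast = TRUE` keeps the first value per key instead; either choice satisfies "one value per timestamp".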