A situation has come up a few times in the last few weeks where I'd like to measure some event which might happen regularly (like the time taken to redraw a frame in a 2D smooth-scrolling UI) or at a variable frequency (like a message arriving at a webservice endpoint). I've had an idea of measuring 1) 'normal' frequency, 2) current frequency, 3) min, 4) max, and I'd like to measure these over multiple buckets of time.

For example, a webservice could get 10 messages in 100 ms, then not get any messages for 5 minutes. In the UI example it could be running at 60 FPS for 10 seconds straight, then a GC hits and a single frame could be 'frozen' for 1 second, which completely ruins the UI effect.

I think these kinds of measurements could be done using a set of 'buckets' for collecting them. But unlike a regular time series, the FPS measurement I care about most is the one that DOESN'T arrive at the normal interval (normal in the UI example is a single frame drawn every 1/60th of a second, but the one I care about is 60x longer). So to be useful both in the normal case and the exceptional case, one could use a hierarchy of 'sample buckets':

1..10 'micro' buckets, each measures 1/10th of a second
many 'micro' buckets are needed to keep an accurate sliding window at the 'normal' level

1..60 'normal' buckets, each 1 sec
1..60 'macro' buckets, each 1 min
... levels could continue: hours, days, months, years

A set of metrics (avg, min, max, count) could be kept per bucket at each level. When the time period for a bucket expires the bucket could be 'promoted' to the next level and combined into a 'sample queue' at that level. This would give an accurate sliding-window measurement of each aggregate per bucket at each 'level' in the hierarchy, while using relatively little CPU or memory.
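To make that concrete, here is a rough sketch of the structure I have in mind, in C#. As far as I know nothing like this exists in a library; SampleBucket and BucketLevel are names I just made up, and the promotion logic is simplified (it assumes each level's buckets divide evenly into the next level's, and ignores thread safety):

    using System;
    using System.Collections.Generic;

    // Aggregates kept per bucket; avg is derived from Sum / Count.
    class SampleBucket
    {
        public long Count;
        public double Sum;
        public double Min = double.PositiveInfinity;
        public double Max = double.NegativeInfinity;

        public double Avg { get { return Count == 0 ? 0.0 : Sum / Count; } }

        public void Add(double value)
        {
            Count++;
            Sum += value;
            if (value < Min) Min = value;
            if (value > Max) Max = value;
        }

        // Fold a finer-grained bucket into this one during promotion.
        public void Merge(SampleBucket finer)
        {
            Count += finer.Count;
            Sum += finer.Sum;
            if (finer.Min < Min) Min = finer.Min;
            if (finer.Max > Max) Max = finer.Max;
        }
    }

    // One level of the hierarchy: a sliding window of buckets, plus an
    // accumulator that promotes expired buckets to the next (coarser) level.
    class BucketLevel
    {
        private readonly int _windowSize;       // buckets kept at this level
        private readonly int _perCoarser;       // buckets here per coarser bucket
        private readonly BucketLevel _coarser;  // null at the top of the hierarchy
        private readonly Queue<SampleBucket> _window = new Queue<SampleBucket>();
        private SampleBucket _pending = new SampleBucket();
        private int _pendingCount;

        public BucketLevel(int windowSize, int perCoarser, BucketLevel coarser)
        {
            _windowSize = windowSize;
            _perCoarser = perCoarser;
            _coarser = coarser;
        }

        // Called when a bucket of this level's duration has expired.
        public void Close(SampleBucket bucket)
        {
            _window.Enqueue(bucket);
            if (_window.Count > _windowSize) _window.Dequeue();

            if (_coarser == null) return;
            _pending.Merge(bucket);
            if (++_pendingCount == _perCoarser)
            {
                _coarser.Close(_pending);   // 'promote' to the next level
                _pending = new SampleBucket();
                _pendingCount = 0;
            }
        }

        public IEnumerable<SampleBucket> Window { get { return _window; } }
    }

    // Wiring up the levels from the example above (coarsest first):
    // var macro  = new BucketLevel(60,  0, null);    // 60 x 1 min
    // var normal = new BucketLevel(60, 60, macro);   // 60 x 1 sec
    // var micro  = new BucketLevel(10, 10, normal);  // 10 x 0.1 sec
    // The measuring code Add()s samples into a SampleBucket for 100 ms at a
    // time, then calls micro.Close(bucket); promotion upward is automatic.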

In a development environment I think samples at the 'micro' level could be used to identify real-time problems while debugging. In production the 'normal' level could be displayed to the end user while the 'macro' level could be stored for long-term trending and analysis (to establish a long-term baseline). Once patterns are identified it seems like it would be easy to programmatically log or react to significant changes in a metric (like the message rate dropping off, to flush memory caches) without overreacting to acceptable anomalies (like the GC pause in the UI).
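For the 'react without overreacting' part, I'm imagining something as simple as comparing recent 'normal'-level buckets against the long-term 'macro' baseline and only acting on a sustained change, roughly like this (RateWatch is just an illustrative name, and the 0.5 and 3 thresholds are made up):

    using System.Collections.Generic;

    static class RateWatch
    {
        // Flag a drop-off only when several consecutive 'normal' buckets fall
        // well below the long-term 'macro' baseline, so that one slow bucket
        // (a GC pause) isn't enough to trip the alarm.
        public static bool HasDroppedOff(IEnumerable<double> recentNormalRates,
                                         double macroBaselineRate)
        {
            int consecutiveBelow = 0;
            foreach (var rate in recentNormalRates)
            {
                consecutiveBelow = rate < 0.5 * macroBaselineRate ? consecutiveBelow + 1 : 0;
                if (consecutiveBelow >= 3) return true;
            }
            return false;
        }
    }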

I know this is a bit long, but it seems like a simple idea and I couldn't find any classes or frameworks that do this on the web (at least not in my framework of choice, .NET). Is this a known pattern for measuring and evaluating the health of applications and systems that I just couldn't find? Any monitoring library or statistical recipe available via open source or over-the-counter?

P.S. Because of the possible high rate of sampling I didn't think PerformanceCounters on Windows would be a good fit at the 'micro' level (updating metrics many times per second in some cases, like real-time display of UI FPS). Also, it would be great if the solution worked on Mono and Silverlight (where PerfCounters aren't available). P.P.S. I spent a couple of hours looking for statistics libraries in .NET, and found a couple, but couldn't find a simple 'hierarchical time-bounded sampling' like I describe above. Lots of count-bounded sampling, which doesn't apply here, because data streams like redraw rate and message arrival rate don't always occur at regular intervals.

+1  A: 

It looks like this is common in financial services, using 'time compression' to speed up data analysis when the original data set (or even indexes on the original data) doesn't fit into memory.

This link gives an example in SQL. I'd like to use the same memory/speed trade-off to track critical metrics in-process.

http://www.codeproject.com/KB/solution-center/Izenda-Speed-Dating.aspx

I'm wondering if I'm missing something simple, as it seems like this would be really useful but I don't see anyone else doing it.

crtracy
A: 

This is a good approach, and certainly one I've heard of and implemented, though I'm not familiar with any libraries which implement it in a generic way.

One alternative to doing this 'live' is to log things at a very fine-grained level in e.g. a database, and then progressively 'collapse' the data as it becomes out-of-date/irrelevant. For example, imagine a SQL table which contains {DATE, GRANULARITY, COUNT} tuples; you initially insert your counts with 'Second' granularity; periodically you come along and coalesce a set of rows like

DATE                GRANULARITY       COUNT
20100917 10:05:01   Second            4
20100917 10:05:08   Second            2
20100917 10:05:40   Second            1

into a single row:

20100917 10:05:00   Minute            7

based on their age, and then collapse the minutes into hours, etc etc.
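The collapse itself is basically a GROUP BY on the truncated date; if you wanted to do the same thing in-process rather than in SQL, it's roughly this shape in LINQ (the Tally type here is made up purely for illustration):

    using System;
    using System.Collections.Generic;
    using System.Linq;

    class Tally
    {
        public DateTime Date;
        public string Granularity;
        public long Count;
    }

    static class Collapse
    {
        // Roll 'Second' rows older than some cutoff up into 'Minute' rows.
        public static IEnumerable<Tally> SecondsToMinutes(IEnumerable<Tally> rows,
                                                          DateTime cutoff)
        {
            return rows
                .Where(r => r.Granularity == "Second" && r.Date < cutoff)
                .GroupBy(r => new DateTime(r.Date.Year, r.Date.Month, r.Date.Day,
                                           r.Date.Hour, r.Date.Minute, 0))
                .Select(g => new Tally
                {
                    Date = g.Key,
                    Granularity = "Minute",
                    Count = g.Sum(r => r.Count)
                });
        }
    }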

We do something similar at my current employer; we log sampled data at a high frequency with the open-source Performance Co-Pilot tool, and then, as the data becomes older and less valuable, coalesce it into more compact, coarser-grained logs using the pmlogextract tool.

Cowan
Yep, that matches what I describe. If no one has offered an open library to do this then maybe I'll be the first. Thanks for at least admitting you've taken this approach in the past, worth an answer nod.
crtracy