data-processing

How to build multiple source feed parsing and data consolidating daemon?

I have been given the task of writing a script (or, better yet, a daemon) that has to do several things: crawl the most recent data from several input XML feeds. There are about 15-20 feeds for the time being, but I believe the number might go up to 50 in the future. Feed size varies between 500 KB and 5 MB (it most likely won't go over 10 MB). Since feeds ar...
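The excerpt is cut off before the consolidation requirements, so the following is only a minimal sketch of the crawling half in Python's standard library, under stated assumptions: feeds are fetched over plain HTTP, each poll cycle downloads every feed in parallel threads, and the consolidation step is a caller-supplied `handle` callback (a hypothetical name, as are `fetch_all`, `http_fetch` and `run_forever`). At 10 MB or less per feed, parsing each body in memory with `ElementTree.fromstring` is reasonable; `iterparse` would be the streaming alternative for much larger feeds.

```python
import concurrent.futures
import logging
import time
import urllib.request
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List

Fetcher = Callable[[str], bytes]


def http_fetch(url: str, timeout: float = 30.0) -> bytes:
    """Download one feed body (plain urllib; no retries or conditional GETs)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def fetch_all(urls: List[str], fetch: Fetcher = http_fetch,
              max_workers: int = 8) -> Dict[str, ET.Element]:
    """Fetch and parse all feeds concurrently; return {url: root element}.

    A feed that fails to download or parse is logged and skipped, so one
    bad source does not block the rest of the cycle.
    """
    parsed: Dict[str, ET.Element] = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, url): url for url in urls}
        for fut in concurrent.futures.as_completed(futures):
            url = futures[fut]
            try:
                parsed[url] = ET.fromstring(fut.result())
            except Exception as exc:  # network/HTTP errors, ET.ParseError
                logging.warning("skipping %s: %s", url, exc)
    return parsed


def run_forever(urls: List[str],
                handle: Callable[[Dict[str, ET.Element]], None],
                fetch: Fetcher = http_fetch,
                interval_s: float = 300.0) -> None:
    """Daemon loop: crawl every feed, pass results to `handle`, then sleep."""
    while True:
        started = time.monotonic()
        handle(fetch_all(urls, fetch))  # hypothetical consolidation hook
        elapsed = time.monotonic() - started
        time.sleep(max(0.0, interval_s - elapsed))
```

Injecting `fetch` keeps the parsing and error handling testable without a network, and raising `max_workers` is the main knob if the feed count grows toward 50.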

How should I filter this data?

I have several series of data points that need to be graphed. For each graph, some points may need to be thrown out due to errors. An example is the following: The circled areas are errors in the data. What I need is an algorithm to filter this data so that it eliminates the errors by replacing the bad points with flat lines, like so:...
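Since the example image is not available, here is one possible approach, sketched under an assumption: an "error" is a point far from the median of its neighbours, measured with a rolling median absolute deviation (MAD). Flagged points are then replaced by the last good value, which draws the flat line the question describes. The names `flag_outliers` and `flatten_errors`, and the defaults `window=5`, `k=3.0`, are illustrative choices, not a standard API.

```python
import statistics
from typing import List, Sequence


def flag_outliers(values: Sequence[float], window: int = 5, k: float = 3.0,
                  min_dev: float = 1e-9) -> List[bool]:
    """Flag points that sit far from the median of their neighbourhood.

    A point is flagged when |value - local median| exceeds k times the local
    median absolute deviation (MAD); `min_dev` is a floor so that a
    perfectly flat neighbourhood (MAD == 0) still flags a spike.
    """
    n = len(values)
    half = window // 2
    flags = []
    for i in range(n):
        neigh = list(values[max(0, i - half):min(n, i + half + 1)])
        med = statistics.median(neigh)
        mad = statistics.median([abs(x - med) for x in neigh])
        flags.append(abs(values[i] - med) > max(k * mad, min_dev))
    return flags


def flatten_errors(values: Sequence[float], flags: Sequence[bool]) -> List[float]:
    """Replace flagged points with the last good value (a flat segment)."""
    good = [v for v, bad in zip(values, flags) if not bad]
    if not good:
        return list(values)  # nothing trustworthy to hold; leave data as-is
    last_good = good[0]  # leading bad points take the first good value
    out = []
    for v, bad in zip(values, flags):
        if bad:
            out.append(last_good)
        else:
            last_good = v
            out.append(v)
    return out
```

For example, on `[1.0, 1.1, 0.9, 50.0, 48.0, 1.0, 1.2]` the 50.0/48.0 spike is flagged and held flat at 0.9. The median only stays robust while bad points make up less than half of each window, so `window` should be more than twice the longest run of errors you expect.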

How do you handle timezones for data processing?

Curious how people have solved this problem... I have a series of jobs that run overnight and roll up reports based on that day's data for customers. They're now asking for timezone support. One of the reports is: "You had x orders last night." However, "last night" could be different depending on the timezone. What is the best way ...
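A common pattern is to store all timestamps in UTC and convert each customer's local day into a UTC range at report time. The sketch below assumes that order timestamps are timezone-aware UTC datetimes, that each customer has an IANA zone name (e.g. `"America/New_York"`), and that "last night" means the previous local calendar day. The helper names are hypothetical. It uses the standard `zoneinfo` module (Python 3.9+), which on Windows also needs the `tzdata` package.

```python
from datetime import date, datetime, time, timedelta, timezone
from typing import Iterable, Tuple
from zoneinfo import ZoneInfo


def local_day_bounds_utc(day: date, tz_name: str) -> Tuple[datetime, datetime]:
    """Return the [start, end) UTC instants covering `day` in zone `tz_name`.

    Each local midnight is converted separately, so DST-change days come out
    as 23 or 25 hours long instead of a fixed 24.
    """
    tz = ZoneInfo(tz_name)
    start_local = datetime.combine(day, time.min, tzinfo=tz)
    end_local = datetime.combine(day + timedelta(days=1), time.min, tzinfo=tz)
    return start_local.astimezone(timezone.utc), end_local.astimezone(timezone.utc)


def previous_local_day(now_utc: datetime, tz_name: str) -> date:
    """The customer's 'yesterday', judged by their wall clock at `now_utc`."""
    return now_utc.astimezone(ZoneInfo(tz_name)).date() - timedelta(days=1)


def count_orders(order_times_utc: Iterable[datetime],
                 start: datetime, end: datetime) -> int:
    """Count aware UTC timestamps in the half-open interval [start, end)."""
    return sum(1 for t in order_times_utc if start <= t < end)
```

In a real job the `[start, end)` pair would normally go into the database query's `WHERE` clause rather than filtering in Python. Because "yesterday" depends on the customer's clock, the nightly job should either run per timezone after that zone's local midnight, or run once late enough that every customer's previous day has ended.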