Hello,
I'm designing an application that receives information from roughly 100k sensors that measure time-series data. Each sensor measures a single integer data point once every 15 minutes, saves a log of these values, and sends that log to my application once every 4 hours. My application should maintain about 5 years of historical data. The packet I receive every 4 hours has the following structure:
- Date and time of the sequence start
- Number of samples to arrive (assume this is fixed for the sake of simplicity, although in practice there may be partials)
- The sequence of samples, each of exactly 4 bytes
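To make the packet concrete, here is a minimal sketch of how I imagine unpacking it into (timestamp, value) pairs. The byte layout is an assumption for illustration only: an 8-byte Unix epoch start time, a 2-byte sample count, and big-endian signed 4-byte samples. The name `parse_packet` is hypothetical as well.

```python
import struct
from datetime import datetime, timedelta, timezone

SAMPLE_INTERVAL = timedelta(minutes=15)  # one sample every 15 minutes
HEADER_FORMAT = ">QH"                    # ASSUMED header: epoch seconds + sample count
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)


def parse_packet(payload: bytes):
    """Return a list of (timestamp, value) pairs from one packet."""
    start_epoch, count = struct.unpack_from(HEADER_FORMAT, payload, 0)
    # ASSUMED sample encoding: big-endian signed 32-bit integers.
    values = struct.unpack_from(f">{count}i", payload, HEADER_SIZE)
    start = datetime.fromtimestamp(start_epoch, tz=timezone.utc)
    # Sample i was measured i intervals after the sequence start.
    return [(start + i * SAMPLE_INTERVAL, v) for i, v in enumerate(values)]
```

With a 15-minute interval and a 4-hour reporting period, a full packet carries 16 samples.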
My application's main usage scenario is showing graphs of composite signals at certain dates. By "composite" signals I mean, for example, that I need to show the result of adding Sensor A's signal to Sensor B's signal and subtracting Sensor C's signal.
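As a rough sketch of what I mean by a composite signal, assume each sensor's signal has already been loaded as a mapping from timestamp to value. The function name `composite` and the (sign, series) representation are just illustrative choices, and dropping timestamps that are missing from any sensor is one possible policy, not a requirement.

```python
def composite(terms):
    """Combine signals sample by sample.

    terms: non-empty list of (sign, series) pairs, where sign is +1 or -1
    and series maps a timestamp to an integer value.
    """
    # Keep only timestamps present in every series (assumed policy for gaps).
    common = set.intersection(*(set(series) for _, series in terms))
    return {t: sum(sign * series[t] for sign, series in terms)
            for t in sorted(common)}


# A + B - C for three tiny example signals keyed by epoch seconds.
sensor_a = {0: 10, 900: 20}
sensor_b = {0: 1, 900: 2}
sensor_c = {0: 5, 900: 5, 1800: 9}
result = composite([(+1, sensor_a), (+1, sensor_b), (-1, sensor_c)])
```

Whichever storage layout is chosen, the query layer has to produce aligned per-timestamp values like these before the arithmetic can happen.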
My dilemma is how to store this time-series data in my database. I see two options, assuming I use a relational database:
- Store every sample in a row of its own: when I receive a signal, break it into samples, and store each sample separately with its timestamp. Assume the timestamps can be normalized across signals.
- Store every 4-hour signal as a separate row with its starting time. In this case, whenever a signal arrives, I just add it as a BLOB to the database.
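To compare the two options side by side, here is a small sketch using SQLite from Python's standard library. The table and column names (`sample`, `signal`, `sensor_id`, `ts`, `start_ts`, `samples`) are hypothetical, timestamps are assumed to be epoch seconds, and each signal is assumed to hold exactly 16 samples. It is meant to show the shape of the code "above" the database, not a recommended schema.

```python
import sqlite3
import struct

INTERVAL_S = 15 * 60                      # seconds between samples
SIGNAL_SPAN_S = 16 * INTERVAL_S           # ASSUMED fixed 16 samples per signal

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sample (                     -- option 1: one row per sample
    sensor_id INTEGER NOT NULL,
    ts        INTEGER NOT NULL,
    value     INTEGER NOT NULL,
    PRIMARY KEY (sensor_id, ts));
CREATE TABLE signal (                     -- option 2: one row per 4-hour signal
    sensor_id INTEGER NOT NULL,
    start_ts  INTEGER NOT NULL,
    samples   BLOB    NOT NULL,
    PRIMARY KEY (sensor_id, start_ts));
""")


def store_rows(sensor_id, start_ts, values):
    conn.executemany(
        "INSERT INTO sample VALUES (?, ?, ?)",
        [(sensor_id, start_ts + i * INTERVAL_S, v) for i, v in enumerate(values)])


def store_blob(sensor_id, start_ts, values):
    blob = struct.pack(f">{len(values)}i", *values)
    conn.execute("INSERT INTO signal VALUES (?, ?, ?)", (sensor_id, start_ts, blob))


def read_rows(sensor_id, t0, t1):
    """Samples with t0 <= ts < t1; the database does the range filtering."""
    return conn.execute(
        "SELECT ts, value FROM sample WHERE sensor_id = ? AND ts >= ? AND ts < ? "
        "ORDER BY ts", (sensor_id, t0, t1)).fetchall()


def read_blob(sensor_id, t0, t1):
    """Same result, but the application must unpack and trim each BLOB."""
    out = []
    query = ("SELECT start_ts, samples FROM signal WHERE sensor_id = ? "
             "AND start_ts > ? AND start_ts < ? ORDER BY start_ts")
    # A signal overlaps the range if it starts less than one span before t0.
    for start_ts, blob in conn.execute(query, (sensor_id, t0 - SIGNAL_SPAN_S, t1)):
        values = struct.unpack(f">{len(blob) // 4}i", blob)
        for i, v in enumerate(values):
            ts = start_ts + i * INTERVAL_S
            if t0 <= ts < t1:
                out.append((ts, v))
    return out
```

In this sketch the per-sample table lets SQL filter by time directly, while the BLOB table needs far fewer rows but moves the unpacking and trimming into application code.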
There are obvious pros and cons for each of the options, including storage size, performance, and complexity of the code "above" the database.
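To put rough numbers on the storage-size question, here is a back-of-the-envelope calculation based on the figures in this post. It ignores leap days, partial packets, and all per-row and index overhead, which will differ between databases; the function name `estimate_volume` is made up for this sketch.

```python
def estimate_volume(sensors=100_000, sample_minutes=15, signal_hours=4,
                    years=5, bytes_per_sample=4):
    """Estimate row counts and raw payload size for both storage options."""
    samples_per_day = 24 * 60 // sample_minutes   # 96 samples per sensor per day
    signals_per_day = 24 // signal_hours          # 6 packets per sensor per day
    days = years * 365                            # leap days ignored
    total_samples = sensors * samples_per_day * days
    return {
        "rows_per_sample_option": total_samples,
        "rows_per_signal_option": sensors * signals_per_day * days,
        "raw_payload_bytes": total_samples * bytes_per_sample,
    }


volume = estimate_volume()
```

Under these assumptions the per-sample option holds about 17.5 billion rows and the per-signal option about 1.1 billion rows, with roughly 70 GB of raw sample payload in either case before any database overhead.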
I'm wondering whether there are best practices for such cases.
Many thanks.