I have a task which I know how to code (in C#), but I know a simple implementation will not meet ALL my needs. So I am looking for tricks which might meet them all.

  1. I am writing a simulation involving N entities interacting over time.

  2. N will start at around 30 and grow into many thousands.

    a. The number of entities will change during the course of the simulation.
    

    b. I expect this will require each entity to have its own trace file.

  3. Each entity has a minimum of 20 parameters, and up to millions, that I want to track over time.

    a. This will most likely mean we can't keep all values in memory at all times; keeping some subset in memory should be fine.

    b. The number of parameters per entity will initially be fixed, but I can think of some tests which would have the number of parameters slowly changing over time.

  4. The simulation will last for millions of time steps, and I need to keep every value of every parameter.

  5. What I will be using these traces for:

    a. Plotting a configurable subset of the parameters over a fixed window of time, from the current time step back into the past.

    i. Normally on the order of 300 time steps.
    
    
    ii. These plots are in real time while the simulation is running.
    

    b. I will be using these traces to re-play the simulation, so I need to quickly access all the parameters at a given time step so I can quickly move to different times in the simulation.

    i. This requires that the values be stored in file(s) which can be inspected/loaded after restarting the software.
    
    
    ii. Using a database is NOT an option.
    

    c. I will be using the parameters for follow-up analysis which I can't define up front, so a more flexible system is desirable.

My initial thought:

  1. One class per entity which holds all the parameters.

  2. Backed by a memory mapped file.

  3. Only a fixed, but moving, window of the file is mapped into main memory.

  4. A second memory-mapped file which holds time indexes into the main file for quicker access while re-playing the simulation. This may be very important because each entity file will represent a different time slice of the full simulation. (A rough sketch of this design follows.)
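
For illustration, a rough C# sketch of this design, assuming one trace file per entity with a fixed number of float parameters per step; all names are hypothetical and bounds checks are omitted:

using System;
using System.IO;
using System.IO.MemoryMappedFiles;

class EntityTrace : IDisposable
{
    readonly int paramCount;
    readonly long recordSize;            // bytes per time step
    readonly MemoryMappedFile file;
    MemoryMappedViewAccessor window;     // only a moving slice is mapped
    long windowFirstStep;
    const long WindowSteps = 1024;       // how many steps to keep mapped

    public EntityTrace(string path, int paramCount, long maxSteps)
    {
        this.paramCount = paramCount;
        recordSize = sizeof(float) * paramCount;
        file = MemoryMappedFile.CreateFromFile(
            path, FileMode.OpenOrCreate, null, recordSize * maxSteps);
        MapWindow(0);
    }

    // Re-map the view so it starts at the given step (slides the window).
    void MapWindow(long firstStep)
    {
        if (window != null) window.Dispose();
        windowFirstStep = firstStep;
        window = file.CreateViewAccessor(
            firstStep * recordSize, WindowSteps * recordSize);
    }

    public void Write(long step, float[] values)
    {
        if (step < windowFirstStep || step >= windowFirstStep + WindowSteps)
            MapWindow(step);
        window.WriteArray((step - windowFirstStep) * recordSize, values, 0, paramCount);
    }

    public void Read(long step, float[] into)
    {
        if (step < windowFirstStep || step >= windowFirstStep + WindowSteps)
            MapWindow(step);
        window.ReadArray((step - windowFirstStep) * recordSize, into, 0, paramCount);
    }

    public void Dispose()
    {
        window.Dispose();
        file.Dispose();
    }
}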

+1  A: 

Just for the memory part...

1. You can save the data as an XElement (sorry for not knowing much about LINQ), which holds the XML structure.

2. Hold a record counter.

After n records, save the XElement to an XML file (data1.xml, ..., dataN.xml).

It can be a perfect log for any parameters you have, with any structure you like:

<run>
  <step id="1">
     <param1 />
     <param2 />
     <param3 />
  </step>
  .
  .
  .
  <step id="N">
     <param1 />
     <param2 />
     <param3 />
  </step>
</run>

This way your memory stays free and the data remains accessible. You don't have to think too much about DB issues, and it's pretty amazing what LINQ can do for you... just open the correct XML log file...
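
For illustration, a minimal sketch of this rolling XElement log; the file-naming scheme and parameter list are hypothetical:

using System.Xml.Linq;

class XmlTraceLog
{
    XElement run = new XElement("run");
    int recordCount;
    int fileCount;
    const int RecordsPerFile = 10000;   // the "n records" above

    // One <step> element per time step, matching the layout shown above.
    public void LogStep(int stepId, double param1, double param2, double param3)
    {
        run.Add(new XElement("step", new XAttribute("id", stepId),
            new XElement("param1", param1),
            new XElement("param2", param2),
            new XElement("param3", param3)));

        if (++recordCount == RecordsPerFile)
        {
            fileCount++;
            run.Save("data" + fileCount + ".xml");   // data1.xml, data2.xml, ...
            run = new XElement("run");
            recordCount = 0;
        }
    }
}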

I thought of something similar to this, but I can't see how this works with the playback requirements; plus, the overhead of the XML elements could double the size of these files. Since I expect these files to be a few gigabytes each, this seems too wasteful. Taking the full set of requirements into account is the hard part, and is why I am asking this question. I am open to any thoughts and discussions to help work this out. Thanks.
Jim Kramer
Millions of steps... It will take you ages to read this XML.
romkyns
+3  A: 

I would start with SQLite. SQLite is essentially a binary file format plus a library that lets you query it conveniently and quickly. It is not really like a database server, in that you can run it on any machine with no installation whatsoever.

I strongly recommend against XML, given the requirement of millions of steps, potentially with millions of parameters.
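
For illustration, a minimal sketch of per-step logging into SQLite from C#, assuming the third-party System.Data.SQLite provider; the schema and names are hypothetical:

using System.Data.SQLite;

class SqliteTraceDemo
{
    public static void Main()
    {
        using (var conn = new SQLiteConnection("Data Source=trace.db"))
        {
            conn.Open();
            new SQLiteCommand(
                "CREATE TABLE IF NOT EXISTS trace " +
                "(entity INTEGER, step INTEGER, param INTEGER, value REAL)",
                conn).ExecuteNonQuery();

            // Wrapping the per-step inserts in a transaction is essential
            // for write speed with SQLite.
            using (var tx = conn.BeginTransaction())
            using (var cmd = new SQLiteCommand(
                "INSERT INTO trace VALUES (@entity, @step, @param, @value)", conn))
            {
                cmd.Parameters.AddWithValue("@entity", 1);
                cmd.Parameters.AddWithValue("@step", 42);
                cmd.Parameters.AddWithValue("@param", 0);
                cmd.Parameters.AddWithValue("@value", 3.14);
                cmd.ExecuteNonQuery();
                tx.Commit();
            }
        }
    }
}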

EDIT: Given the sheer amount of data involved, SQLite may well end up being too slow for you. Don't get me wrong, SQLite is very fast, but it won't beat seeks & reads, and it looks like your use case is such that basic binary IO is rather appropriate.

If you go with the binary IO method, you should expect some moderately involved coding, and the absence of such niceties as your file staying in a consistent state if the application dies halfway through (unless you code for this specifically, that is).

romkyns
I'd go with SQL CE instead - same usage, but more functional and accessible from C#.
codekaizen
Fair point. SQLCE may be better in terms of toolchain. It's a little harder to redistribute though - I believe the standard way is by redistributing the (free) MSI.
romkyns
Interesting thought. I just did a back-of-the-envelope calculation: each entity will have value logs totaling about 1.5 GB, and with just 4 entities that would exceed the 4 GB limit of SQL CE. I did a quick search for the limits of SQLite and could not find them; I will do more searching. This was an idea I had not thought about, and it is the kind of information I need.
Jim Kramer
http://www.sqlite.org/whentouse.html suggests (under "Very large datasets") that in the default build, the max file size is 2TB. Also, using numbers from http://www.sqlite.org/limits.html, the absolute upper limit appears to be 32TB, but you'd need to compile it yourself to get this.
romkyns
Coding complexity is not an issue. The range of requirements, and the desire not to pick a wrong solution, is the reason I have asked this question. So far this discussion still leads me to a solution similar to my original thoughts.
Jim Kramer
+2  A: 

KISS -- just write a logfile for each entity, and at each time slice write out every parameter in a specified order (so you don't double the size of the logfile by adding parameter names). You can have a header in each logfile if you want to specify the parameter names of each column and the identity of the entity.

If there are many parameter values that will remain fixed or slowly changing during the course of the simulation, you can write these out to another file that encodes only changes to parameter values rather than every value at each time slice.

You should probably synchronize the logging so that each log entry is written out with the same time value. Rather than coordinate through a central file, just make the first value in each line of the file the time value.

Forget about a database - too slow and too much overhead for simulation-replay purposes. For replaying a simulation, you simply need sequential access to each time slice, which is most efficiently and quickly implemented by reading in the lines of the files one by one.

For the same reason - speed and space efficiency - forget XML.
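
For illustration, a minimal sketch of such a per-entity logfile writer: a header naming the entity and the column order, then one line per time slice with the time value first (all names are hypothetical):

using System.IO;

class EntityLog
{
    readonly StreamWriter writer;

    public EntityLog(string path, string entityId, string[] paramNames)
    {
        writer = new StreamWriter(path);
        writer.WriteLine("# entity: " + entityId);
        writer.WriteLine("# columns: time " + string.Join(" ", paramNames));
    }

    // One line per time slice; the time value comes first, so no central
    // coordination file is needed.
    public void LogSlice(long time, double[] values)
    {
        writer.Write(time);
        foreach (double v in values)
        {
            writer.Write(' ');
            writer.Write(v);
        }
        writer.WriteLine();
    }

    public void Close()
    {
        writer.Close();
    }
}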

Larry Watanabe
These were my general thoughts too, but I am just not sure because of the wide range of requirements. Yes, the logging will be synchronized no matter which method I use. Secondly, almost every value will change during each step.
Jim Kramer
If you go this way, I think you're better off with a binary file and fixed width entries, rather than lines of text. That will make seeking a trivial task (and will be much more efficient space-wise). It seems from your description that fixed-width entries are no problem.
romkyns
I expect to write out raw binary copies of the floating/integer values.
Jim Kramer
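
For illustration, with raw binary values of fixed width, seeking to an arbitrary time step during replay reduces to a single multiplication; a hypothetical sketch:

using System.IO;

class BinaryTraceReader
{
    readonly FileStream stream;
    readonly BinaryReader reader;
    readonly int paramCount;
    readonly long recordSize;

    public BinaryTraceReader(string path, int paramCount)
    {
        this.paramCount = paramCount;
        recordSize = sizeof(double) * paramCount;   // fixed width per time step
        stream = new FileStream(path, FileMode.Open, FileAccess.Read);
        reader = new BinaryReader(stream);
    }

    // Random access for replay: compute the record offset directly, so no
    // separate index file is needed when the record width is fixed.
    public double[] ReadStep(long step)
    {
        stream.Seek(step * recordSize, SeekOrigin.Begin);
        double[] values = new double[paramCount];
        for (int i = 0; i < paramCount; i++)
            values[i] = reader.ReadDouble();
        return values;
    }
}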
Binary will be more efficient, but if you go this route, write a tool that will let you convert these files to text form. Then you can just pipe it to e.g. "more" or whatever. This will aid in debugging.
Larry Watanabe
The wide-range requirements are probably there because the actual requirements are unknown and people are erring on the safe side. When you actually see the data, that will be the time to start optimizing - not before. "Premature optimization is the root of all evil".
Larry Watanabe
No, the requirements are known and are not just erring on the safe side. While it is true that the first uses of these requirements will come nowhere near the limits specified, the final uses will, and in fact the limits may be understated.
Jim Kramer