Hi folks,

I will preface the question by saying that I am somewhat new to the .NET world, and might be missing something entirely obvious. If so, I'd love to hear what that is!

I often find myself writing small programs that all do more or less the same thing:

  1. Read in data from one or more files
  2. Store this data in memory, in some sort of container
  3. Crunch data, output analysis results to a text file and quit

I also tend to end up creating monstrous-looking containers to hold that data, e.g.:

Dictionary<DateTime, SortedDictionary<ItemType, List<int>>> allItemTypesAndPropertiesByDate =
            new Dictionary<DateTime, SortedDictionary<ItemType, List<int>>>();

This works, in the sense that the data structure describes my intent more or less accurately: I want to be able to access item types and properties by date. Nevertheless, I feel that the storage container is too tightly bound to the output format. If tomorrow I decide that I'd like to find all dates on which items with certain properties were seen, this data structure becomes a liability. In general, making input and output changes down the line is time-consuming and error-prone. Plus, I have to keep staring at these ugly declarations, and the code to iterate over them is not pretty either.

At the other end of the complexity spectrum, I can create a SQL database with a schema that describes the input in a more flexible format, and then run queries (using SQL or LINQ to SQL) against it. This certainly works, but it feels like too big a hammer: I write many programs like these and don't want to create a database for each one, manage the SQL dependency (even if it is SQL Server Express on the local machine), and so on. I don't need to persist the data; I just need to read it in, keep it in memory, run a few queries and quit. Even an in-memory SQLite instance feels like overkill. I am not overly concerned with runtime performance, since these are usually just small experiments on my local machine, but it just feels wrong.

Ideally, I would like a low-overhead, in-memory row store with a loosely defined schema that is easily LINQ-queryable and takes only a few lines of code to set up and use. Does the Microsoft .NET 4 stack include something like this? If you found yourself in a similar predicament, what would you do?

Your thoughts are appreciated - thanks!

Alex

A:

If you find a database structure easier to work with, one option might be to create a DataSet with DataTables representing your schema, which you can then query using LINQ to DataSet.
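A minimal sketch of that idea, assuming a single hypothetical `Observations` table with `Date`, `ItemType` and `Property` columns (these names are illustrative, not from the question). `AsEnumerable()` and `Field<T>()` are the LINQ to DataSet extension methods, which on .NET 4 live in the System.Data.DataSetExtensions assembly, so that reference is needed:

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical schema: one row per (date, item type, property) observation.
public static class DataSetSketch
{
    public static DataSet BuildStore()
    {
        var ds = new DataSet("Analysis");
        DataTable obs = ds.Tables.Add("Observations");
        obs.Columns.Add("Date", typeof(DateTime));
        obs.Columns.Add("ItemType", typeof(string));
        obs.Columns.Add("Property", typeof(int));

        // In a real program these rows would come from parsing the input files.
        obs.Rows.Add(new DateTime(2010, 6, 1), "Widget", 3);
        obs.Rows.Add(new DateTime(2010, 6, 1), "Gadget", 7);
        obs.Rows.Add(new DateTime(2010, 6, 2), "Widget", 7);
        return ds;
    }

    // The "by date" view: distinct item types seen on a given date.
    public static List<string> ItemTypesOn(DataSet ds, DateTime date)
    {
        return ds.Tables["Observations"].AsEnumerable()
            .Where(r => r.Field<DateTime>("Date") == date)
            .Select(r => r.Field<string>("ItemType"))
            .Distinct()
            .OrderBy(t => t)
            .ToList();
    }

    // The inverse view: all dates on which any item with the given property was seen.
    public static List<DateTime> DatesWithProperty(DataSet ds, int property)
    {
        return ds.Tables["Observations"].AsEnumerable()
            .Where(r => r.Field<int>("Property") == property)
            .Select(r => r.Field<DateTime>("Date"))
            .Distinct()
            .OrderBy(d => d)
            .ToList();
    }
}
```

Both queries run against the same rows, so neither view is baked into the storage. For one-to-many data, a second DataTable can be added and linked with a `DataRelation`.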

Chris Taylor
This looks like a very promising option! I've been dimly aware of DataSets, but hadn't considered using them as stand-alone containers rather than as client-side caches for remote data. The overhead (in terms of the amount of code required) of using DataSets seems a bit higher than I'd like, but I will definitely give it a shot. Thanks!
Inverseofverse
A: 

Or you could try an object database such as db4o. It stores the actual objects you work with, which helps you program in a more object-oriented way, and it's quite easy to use. Also, it's not a database server in the traditional sense of the word: it uses flat files as containers and reads from and writes to them directly.

Alex Paven
It's an interesting-looking technology, but still overkill for what I want to do: I don't actually need to commit anything to disk, in whichever form. I'd also prefer to stay within the framework and avoid an external dependency, if possible.
Inverseofverse
A: 

Why not just use LINQ?

You can read the data into flat lists, then chain some LINQ statements to get the structure you want.

Apologies if I'm missing something, but I don't think you need an intermediate data store.
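For example, assuming a hypothetical flat `Observation` record with `Date`, `ItemType` and `Property` (item type modelled as a plain string here), both the nested date → item type → properties shape from the question and the inverse property → dates view can be produced from the same list on demand, instead of being fixed by the storage type. This is a sketch, not a drop-in solution:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical flat record: one line of input becomes one Observation.
public class Observation
{
    public DateTime Date { get; set; }
    public string ItemType { get; set; }
    public int Property { get; set; }
}

public static class FlatListSketch
{
    // Rebuilds the question's nested shape (date -> item type -> properties) only when needed.
    public static Dictionary<DateTime, SortedDictionary<string, List<int>>> ByDate(
        IEnumerable<Observation> rows)
    {
        return rows
            .GroupBy(o => o.Date)
            .ToDictionary(
                byDate => byDate.Key,
                byDate => new SortedDictionary<string, List<int>>(
                    byDate.GroupBy(o => o.ItemType)
                          .ToDictionary(byType => byType.Key,
                                        byType => byType.Select(o => o.Property).ToList())));
    }

    // The "tomorrow" query from the question: property -> distinct dates it was seen on.
    public static ILookup<int, DateTime> DatesByProperty(IEnumerable<Observation> rows)
    {
        return rows
            .Select(o => new { o.Property, o.Date })
            .Distinct()
            .ToLookup(x => x.Property, x => x.Date);
    }
}
```

The list stays the single source of truth; each nested view is a throwaway projection, so changing the output shape means writing a new query rather than changing the container.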

Binary Worrier
The issue is that my input data is not suited to storage in flat lists. There is a variable number of records, a mix of one-to-one and one-to-many relationships, etc. Put simply, it's not possible to fit all this variety into a single table. I could come up with an object (as devio also suggested) and store a list of these objects, but that would put me right back in the world of ugly declarations and tight coupling to the output, just wrapped in an object this time. I'd rather pay the performance cost and have a relational store to work with.
Inverseofverse
A: 

Comparing databases and OOP, a table definition corresponds to a class definition, a record is an object, and the table data is any kind of collection of objects.

My approach would be to define classes and properties representing the file contents, parse each file entry into an object, and add these objects to a List<T>.

This list can then be queried using LINQ.
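A sketch of that approach, assuming two hypothetical comma-separated inputs: one listing items (`id,type`) and one listing sightings (`itemId,yyyy-MM-dd`). Each kind of record gets its own class and its own `List<T>`, and a one-to-many relationship is expressed the way a database would express it, through a key that LINQ can `join` on. The class names and formats are illustrative only:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical "table" definitions: one class per kind of file record.
public class Item
{
    public int Id { get; set; }
    public string Type { get; set; }
}

public class Sighting
{
    public int ItemId { get; set; }   // "foreign key" to Item.Id: one item, many sightings
    public DateTime Date { get; set; }
}

public static class ListOfObjectsSketch
{
    // Assumed line format "id,type", e.g. "1,Widget".
    public static List<Item> ParseItems(IEnumerable<string> lines)
    {
        return lines.Select(line => line.Split(','))
                    .Select(f => new Item
                    {
                        Id = int.Parse(f[0], CultureInfo.InvariantCulture),
                        Type = f[1]
                    })
                    .ToList();
    }

    // Assumed line format "itemId,yyyy-MM-dd", e.g. "1,2010-06-01".
    public static List<Sighting> ParseSightings(IEnumerable<string> lines)
    {
        return lines.Select(line => line.Split(','))
                    .Select(f => new Sighting
                    {
                        ItemId = int.Parse(f[0], CultureInfo.InvariantCulture),
                        Date = DateTime.ParseExact(f[1], "yyyy-MM-dd", CultureInfo.InvariantCulture)
                    })
                    .ToList();
    }

    // Relational-style join over the two lists: sorted distinct dates an item of the given type was seen.
    public static List<DateTime> DatesSeen(List<Item> items, List<Sighting> sightings, string type)
    {
        return (from i in items
                join s in sightings on i.Id equals s.ItemId
                where i.Type == type
                select s.Date)
               .Distinct()
               .OrderBy(d => d)
               .ToList();
    }
}
```

In a real program the lines would come from something like `File.ReadLines("items.csv")` (file name hypothetical); taking `IEnumerable<string>` keeps the parsing independent of where the text comes from.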

devio
A list of objects seems suboptimal for my purposes - please take a look at the reply I left for Binary Worrier's similar suggestion.
Inverseofverse