ansaurus

Question

Answer 1

+2 A:

You might go with a data structure like a nested dictionary and iterate over the contents:

Dictionary <TimeSeries, Dictionary<DateTime, Value>> dict = new Dictionary<TimeSeries, Dictionary<DateTime, Value>>();

foreach (TimeSeries series in dict.Keys) {

    //table row output code goes here
    Dictionary<DateTime, Value> innerDict = dict[series];
    foreach (DateTime date in innerDict.Keys) {
        Value seriesValueAtTimeT = innerDict[date];
        //table column output code goes here
    }
}

Where your output code is writing out to something else, depending on your needs, and you replace the datatypes TimeSeries, Value, etc., with your actual data types.

Doug R 2009-06-19 20:40:49

Thanks, Doug R. This pointed me in a direction that works for me.I'm using the SortedDictionary and Dictionary collections as follow:SortedDictionary<DateTime, Dictionary<string, double>> where datetime is the timestamp and the nested dictionary contains all the values from the different time-series with that timestamp. This way I can easily iterate over the chronological list of timestamps, get all the values, and shove the data line by line into a DataTable or text file.

chriscena 2009-06-22 09:45:17

Answer 2

A:

One thing you can do if memory usage becomes a problem is to move from tracking single events to tabulated counts of an event within a given time period. You lose some precision about exactly when things happened, but often summarizing your data like this can reduce the complexity of your picture and make trends more obvious.

In case it's not obvious, I mean taking data that looks like this:

12:00 event1
12:01 event2
12:10 event1
12:11 event1

to this:

12:00-12:15 event1 3
12:00-12:15 event2 1

James Thompson 2009-06-19 21:04:59

Answer 3

+1 A:

You can first scan all present series for the distinct values (for example, aggregating them in a HashSet), then simply dump them into an array of dates (storing a match between date and index position in a dictionary).

var distinctDates = allSeries
  .SelectMany(s => s.Values.Select(v => v.Date))
  .Distinct()
  .OrderBy(d => d)
  .ToArray();

var datePositions = distinctDates
  .Select((d,index) => new 
    {
      Date = d,
      Index = index
    }).
  .ToDictionary(x => x.Date, x => x.Index);

Then, create a jagged array that has width of "NumberOfSeries" and length of "NumberOfDates". After that, do a second scan of all the data and dump them to their positions.

var values = new float[allSeries.Length][];
for (var i=0;i<allSeries.Length;i++)
{
  values[i] = new float[distinctDates.Length];
  var currentSerie = allSeries[i];
  foreach(var value in currentSerie.Values)
  {
    var index = datePositions[value.Date];
    values[i][index] = value.Value;
  }      
}

I wrote this code without touching VisualStudio, so I may have a few typos. Or there may be used a few LINQ methods that are not present in the .NET (just look in Lokad.Shared.dll). But you should be able to get the idea.

Some more notes, while I'm at the topic:

Go for the jagged array, if you have to keep everything in the memory at once. It is way more efficient than a dictionary and has a lot less memory problems than a rectangular array.
Keep Value objects as small as possible (i.e.: float instead of double).
If number of time serie values is expected to go large in the future, then never store values in database in a "one row per value". It is recommended to either go for something like HDF (which has a .NET interface) or use persist time serie fragments in binary form in DB (as in time serie databases)

Sticking to these should allow you to scale up to hundreds of millions of time values without a lot of problems (done that).

Rinat Abdullin 2009-06-19 22:52:53

Thanks for your suggestion, but in this particular project I'm stuck with .NET 2.0 for now.

chriscena 2009-06-22 06:22:29

You can use all that in .NET 2.0 http://abdullin.com/how-to-use-net-35-syntax-and-compiler-features-for-net-20/

Rinat Abdullin 2009-06-22 10:44:15

Answer 4

+1 A:

Hey Chris. I realize you've already accepted an answer, but thought I'd post a solution I use. Hopefully it's of some use to someone. If not at least it provides a place for me to find it sometime in the future. :-)

This is VBA code fired off directly from an Excel 2007 code module. It could be easily converted to .Net.

The key for the data manipulation is the pivot table object. I've found it to be massively efficient at getting the data into the layout you've specified.

Sub GetIndexData ()
Dim cn as ADODB.Connection, cmd As ADODB.Command, rs As ADODB.Recordset
Dim rPivotTopLeft As Range, rPivotBottomRight As Range

Application.ScreenUpdating = False
Application.Calculation = xlCalculationManual

'Get the data.'
Set cn = New ADODB.Connection
With cn
  .Provider = "SQLOLEDB"
  .ConnectionString = "Database=" & mDBName & ";" & _
                      "Server=" & mDBServerName & ";" & _
                      "UID=" & mDBUserID & ";" & _
                      "Password=" & mDBPassword & ";" & _
                      "Persist Security Info=True;"
  .CursorLocation = adUseClient
  .Open
End With

Set cmd = New ADODB.Command
Set rs = New ADODB.Recordset
With cmd
  .ActiveConnection = adoTools.DBConnection
  .CommandType = adCmdText
  .CommandText = "SELECT YourData From YourSource WHERE YourCritera"
  Set rs = .Execute
End With



If Not (rs.BOF And rs.EOF) Then 'Check that we have some data.'

'Put the data into a worksheet.'
With wsRawData
  .Cells.CurrentRegion.Clear

  Set rPivotTopLeft = .Range("A1")
  With ThisWorkbook.PivotCaches.Add(SourceType:=xlExternal)
    Set .Recordset = rs
    .CreatePivotTable _
        TableDestination:=rPivotTopLeft, _
        TableName:="MyPivotTable"
  End With

  'Massage the data into the desired layout.'
  With .PivotTables("MyPivotTable")
    .ManualUpdate = True

    .PivotFields("Date").Orientation = xlRowField
    .PivotFields("Index").Orientation = xlColumnField
    .AddDataField .PivotFields("Return"), "Returns", xlSum

    .DisplayFieldCaptions = False
    .ColumnGrand = False
    .RowGrand = False

    .ManualUpdate = False
  End With

  mMonthCount = Range(.Range("A3"), .Cells(Rows.Count, "A").End(xlUp)).Count
  mIndexCount = Range(.Range("B2"), .Cells(2, Columns.Count).End(xlToLeft)).Count

  'Convert pivot table to values.'
  Set rPivotBottomRight = .Cells(mMonthCount + 2, mIndexCount + 1)
  With .Range(rPivotTopLeft, rPivotBottomRight)
    .Copy
    .PasteSpecial Paste:=xlPasteValues, Operation:=xlNone, SkipBlanks:=False, Transpose:=False
  End With

  'Format the worksheet.'
  .Range("A3").Resize(mMonthCount, 1).NumberFormat = "mmm-yy"
  .Range("B3").Resize(mMonthCount, mIndexCount).NumberFormat = "0.00%"
  Union(.Rows(2), .Columns(1)).Font.Bold = True
  .Cells.ColumnWidth = 7.14
  .Rows(1).Delete

End With


rs.close
Set rs = Nothing
cmd.ActiveConnection = Nothing
Set cmd = Nothing
cn.close
Set cn = Nothing

End Sub

From there its relatively easy to leverage the built in excel regression statistics to output a correlation matrix. With this technique I'm producing a worksheet with a 600x600 matrix of correlations in about 45 seconds.

Note that the .PivotFields parameters should be changed to fit the column names of your data from your data source.

GollyJer 2009-06-24 03:40:47

ansaurus

tags:

views:

answers:

Time-series and correlation strategies

related questions