Hello,

there is a very time-consuming operation which generates a dataset in my package. I would like to save this dataset and let the package rebuild it only when I manually delete the cached file. Here is my approach as part of the package:

myDataset = Module[{fname, data}, 
    fname = "cached-data.mx";
    If[FileExistsQ[fname], 
        Get[fname],
        data = Evaluate[timeConsumingOperation[]];
        Put[data, fname];
        data]
];

timeConsumingOperation[]:=Module[{},
    (* lot of work here *)
    {"data"}
];

However, instead of writing the long dataset to the file, the Put command writes only one line: "timeConsumingOperation[]", even if I wrap it with Evaluate as above. (To be honest, this behaviour is not consistent: sometimes the dataset is written, sometimes not.)

How do you cache your data?

+2  A: 

In the past, whenever I've had trouble with things evaluating it is usually when I have not correctly matched the pattern required by the function. For instance,

f[x_Integers]:= x

which won't match an integer argument such as f[3], because x_Integers looks for an expression with the head Integers. Instead, I meant

f[x_Integer]:=x

In your case, though, you have no pattern to match: timeConsumingOperation[].

Your problem is more likely related to when timeConsumingOperation is defined relative to myDataset. In the code you've posted above, timeConsumingOperation is defined after myDataset. So, on the first run (or immediately after you've cleared the global variables) you would get exactly the result you're describing, because timeConsumingOperation is not yet defined when the code for myDataset is run and timeConsumingOperation[] therefore stays unevaluated.

Now, SetDelayed (:=) automatically causes the variable to be recalculated whenever it is used, and since you do not require any parameters to be passed, the square brackets are not necessary. The important point here is that timeConsumingOperation can be declared, as written, prior to myDataset because SetDelayed will cause it not to be executed until it is used.

All told, your caching methodology looks exactly how I would go about it.
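
A minimal sketch of that reordering, reusing the names from the question. The body of timeConsumingOperation is still a placeholder, and Evaluate is left out here on the assumption that it is not needed, since the right-hand side of Set is evaluated anyway:

```mathematica
(* Define the expensive function first; SetDelayed defers its body until it is called. *)
timeConsumingOperation[] := Module[{},
    (* lot of work here *)
    {"data"}
];

(* Load the cached file if it exists; otherwise compute, write, and return the result. *)
myDataset = Module[{fname, data},
    fname = "cached-data.mx";
    If[FileExistsQ[fname],
        Get[fname],
        data = timeConsumingOperation[];
        Put[data, fname];
        data]
];
```

The relative file name is resolved against Directory[], so the cache file ends up wherever the current working directory points when the package is loaded.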

rcollyer
That seems to solve my problem. Thanks!
Karsten W.
+3  A: 

Another caching technique I use very often, especially when you might not want to insert the precomputed form in e.g. a package, is to memoize the expensive evaluation(s), such that it is computed on first use but then cached for subsequent evaluations. This is readily accomplished with SetDelayed and Set in concert:

f[arg1_, arg2_] := f[arg1, arg2] = someExpensiveThing[arg1, arg2]

Note that Set (=) and SetDelayed (:=) have the same precedence and group to the right, so the implied order of evaluation is the following, and you don't actually need the parens:

f[arg1_, arg2_] := ( f[arg1, arg2] = someExpensiveThing[arg1, arg2])

Thus, the first time you evaluate f[1,2], the delayed RHS is evaluated, so the resulting value is computed and stored with Set as a DownValue of f for those specific arguments. Later calls to f[1,2] return the stored value without recomputing it.
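
To see the stored value appear, here is a small sketch. The body given to someExpensiveThing (a one-second Pause followed by a sum) is a made-up stand-in, not anything from the original code:

```mathematica
(* Hypothetical stand-in for an expensive computation. *)
someExpensiveThing[a_, b_] := (Pause[1]; a + b)

(* Memoized wrapper: the first call computes and stores, later calls look the value up. *)
f[arg1_, arg2_] := f[arg1, arg2] = someExpensiveThing[arg1, arg2]

Length[DownValues[f]]   (* 1: only the general rule *)
f[1, 2]                 (* slow: computes 3 and stores the rule for f[1, 2] *)
Length[DownValues[f]]   (* 2: the general rule plus the cached value *)
f[1, 2]                 (* fast: returns the stored 3 *)
```

Clearing f (or re-evaluating its definition after a Clear) discards the cached values, which is the in-memory analogue of deleting a cache file.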

@rcollyer is also right that you don't need empty brackets if you have no arguments; you could just as easily write:

g := g = someExpensiveThing[...]

There's no harm in using them, though.
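
A quick way to check that the bracket-free form really computes only once is to count the calls. In this sketch, counter and expensive are hypothetical helper names introduced just for the illustration:

```mathematica
counter = 0;

(* Hypothetical expensive function that records how often it runs. *)
expensive[] := (counter++; Range[3])

(* Memoized symbol: the first use replaces the delayed definition with the computed value. *)
g := g = expensive[]

g; g; g;
counter   (* 1 *)
```

Because Set here overwrites the delayed definition of g itself, forcing a recomputation means evaluating the g := g = ... line again.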

Michael Pilat
@Michael, memoization is definitely useful, and could easily be coupled with the type of caching the OP is looking for. As far as not needing the empty brackets, I can see a good use for them: letting the maintenance programmer know that it is a calculated value. Otherwise, you might have to do some hunting to be able to tell that it recalculates every time it is called.
rcollyer