What is the most efficient way to make a matrix of lagged variables in R for an arbitrary variable (i.e. not a regular time series)?

For example:

input: x <- c(1,2,3,4)
2 lags
output:
[1, NA, NA]
[2, 1, NA]
[3, 2, 1]
[4, 3, 2]

+1  A: 

Use a proper class for your objects; base R has ts, which comes with a lag() function to operate on it. Note that these ts objects date from a time when 'delta' or 'frequency' were constant: monthly or quarterly data, as in macroeconomic series.

For irregular data such as (business-)daily, use the zoo or xts packages, which can also deal (very well!) with lags. To go further from there, you can use packages like dynlm or dlm, which allow for dynamic regression models with lags.

The CRAN Task Views on Time Series, Econometrics, and Finance all have further pointers.
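
As a rough illustration of the ts route for the regularly spaced case, the sketch below builds the lag matrix from the question with base R only. The names lagged and lag_matrix are made up for this example, and stats::lag() is written out in full on the assumption that another attached package (dplyr, for instance) might otherwise mask lag().

```r
# Regular series: ts() assumes equally spaced observations
x <- ts(c(1, 2, 3, 4))

# stats::lag() with a negative k shifts the series later in time,
# so lag(x, -1) holds the previous observation at each time point
lagged <- cbind(x, lag1 = stats::lag(x, -1), lag2 = stats::lag(x, -2))

# cbind() on ts objects pads to the union of all time points (1 to 6 here);
# window() trims the result back to the span of the original series
lag_matrix <- window(lagged, start = start(x), end = end(x))
print(lag_matrix)
```

This only covers data that really is equally spaced; for irregular timestamps the zoo or xts classes mentioned above are the better fit.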

Dirk Eddelbuettel
+1  A: 

The running function in the gtools package does more or less what you want:

> require("gtools")
> running(1:4, fun=I, width=3, allow.fewer=TRUE)

$`1:1`
[1] 1

$`1:2` 
[1] 1 2

$`1:3` 
[1] 1 2 3

$`2:4` 
[1] 2 3 4
Jonathan Chang
Glad to see you're finally on here!
Christopher DuBois
But James wanted a matrix not a list. You could package the result using matrix(unlist(...)) but the embed() function does it in one step.
Rob Hyndman
Totally right, which is why I upvoted the embed() solution when it came out =). But 'running' is still a useful function I think --- most of the time when I wanted to create the matrix James asked for, what I really wanted to do was run apply on it.
Jonathan Chang
+3  A: 

You can achieve this using the built-in embed() function, where its second argument, 'dimension', is the number of columns to return, i.e. one more than what you've called 'lag':

x <- c(NA,NA,1,2,3,4)
embed(x,3)

## returns
     [,1] [,2] [,3]
[1,]    1   NA   NA
[2,]    2    1   NA
[3,]    3    2    1
[4,]    4    3    2

embed() was discussed in a previous answer by Joshua Reich. (Note that I prepended x with NAs to replicate your desired output).

It's not particularly well-named but it is quite useful and powerful for operations involving sliding windows, such as rolling sums and moving averages.
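
To make the sliding-window remark concrete, here is a minimal sketch of a rolling sum and a moving average built on embed(). The variable names (width, windows, rolling_sum, moving_avg) are illustrative only, and no NA padding is done, so the result is shorter than the input by width - 1.

```r
x <- c(1, 2, 3, 4, 5)
width <- 3

# Each row of the embedded matrix is one window, most recent value first:
# (3, 2, 1), (4, 3, 2), (5, 4, 3)
windows <- embed(x, width)

rolling_sum <- rowSums(windows)   # 6 9 12
moving_avg  <- rowMeans(windows)  # 2 3 4
```

Since every window becomes a row, any row-wise function (apply(windows, 1, f)) can be swapped in for the sum or the mean.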

dataspora
More generally:
lagmatrix <- function(x, max.lag) { embed(c(rep(NA, max.lag), x), max.lag + 1) }
Then use lagmatrix(1:4, 2)
Rob Hyndman
Thanks for the pointer to the embed function. This saved a massive amount of computing time for me.
James