ansaurus

Question

R: Adding zeroes after old zeroes in a vector?

Answer 1

+1 A:

Here is one way:

> tmp <- strsplit('1111111100001111111111110000000001111111111100101','')
> tmp <- as.numeric(unlist(tmp))
> 
> n <- 3
> 
> tmp2 <- embed(tmp, n+1)
> 
> tmp3 <- tmp
> tmp3[ which( apply( tmp2, 1, function(x) any(x==0) ) ) + n ] <- 0
> 
> paste(tmp3, collapse='')
[1] "1111111100000001111111110000000000001111111100000"

Whether this is better than a loop or not is up to you.

Note that this never changes any of the first n elements, even when one of them follows a 0.
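A minimal sketch of one way around that limitation, assuming it is acceptable to treat positions before the start of the vector as 1s: pad the front with n ones before calling embed(), so that every element gets a full window of n + 1 values. The name add_zeros_embed is hypothetical; for a 0/1 vector, min() over a window is 0 exactly when the window contains a 0.

```r
# Hypothetical helper: zero out the n values that follow each 0,
# including at the very start of the vector.
add_zeros_embed <- function(x, n) {
  padded <- c(rep(1, n), x)       # assumed neutral padding (1s never add zeros)
  win <- embed(padded, n + 1)     # row i holds padded[i + n], ..., padded[i]
  as.numeric(apply(win, 1, min))  # 0 if x[i - n], ..., x[i] contains a 0
}

add_zeros_embed(c(0, 1, 1, 1, 1), 3)
# [1] 0 0 0 0 1
```

For the same input, the version above leaves the second and third elements as 1.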

Here is another way:

> library(gtools)
> 
> tmpfun <- function(x) {
+ if(any(x==0)) {
+ 0
+ } else {
+ x[length(x)]
+ }
+ }
> 
> tmp4 <- running( tmp, width=n+1, fun=tmpfun, 
+ allow.fewer=TRUE )
> 
> tmp4 <- unlist(tmp4)
> paste(tmp4, collapse='')
[1] "1111111100000001111111110000000000001111111100000"
>
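If gtools is not available, the same trailing-window test can be sketched in base R by swapping in a different technique: stats::filter() with sides = 1 sums the zero indicators over each window of n + 1 values, and any element whose window count is positive becomes 0. The name add_zeros_filter is hypothetical; padding the indicator with n non-zeros at the front plays the same role as allow.fewer = TRUE for the first elements.

```r
# Hypothetical base-R stand-in for the gtools::running() call above.
add_zeros_filter <- function(x, n) {
  is_zero <- as.numeric(x == 0)
  padded <- c(rep(0, n), is_zero)    # n leading non-zeros: shorter windows at the start
  counts <- stats::filter(padded, rep(1, n + 1), sides = 1)  # zeros per window of n + 1
  counts <- as.numeric(counts)[(n + 1):length(padded)]      # drop the n leading NAs
  ifelse(counts > 0, 0, x)           # a 0 among x[i - n], ..., x[i] zeroes x[i]
}

add_zeros_filter(c(0, 1, 1, 1, 1), 3)
# [1] 0 0 0 0 1
```

Writing stats::filter explicitly avoids picking up dplyr::filter if dplyr happens to be attached.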

Greg Snow 2010-09-10 18:41:16

Answer 2

+4 A:

You can also do this with rle. All you need to do is add n to the length of every run of 0s and subtract n from the length of the run of 1s that follows it (being a little careful when there are fewer than n ones in a row). (Using Greg's method to construct the sample.)

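rr <- rle(tmp)
## Pad so that it always begins with 1 and ends with 1
## (a run of 1s with length 0 is harmless for inverse.rle)
if (rr$values[1] == 0) {
  rr$values <- c(1, rr$values)
  rr$lengths <- c(0, rr$lengths)
}
if (rr$values[length(rr$values)] == 0) {
  rr$values <- c(rr$values, 1)
  rr$lengths <- c(rr$lengths, 0)
}
## Runs now alternate 1, 0, 1, ..., 0, 1: even positions are runs of 0s,
## each followed by the run of 1s at the next odd position
zero.indices <- seq(from=2, to=length(rr$values), by=2)
one.indices <- seq(from=3, to=length(rr$values), by=2)
## Each run of 0s grows by up to n, taken from the start of the next run of 1s
rr$lengths[zero.indices] <- rr$lengths[zero.indices] + pmin(rr$lengths[one.indices], n)
rr$lengths[one.indices] <- pmax(0, rr$lengths[one.indices] - n)
inverse.rle(rr)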
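For reuse, the same steps can be wrapped in a function. This is a sketch with a hypothetical name (add_zeros_rle); the early return is an addition for vectors with no zeros, where seq(from = 2, to = 1, by = 2) would otherwise stop with an error.

```r
# Hypothetical wrapper around the rle steps above.
add_zeros_rle <- function(x, n) {
  rr <- rle(x)
  if (!any(rr$values == 0)) return(x)   # no zeros: nothing to change
  # Pad so the runs start and end with a (possibly empty) run of 1s
  if (rr$values[1] == 0) {
    rr$values <- c(1, rr$values)
    rr$lengths <- c(0L, rr$lengths)
  }
  if (rr$values[length(rr$values)] == 0) {
    rr$values <- c(rr$values, 1)
    rr$lengths <- c(rr$lengths, 0L)
  }
  zero.idx <- seq(2, length(rr$values), by = 2)  # runs of 0s
  one.idx <- zero.idx + 1                        # the run of 1s after each
  rr$lengths[zero.idx] <- rr$lengths[zero.idx] + pmin(rr$lengths[one.idx], n)
  rr$lengths[one.idx] <- pmax(0, rr$lengths[one.idx] - n)
  inverse.rle(rr)
}

add_zeros_rle(c(1, 1, 0, 1, 1, 1, 1, 0, 1), 2)
# [1] 1 1 0 0 0 1 1 0 0
```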

Jonathan Chang 2010-09-10 19:21:29

Why do you change the last zero? I thought it could be done more easily; your answer is quite complicated. You can read a different approach here: http://r.789695.n4.nabble.com/adding-zeroes-after-old-zeroes-in-a-vector-td2534824.html#a2535017 but it doesn't work as expected. Thanks.

2010-09-10 20:07:38

My vector always starts with 1. I could also try shifting the elements of the vector by one position and ANDing the result with the original, then shifting by 2 positions, and so on up to N. But that is very slow. I found a faster way: shifting by 1 position, then 2, then 4, then 8, ... and doing ANDs.

2010-09-10 20:10:54
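The doubling idea from the comment above can be sketched in R under two assumptions: positions before the start count as 1s (the vector is said to always start with 1), and each element's window is itself plus the N elements before it. After k rounds of shift-and-AND, each element covers 2^k consecutive positions; one final shift covers whatever is left of the N + 1 window, so only about log2(N) passes are needed. The names shift_right and add_zeros_doubling are hypothetical.

```r
# Shift v right by k positions, padding the front with 1s (assumed neutral for AND).
shift_right <- function(v, k) {
  if (k == 0) return(v)
  if (k >= length(v)) return(rep(1, length(v)))
  c(rep(1, k), v[seq_len(length(v) - k)])
}

add_zeros_doubling <- function(x, N) {
  L <- N + 1   # window length: the element plus the N before it
  w <- x       # invariant: w[i] is the AND of the p values ending at i
  p <- 1
  while (2 * p <= L) {
    w <- w & shift_right(w, p)
    p <- 2 * p
  }
  # Since 2 * p > L, one more shift of L - p covers the rest of the window
  as.numeric(w & shift_right(w, L - p))
}

add_zeros_doubling(c(1, 0, 1, 1, 1, 1, 1), 3)
# [1] 1 0 0 0 0 1 1
```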

If you know your vector starts with a 1, you can get rid of the first if. You need the second if because the subsequent lines essentially have each run of 0s look ahead to the next run of 1s, which fails if there is no trailing run of 1s.

Jonathan Chang 2010-09-11 06:28:03

Answer 3

+1 A:

To follow up on my previous comment: if speed is in fact a concern, converting the vector to a string and using a regex may well be faster than the other solutions. First, a function:

replaceZero <- function(x,n){
    x <- gsub(paste("01.{",n-1,"}", sep = "") , paste(rep(0,n+1),collapse = ""), x)
}

Generate data

z <- sample(0:1, 1000000, replace = TRUE)

z <- paste(z, collapse="")
repz <- replaceZero(z,3)
repz <- as.numeric(unlist(strsplit(repz, "")))

System time to collapse the vector, run the regex, and split the result back into a vector:

Regex method
   user  system elapsed 
   2.39    0.04    2.39 
Greg's method
   user  system elapsed 
   17.39    0.17   18.30
Jonathan's method
   user  system elapsed 
   2.47    0.02    2.31

Chase 2010-09-10 20:16:13

Hello Chase, I've tried your solution but it doesn't work well. Jonathan's does.

2010-09-10 20:59:46

@user425895: what doesn't work the way you expected it to? Does it not give you the answer you want? Does it take too long? Does it not feel right when you press the keys? "Doesn't work well" isn't very helpful. If there is something wrong with the code, knowing why it doesn't produce the results you want will let me fix it, and people who come along later with similar questions will have access to code snippets that work. "Doesn't work well" will not get anyone closer to that goal.

Chase 2010-09-11 00:55:05

@Chase If you use this vector, the result is not correct: x <- c(1,1,1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,0,0,1,1,0,1)

gd047 2010-09-11 12:08:38
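A minimal sketch of why the first regex version can give wrong answers (the two input strings are made up for illustration, and replaceZero is restated without the trailing assignment): gsub() matches do not overlap, so a 0 swallowed by the end of one match of "01.{n-1}" never starts a match of its own, and a "01" too close to the end of the string has no room for the n - 1 extra characters.

```r
replaceZero <- function(x, n) {
  gsub(paste("01.{", n - 1, "}", sep = ""), paste(rep(0, n + 1), collapse = ""), x)
}

replaceZero("0110111", 3)  # "0000111": the 0 at position 4 was consumed, so its 1s survive
replaceZero("1101", 3)     # "1101": no room for ".{2}" after the final "01"
# The intended results are "0000000" and "1100".
```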

Answer 4

+2 A:

x <- c(1,1,1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,0,0,1,1,0,1)

n <- 3
z<-rle(x)
tmp <- cumsum(z$lengths)

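# Visit the runs of 0s (every other run, starting from the first run of 0s)
for (i in seq(which.min(z$values),max(which(z$values==1)),2)) {
         # tmp[i] is the last 0 of run i: zero it and up to n of the 1s after it
         if  (z$lengths[i+1] < n)   x[tmp[i]:(tmp[i] + z$lengths[i+1])] <- 0
         else                       x[tmp[i]:(tmp[i]+n)] <- 0
}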
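The two branches differ only in how many 1s are cleared, so they can be collapsed with min(). In this sketch (hypothetical name add_zeros_runs), looping over which(z$values == 0) instead of seq(which.min(...), ...) also avoids a failure of the loop above on a vector with no zeros: there which.min() returns the first run of 1s, z$lengths[i + 1] is NA, and if() stops with an error.

```r
# Hypothetical variant of the loop above with a single branch.
add_zeros_runs <- function(x, n) {
  z <- rle(x)
  ends <- cumsum(z$lengths)            # position of the last element of each run
  for (i in which(z$values == 0)) {    # visit every run of 0s
    if (i < length(z$values)) {        # a run of 1s follows this run of 0s
      k <- min(n, z$lengths[i + 1])    # how many of those 1s to clear
      if (k > 0) x[(ends[i] + 1):(ends[i] + k)] <- 0
    }
  }
  x
}

add_zeros_runs(c(1, 1, 0, 1, 1, 1, 1, 0, 1), 2)
# [1] 1 1 0 0 0 1 1 0 0
```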

gd047 2010-09-10 21:33:28

Answer 5

+2 A:

How about just looping over the N shifts (assuming N is small):

addZeros <- function(x, N = 3) {
    xx <- x
    z <- x - 1
    for (i in 1:N) {
        xx <- xx + c(rep(0, i), z[-c((NROW(x) - i + 1):NROW(x))])
    }
    xx[xx<0] <- 0
    xx
}

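This simply turns every 0 into -1 (z <- x - 1) and adds copies of z shifted by 1 to N positions, so each 0 subtracts 1 from the N values that follow it; anything that drops below 0 is then set back to 0.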

> x <- c(1,1,1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,0,0,1,0,1)
> x
 [1] 1 1 1 1 1 1 1 1 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 1 1 1
[39] 1 1 1 1 1 1 0 0 1 0 1
> addZeros(x)
 [1] 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1
[39] 1 1 1 1 1 1 0 0 0 0 0

EDIT:

After reading your description of the data on the R-help mailing list, it is clear that N is not small, so you might want to consider a C function for this.

In the file "addZeros.c":

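/* Set to 0 up to N ones that follow each run of zeros in x (length n).
   The scan runs from right to left, so the test on x[i - 1] and x[i]
   only ever sees values that have not been modified yet; zeros written
   by this function therefore never trigger further zeroing. */
void addZeros(int *x, int *N, int *n)
{
    int i, j;

    for (i = *n - 1; i > 0; i--)
    {
        /* x[i] is the first 1 after a 0 */
        if ((x[i - 1] == 0) && (x[i] == 1))
        {
            j = 0;
            /* clear at most N ones, stopping at the end of the vector
               or at the next 0 */
            while ((j < *N) && (i + j < *n) && (x[i + j] == 1))
            {
                x[i + j] = 0;
                j++;
            }
        }
    }
}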

At a command prompt (on Windows, press Win+R and type cmd), run "R CMD SHLIB addZeros.c". If R is not on your path (e.g. you get an "unknown command R" error), you need to give the full path to R; on my system:

"c:\Program Files\R\R-2.10.1\bin\R.exe" CMD SHLIB addZeros.c

On Windows this should produce a DLL (a .so file on Linux). If you do not already have Rtools, you need to download and install it first (it is a collection of tools such as Perl and MinGW). Download the newest version from http://www.murdoch-sutherland.com/Rtools/

The R wrapper function for this would be:

addZeros2 <- function(x, N) {
    if (!is.loaded("addZeros"))
        dyn.load(file.path(paste("addZeros", .Platform$dynlib.ext, sep = "")))
    .C("addZeros",
        x = as.integer(x),
        as.integer(N),
        as.integer(NROW(x)))$x
}

Note that the working directory in R should be the directory containing the DLL (on my system setwd("C:/Users/eyjo/Documents/Forrit/R/addZeros")) before addZeros2 is called for the first time (alternatively, just include the full path to the DLL file in dyn.load). It is good practice to keep these files in a sub-directory of the project (e.g. "c") and then just add "c/" in front of "addZeros" in the file path.

To illustrate:

> x <- rbinom(1000000, 1, 0.9)
>
> system.time(addZeros(x, 10))
   user  system elapsed 
   0.45    0.14    0.59 
> system.time(addZeros(x, 400))
   user  system elapsed 
  15.87    3.70   19.64 
> 
> system.time(addZeros2(x, 10))
   user  system elapsed 
   0.01    0.02    0.03 
> system.time(addZeros2(x, 400))
   user  system elapsed 
   0.03    0.00    0.03 
>

Here addZeros is my original pure-R suggestion, and addZeros2 uses the C function.

eyjo 2010-09-11 01:01:38

I like to see the different creative ways you all do it.

2010-09-11 10:35:25

Hi. How do I compile it on Windows?

2010-09-11 22:17:33

I added more explanations. You need to install the Rtools toolbox: http://www.murdoch-sutherland.com/Rtools/

eyjo 2010-09-13 11:40:32

Answer 6

+1 A:

I really like the idea of using a "regular expression" for this, so I gave that answer an upvote. (I wish I had thought of an rle answer too, and I learned something from the embed and running answers. Neat!) Here's a variation on Chase's answer that I think may address the issues raised:

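replaceZero2 <- function(x, n) {
  if (n == 0) {
    return(x)
  }
  xString <- paste(x, collapse="")
  # Set to 0 every 1 that, in the original string, is preceded by a 0
  # followed by at most n - 1 ones: the lookbehind (?<=01{0}|01{1}|...)1.
  # A lookbehind does not consume characters, so overlapping cases work.
  result <- gsub(paste("(?<=",
             paste("01{", 0:(n - 1), "}", sep="", collapse="|"),
             ")1", sep=""),
       "0", xString, perl=TRUE)
  return(as.numeric(unlist(strsplit(result, ""))))
}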

This seems to produce identical results to Chang's rle method for n = 1,2,3,4,5 on gd047's example input.

Maybe you could write this more cleanly using \K?

David F 2010-09-12 07:22:28

+1 This is working. I also liked the idea of using a "regular expression". Nevertheless, Jonathan's idea is even better (and faster).

gd047 2010-09-12 07:57:17

Answer 7

A:

I've found a solution myself. I think it's very simple and not too slow. I guess if someone compiled it in C++ it would be very fast, because it is just a single loop.

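f5 <- function(z, N) {
  x <- z
  count <- 0                  # number of 1s still to be set to 0
  for (i in seq_along(z)) {   # seq_along() also copes with an empty z
    if (z[i] == 0) {
      count <- N              # every 0 in the input restarts the countdown
    } else if (count > 0) {
      x[i] <- 0
      count <- count - 1
    }
  }
  x
}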

2010-09-12 13:45:59
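The counter in f5 is really tracking the distance to the most recent 0, and that distance can be computed without an explicit loop. The following sketch (hypothetical name add_zeros_cummax) takes cummax() over the positions of the zeros and is intended to give the same result as f5:

```r
# Hypothetical loop-free equivalent of f5.
add_zeros_cummax <- function(x, N) {
  idx <- seq_along(x)
  last_zero <- cummax(ifelse(x == 0, idx, 0L))  # index of the latest 0 so far (0 if none)
  ifelse(last_zero > 0 & idx - last_zero <= N, 0, x)
}

add_zeros_cummax(c(1, 1, 0, 1, 1, 1, 1, 0, 1), 2)
# [1] 1 1 0 0 0 1 1 0 0
```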

Answer 8

A:

Using a moving minimum function is very fast, simple, and not dependent on the distribution of spans:

x <- rbinom(1000000, 1, 0.9)
system.time(movmin(x, 3, na.rm=T))
# user  system elapsed 
# 0.11    0.02    0.13

The following simple definition of movmin suffices (the complete function has some functionality superfluous to this case, such as using the van Herk/Gil-Werman algorithm for large N):

movmin = function(x, n, na.rm=F) {
  x = c(rep.int(NA, n - 1), x) # left pad
  do.call(pmin, c(lapply(1:n, function(i) x[i:(length(x) - n + i)]), na.rm=na.rm))
}

Actually, you need a window size of 4 (N + 1), because each zero affects the 3 values that follow it. This matches your f5:

x <- rbinom(1000000, 1, 0.9)
all.equal(f5(x, 3), movmin(x, 4, na.rm=T))
# [1] TRUE
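The van Herk/Gil-Werman algorithm mentioned above computes a moving minimum with a constant number of comparisons per element, whatever the window size. What follows is only a sketch of that idea in vectorized R (hypothetical name vhgw_movmin, not the complete function referred to above): split the left-padded input into blocks of w, take running minima forward and backward within each block, and combine one value of each for every window. Padding with Inf plays the role of na.rm = TRUE at the start, and vhgw_movmin(x, N + 1) corresponds to f5(x, N).

```r
# Sketch of a trailing moving minimum of width w (van Herk/Gil-Werman idea).
vhgw_movmin <- function(x, w) {
  if (length(x) == 0) return(x)
  xl <- c(rep(Inf, w - 1), x)           # left pad: every x[i] gets a full window
  len <- length(xl)
  m <- ceiling(len / w) * w
  xp <- c(xl, rep(Inf, m - len))        # right pad to a whole number of blocks
  blk <- rep(seq_len(m / w), each = w)  # block id of each position
  g <- ave(xp, blk, FUN = cummin)                           # min from block start to i
  h <- ave(xp, blk, FUN = function(v) rev(cummin(rev(v))))  # min from i to block end
  e <- w:len                            # window end positions in xl
  pmin(h[e - w + 1], g[e])              # window [e - w + 1, e] spans at most 2 blocks
}

vhgw_movmin(c(1, 0, 1, 1, 1, 1), 3)
# [1] 1 0 0 0 1 1
```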

Charles 2010-09-14 19:56:27

It's very fast, but it doesn't give the correct answer.

2010-09-15 12:18:10

Ah yes, I see the difference - the window size needs expanding (see addendum above).

Charles 2010-09-20 05:11:52
