Hello

Imagine I have a vector with ones and zeroes

I write it compactly:

1111111100001111111111110000000001111111111100101

I need a new vector in which the N ones that follow each run of zeroes are replaced by zeroes.

For example, for N = 3:

1111111100001111111111110000000001111111111100101 becomes 1111111100000001111111110000000000001111111100000

I can do it with a for loop, but I've read that this is not good practice. How can I do it then?

cheers

My vector is a zoo series, indeed, but I guess it doesn't make any difference. If I wanted zeroes up to end I would use cumprod.
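For reference, the cumprod idea mentioned above can be sketched as follows. The vector here is a made-up example, not the zoo series from the question; it shows that cumprod zeroes everything after the first zero, which is why it does not fit the case where only N values should change.

```r
# Hypothetical small input (not the data from the question)
x <- c(1, 1, 1, 0, 1, 1, 0, 1)

# The running product drops to 0 at the first zero and stays 0 afterwards
all_zero_after_first <- cumprod(x)
print(all_zero_after_first)
```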

+1  A: 

Here is one way:

> tmp <- strsplit('1111111100001111111111110000000001111111111100101','')
> tmp <- as.numeric(unlist(tmp))
> 
> n <- 3
> 
> tmp2 <- embed(tmp, n+1)
> 
> tmp3 <- tmp
> tmp3[ which( apply( tmp2, 1, function(x) any(x==0) ) ) + n ] <- 0
> 
> paste(tmp3, collapse='')
[1] "1111111100000001111111110000000000001111111100000"

Whether this is better than a loop or not is up to you.

Note that this never changes the first n elements, even when one of them follows a 0.
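If the first n elements matter, one possible workaround is to pad the vector with n leading ones before calling embed, so that every element gets a full window. This is only a sketch: zeros_after_embed is a made-up name, and it assumes the vector holds only 0 and 1.

```r
# Sketch: pad with n leading ones so that every element gets a full window.
# zeros_after_embed is a hypothetical helper, not part of any package.
zeros_after_embed <- function(x, n) {
  padded <- c(rep(1, n), x)
  # Row i of the embedded matrix holds x[i], x[i-1], ..., x[i-n]
  # (taken from the padded vector, so early rows see the padding ones)
  windows <- embed(padded, n + 1)
  out <- x
  out[rowSums(windows == 0) > 0] <- 0
  out
}

x <- c(0, 1, 1, 1, 1, 1)
print(zeros_after_embed(x, 3))
```

The padding ones never introduce a zero into a window, so they only supply the missing history for the first n elements.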

Here is another way:

> library(gtools)
> 
> tmpfun <- function(x) {
+ if(any(x==0)) {
+ 0
+ } else {
+ x[length(x)]
+ }
+ }
> 
> tmp4 <- running( tmp, width=4, fun=tmpfun, 
+ allow.fewer=TRUE )
> 
> tmp4 <- unlist(tmp4)
> paste(tmp4, collapse='')
[1] "1111111100000001111111110000000000001111111100000"
> 
Greg Snow
+4  A: 

You can also do this with rle. All you need to do is add n to every run length where the value is 0 and subtract n from every run length where the value is 1 (being a little careful when there are fewer than n ones in a row). The sample vector tmp is constructed using Greg's method.

rr <- rle(tmp)
## Pad so that it always begins with 1 and ends with 1
if (rr$values[1] == 0) {
   rr$values <- c(1, rr$values)
   rr$lengths <- c(0, rr$lengths)  
}
if (rr$values[length(rr$values)] == 0) {
  rr$values <- c(rr$values, 1)
  rr$lengths <- c(rr$lengths, 0)  
}
zero.indices <- seq(from=2, to=length(rr$values), by=2)
one.indices <- seq(from=3, to=length(rr$values), by=2)
rr$lengths[zero.indices] <- rr$lengths[zero.indices] + pmin(rr$lengths[one.indices], n)
rr$lengths[one.indices] <- pmax(0, rr$lengths[one.indices] - n)
inverse.rle(rr)
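To make the run-length idea concrete, here is a small worked example on a made-up vector with a single run of zeroes. It is only an illustration of what rle and inverse.rle do, not a replacement for the general code above.

```r
x <- c(1, 1, 0, 1, 1, 1, 1)   # made-up example
n <- 2

rr <- rle(x)
print(rr$lengths)  # length of each run
print(rr$values)   # value of each run

# Move up to n ones from the run of ones into the preceding run of zeroes
moved <- min(rr$lengths[3], n)
rr$lengths[2] <- rr$lengths[2] + moved
rr$lengths[3] <- rr$lengths[3] - moved

y <- inverse.rle(rr)
print(y)
```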
Jonathan Chang
Why do you change the last zero? I thought I could do it more easily; your answer is quite complicated. You can read a different approach here: http://r.789695.n4.nabble.com/adding-zeroes-after-old-zeroes-in-a-vector-td2534824.html#a2535017 but it doesn't work as expected. Thanks.
My vector always starts with 1. I could also try shifting the elements of the vector by one position and ANDing the result with the original, then shifting by 2 positions, and so on up to N. But that is very slow. I found a faster way: shifting by one position, then 2, then 4, then 8... and doing ANDs.
If you know your vector starts with one, you could get rid of the first if. You need the second if because the subsequent lines essentially have each 0 sequence look ahead to the next 1 sequence, which will fail if there is no trailing 1 sequence.
Jonathan Chang
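The shift-and-AND idea with doubling shifts from the comments above could look roughly like this. It is a sketch under the assumption that the vector holds only 0 and 1; shift_right and zeros_after_shift are made-up names, and the last step uses a shorter shift so the window ends at exactly N + 1.

```r
# Shift x right by k positions, filling the front with ones
# (ones are neutral for AND, so the padding never creates zeroes)
shift_right <- function(x, k) c(rep(1, k), x)[seq_along(x)]

# Hypothetical helper: AND each element with its N predecessors,
# growing the covered window by doubling (1, 2, 4, ...) up to N + 1.
zeros_after_shift <- function(x, N) {
  y <- x
  covered <- 1            # y[i] currently combines x[(i - covered + 1):i]
  target <- N + 1
  while (covered < target) {
    step <- min(covered, target - covered)
    y <- y * shift_right(y, step)   # product of 0/1 values acts as AND
    covered <- covered + step
  }
  y
}

x <- c(1, 0, 1, 1, 1, 1, 1)
print(zeros_after_shift(x, 3))
```

Because the shift never exceeds the window already covered, the combined windows stay contiguous, and the number of passes grows with log2(N) rather than with N.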
+1  A: 

To follow up on my previous comment, if speed is in fact a concern - converting the vector to a string and using regex may well be faster than other solutions. First a function:

replaceZero <- function(x,n){
    x <- gsub(paste("01.{",n-1,"}", sep = "") , paste(rep(0,n+1),collapse = ""), x)
}

Generate data

z <- sample(0:1, 1000000, replace = TRUE)

z <- paste(z, collapse="")
repz <- replaceZero(z,3)
repz <- as.numeric(unlist(strsplit(repz, "")))

System time to collapse, run regex, and split back into vector:

Regex method
   user  system elapsed 
   2.39    0.04    2.39 
Greg's method
   user  system elapsed 
  17.39    0.17   18.30
Jonathan's method
   user  system elapsed 
   2.47    0.02    2.31 
Chase
Hello Chase, I've tried your solution but it doesn't work well. Jonathan's one does.
@user425895 - what doesn't work the way you expected it to? Doesn't give you the answer you want? Takes too long? Doesn't feel right when you press the keys? Doesn't work well isn't very helpful and if there is something wrong with the code - knowing why it doesn't produce the results you want will let me fix it so that it does - and those that come along with similar questions can have access to code snippets that work..."doesn't work well" will not get anyone closer to that goal.
Chase
@Chase If you use this vector, the result is not correct x <- c(1,1,1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,0,0,1,1,0,1)
gd047
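One likely reason for the mismatch is that the pattern `.{n-1}` also swallows zeroes, and gsub matches do not overlap, so a zero consumed inside one match can no longer start a match of its own. A match also needs n characters after the zero, so a zero near the end of the string is skipped. The sketch below shows the first effect on a made-up vector; reference is a hypothetical helper used only for comparison.

```r
# The regex function from the answer, copied so the example is self-contained
replaceZero <- function(x, n) {
  gsub(paste("01.{", n - 1, "}", sep = ""), paste(rep(0, n + 1), collapse = ""), x)
}

# Hypothetical reference: an element becomes 0 if it, or any of its
# n predecessors, is 0 in the original vector
reference <- function(v, n) {
  out <- v
  for (i in seq_along(v)) {
    lo <- max(1, i - n)
    if (any(v[lo:i] == 0)) out[i] <- 0
  }
  out
}

v <- c(0, 1, 0, 1, 1, 1, 1)   # made-up example
from_regex <- replaceZero(paste(v, collapse = ""), 3)
from_loop  <- paste(reference(v, 3), collapse = "")
print(from_regex)
print(from_loop)
```

Here the first match consumes "0101", so the second zero never triggers its own replacement and the two results differ.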
+2  A: 
x <- c(1,1,1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,0,0,1,1,0,1)

n <- 3
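z <- rle(x)
tmp <- cumsum(z$lengths)   # index of the last element of each run

# Visit every run of zeroes (every second run, starting at the first 0 run)
for (i in seq(which.min(z$values), max(which(z$values == 1)), 2)) {
    # Zero at most n elements of the run of ones that follows
    if (z$lengths[i + 1] < n) x[tmp[i]:(tmp[i] + z$lengths[i + 1])] <- 0
    else                      x[tmp[i]:(tmp[i] + n)] <- 0
}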
gd047
+2  A: 

How about just looping over the N shifts (assuming N is small):

addZeros <- function(x, N = 3) {
    xx <- x
    z <- x - 1
    for (i in 1:N) {
        xx <- xx + c(rep(0, i), z[-c((NROW(x) - i + 1):NROW(x))])
    }
    xx[xx<0] <- 0
    xx
}

This simply turns every zero into -1, so that adding the shifted copies subtracts from the N values that follow each zero; anything that ends up negative is then reset to 0.

> x <- c(1,1,1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,0,0,1,0,1)
> x
 [1] 1 1 1 1 1 1 1 1 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 1 1 1
[39] 1 1 1 1 1 1 0 0 1 0 1
> addZeros(x)
 [1] 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1
[39] 1 1 1 1 1 1 0 0 0 0 0

EDIT:

After reading your description of the data in the R-help mailing list, this clearly is not a case of small N. Hence, you might want to consider a C function for this.

In the file "addZeros.c":

void addZeros(int *x, int *N, int *n)
{
    int i, j;

    for (i = *n - 1; i > 0; i--)
    {
        if ((x[i - 1] == 0) && (x[i] == 1))
        {
            j = 0;
            while ((j < *N) && (i + j < *n) && (x[i + j] == 1))
            {
                x[i + j] = 0;
                j++;
            }
        }
    }
}

In a command prompt (on Windows, press Win+R and type cmd), run "R CMD SHLIB addZeros.c". If R is not on the path (i.e. you get an "unknown command R" error), you need to give the full path to R.exe, which on my system is:

"c:\Program Files\R\R-2.10.1\bin\R.exe" CMD SHLIB addZeros.c

On Windows this should produce a DLL (a .so file on Linux). If you do not already have Rtools installed, you should download and install it (it is a collection of tools, such as Perl and MinGW). Download the newest version from http://www.murdoch-sutherland.com/Rtools/

The R wrapper function for this would be:

addZeros2 <- function(x, N) {
    if (!is.loaded("addZeros"))
        dyn.load(file.path(paste("addZeros", .Platform$dynlib.ext, sep = "")))
    .C("addZeros",
        x = as.integer(x),
        as.integer(N),
        as.integer(NROW(x)))$x
}

Note that the working directory in R should be the directory containing the DLL (on my system, setwd("C:/Users/eyjo/Documents/Forrit/R/addZeros")) before the addZeros2 wrapper is called for the first time; alternatively, give dyn.load the full path to the DLL file. It is good practice to keep these files in a sub-directory of the project (e.g. "c"), and then just add "c/" in front of "addZeros" in the file path.

To illustrate:

> x <- rbinom(1000000, 1, 0.9)
>
> system.time(addZeros(x, 10))
   user  system elapsed 
   0.45    0.14    0.59 
> system.time(addZeros(x, 400))
   user  system elapsed 
  15.87    3.70   19.64 
> 
> system.time(addZeros2(x, 10))
   user  system elapsed 
   0.01    0.02    0.03 
> system.time(addZeros2(x, 400))
   user  system elapsed 
   0.03    0.00    0.03 
> 

Here addZeros is my original suggestion in pure R, and addZeros2 uses the C function.

eyjo
I like to see the different creative ways you all do it.
Hi. How do I compile it on Windows?
I added more explanations. You should install the tool-box: http://www.murdoch-sutherland.com/Rtools/
eyjo
+1  A: 

I really like the idea of using a "regular expression" for this so I gave a vote up for that. (Wish I had gotten an rle answer in too and learned something from the embed and running answers. Neat!) Here's a variation on Chase's answer that I think may address the issues raised:

replaceZero2 <- function(x, n) {
  if (n == 0) {
    return(x)
  }
  xString <- paste(x, collapse="")
  result <- gsub(paste("(?<=",
             paste("01{", 0:(n - 1), "}", sep="", collapse="|"),
             ")1", sep=""),
       "0", xString, perl=TRUE)
  return(as.numeric(unlist(strsplit(result, ""))))
}

This seems to produce identical results to Chang's rle method for n = 1,2,3,4,5 on gd047's example input.

Maybe you could write this more cleanly using \K?

David F
+1 This is working. I also liked the idea of using a "regular expression". Nevertheless, Jonathan's idea is even better (and faster).
gd047
A: 

I've found a solution myself. I think it's very easy and not very slow. I guess if someone could compile it in C++ it would be very fast because it has just one loop.

f5 <- function(z, N) {
  x <- z
  count <- 0   # how many upcoming ones still have to be zeroed
  for (i in seq_along(z)) {
    if (z[i] == 0) {
      count <- N
    } else if (count > 0) {
      x[i] <- 0
      count <- count - 1
    }
  }
  x
}
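The same counting logic can also be written without an explicit loop by tracking the position of the most recent zero with cummax. This is a different technique from the loop above and only a sketch; f5_vec is a made-up name and it assumes the vector holds only 0 and 1.

```r
# Hypothetical loop-free variant of f5 (f5_vec is not from the thread)
f5_vec <- function(z, N) {
  idx <- seq_along(z)
  # Position of the most recent zero at or before each element (0 if none yet)
  last_zero <- cummax(ifelse(z == 0, idx, 0L))
  x <- z
  # Zero every element that lies within N positions of the last zero seen
  x[last_zero > 0 & idx - last_zero <= N] <- 0
  x
}

z <- c(1, 1, 0, 1, 1, 1, 1, 1)
print(f5_vec(z, 3))
```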
A: 

Using a moving minimum function is very fast, simple, and not dependent on the distribution of spans:

x <- rbinom(1000000, 1, 0.9)
system.time(movmin(x, 3, na.rm=T))
# user  system elapsed 
# 0.11    0.02    0.13 

The following simple definition of movmin suffices (the complete function has some functionality superfluous to this case, such as using the van Herk/Gil-Werman algorithm for large N)

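movmin = function(x, n, na.rm=FALSE) {
  x = c(rep.int(NA, n - 1), x) # left pad so the result has the same length as the input
  # element-wise minimum over the n shifted copies of x
  do.call(pmin, c(lapply(1:n, function(i) x[i:(length(x) - n + i)]), na.rm=na.rm))
}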

Addendum: you actually need a window size of 4 (that is, N + 1), because each zero affects the 3 values that follow it. This matches your f5:

x <- rbinom(1000000, 1, 0.9)
all.equal(f5(x, 3), movmin(x, 4, na.rm=T))
# [1] TRUE
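As a deterministic check on the example from the question, the sketch below compares a window of N + 1 with a window of N. The movmin definition is repeated so the block runs on its own, and to_vec is a made-up helper.

```r
movmin <- function(x, n, na.rm = FALSE) {
  x <- c(rep.int(NA, n - 1), x)  # left pad
  do.call(pmin, c(lapply(1:n, function(i) x[i:(length(x) - n + i)]), na.rm = na.rm))
}

to_vec <- function(s) as.numeric(strsplit(s, "")[[1]])  # hypothetical helper

input    <- to_vec("1111111100001111111111110000000001111111111100101")
expected <- to_vec("1111111100000001111111110000000000001111111100000")

N <- 3
print(all(movmin(input, N + 1, na.rm = TRUE) == expected))  # window of N + 1
print(all(movmin(input, N, na.rm = TRUE) == expected))      # window of N
```

A window of N covers the current element and only N - 1 predecessors, so each run of zeroes grows by one element too few.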
Charles
It's very fast but it doesn't give the proper answer
Ah yes, I see the difference - the window size needs expanding (see addendum above).
Charles