This is (the greatest part of) the cost function for a genetic optimization, so it needs to be really fast. Right now it's painfully slow even for toy problem sizes. I'm pretty sure there are faster ways to do many of the operations in this code, but I'm not that good at tuning R code. Can anyone suggest anything?

`Fb`, `Ft`, and `Fi` are scalar constants. `tpat` is a large, constant 2D matrix. `jpat` is a smaller matrix which is being optimized. `nrow(tpat) == nrow(jpat)` and `ncol(tpat) %% ncol(jpat) == 0` are invariants. All entries in `tpat` and `jpat` are real numbers in [0,1].
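
The invariants above, and the horizontal tiling of jpat that they make possible, can be sketched as follows. This is only an illustration: the toy sizes and the name `tiled` are assumptions, and indexing by repeated column numbers is one possible alternative to building the tiled matrix with `do.call(cbind, ...)`.

```r
# Toy sizes, chosen only for illustration
set.seed(1)
tpat <- matrix(runif(3 * 8), nrow = 3)   # 3 bands, 8 time slots
jpat <- matrix(runif(3 * 4), nrow = 3)   # 3 bands, 4 time slots

# The two invariants stated above
stopifnot(nrow(tpat) == nrow(jpat),
          ncol(tpat) %% ncol(jpat) == 0)

# Tile jpat horizontally by repeating its column indices
reps  <- ncol(tpat) / ncol(jpat)
tiled <- jpat[, rep(seq_len(ncol(jpat)), times = reps), drop = FALSE]

# The tiled jamming pattern now covers the whole transmission pattern
stopifnot(identical(dim(tiled), dim(tpat)))
```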

# Toy jamming model for genetic optimization.
#
# A jamming pattern is a vector of real numbers \in [0,1], interpreted
# as a matrix with subcarrier frequency bands in the rows and time
# slots in the columns.  Every time slot, the jammer transmits
# Gaussian noise centered on the subcarrier frequency (todo: should
# that be Gaussian baseband noise *modulated* onto the subcarrier
# frequency?) with intensity equal to the number in the appropriate
# matrix cell (0 = off, 1 = maximum power).
#
# A transmission pattern is similar, but there are many more time
# slots; the jamming pattern is repeated horizontally to cover the
# complete transmission pattern.  (todo: implement jamming duty cycles.)
#
# The transmitter is required to transmit complete packets of some
# fixed length equal to several time slots, and it uses a fixed
# intensity Itr < 1 for each packet (we assume that the jammer is in
# between the transmitter and receiver, so its effective power at the
# receiver is higher).

Itr <- 0.75;
Fb  <- 0.1;
Ft  <- 0.1;
Fi  <- 0.5;

Nb  <- 100;
Sj  <- 30;
St  <- Sj * 20;

# success metric
pkt.matrix <- function(tpat) {
  # Find all the packets in tpat.  A packet is a contiguous sequence
  # of timeslots during which the transmitter was active on at least
  # one frequency band.  Returns a logical matrix with
  # nrow=(number of packets), ncol=(total timeslots), in which each
  # row will select one packet from the original matrix.
  runs <- rle(ifelse(apply(tpat, 2, sum) > 0, TRUE, FALSE));
  pkt  <- matrix(FALSE, nrow=sum(runs$values == TRUE),
                 ncol=sum(runs$lengths));
  i <- 1
  j <- 1
  for (r in 1:length(runs$lengths)) {
    if (runs$values[r]) {
      pkt[i, j:(runs$lengths[r]+j-1)] <- TRUE;
      i <- i + 1;
    }
    j <- j + runs$lengths[r];
  }
  return(pkt);
}

success.metric <- function(jpat, tpat) {
  if (ncol(tpat) %% ncol(jpat)) stop("non-conformable arrays");
  if (ncol(tpat) > ncol(jpat))
    # there must be a better way to do this...
    jpat <- do.call(cbind, rep(alist(jpat), ncol(tpat)/ncol(jpat)));

  pktm <- pkt.matrix(tpat);
  pkts <- nrow(pktm);
  jammed <- 0;
  for (i in 1:pkts) {
    # drop=FALSE keeps these as matrices even for a one-slot packet
    pkt <- tpat[,pktm[i,],drop=FALSE];
    jam <- jpat[,pktm[i,],drop=FALSE];

    # jamming on a channel not being used by the transmitter at the time
    # is totally ineffective
    jam[pkt==0] <- 0;

    # at least Ft of the time slots used by `pkt` must have had at least
    # one channel jammed
    if (sum(apply(jam, 2, sum) > 0) < Ft * ncol(pkt)) next;

    # at least Fb of the frequency bands (rows of `pkt`) must have been
    # jammed at least once
    if (sum(apply(jam, 1, sum) > 0) < Fb * nrow(pkt)) next;

    # the total intensity produced by the jammer must be at least Fi the
    # total intensity produced by the source
    if (sum(jam) < Fi * sum(pkt)) next;

    jammed <- jammed + 1;
  }
  return((pkts - jammed) * 100 / pkts);
}

# some `tpat` examples; `jpat` is generated by genoud()
## saturation transmission: on for 19, off for 1
sat.base <- c(rep(Itr, 19), 0);
### single constant subcarrier
sat.scs <- matrix(0, nrow=Nb, ncol=St);
sat.scs[Nb/2,] <- sat.base;

### FHSS with an incredibly foolish hopping pattern
sat.fhss <- matrix(0, nrow=Nb, ncol=St);
# razzum frazzum 1-based arrays
sat.fhss[((col(sat.fhss) - 1) %% nrow(sat.fhss)) + 1 == 
         row(sat.fhss)] <- sat.base;
+2  A: 

Lots of loops, lots of sweeping along arrays, very few statistical functions... I'd rewrite it in C.

Keep your slow R version for checking, and rewrite this in C. Make sure your R and C give the same values for test data sets.
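
A minimal sketch of such a check, assuming the rewrite is callable from R with the same arguments as the original. `check.same`, `ref.colsum` and `new.colsum` are hypothetical names, and the two stand-in functions merely take the place of the slow R version and its replacement.

```r
# Hypothetical helper: run two implementations on the same argument
# lists and stop at the first case where the results differ.
check.same <- function(ref.fn, new.fn, cases, tolerance = 1e-8) {
  for (k in seq_along(cases)) {
    ref <- do.call(ref.fn, cases[[k]])
    new <- do.call(new.fn, cases[[k]])
    ok  <- all.equal(ref, new, tolerance = tolerance)
    if (!isTRUE(ok))
      stop("mismatch on case ", k, ": ", paste(ok, collapse = "; "))
  }
  invisible(TRUE)
}

# Stand-ins for the reference version and the rewritten version
ref.colsum <- function(m) apply(m, 2, sum)
new.colsum <- function(m) colSums(m)

# Each case is a named list of arguments for the functions under test
set.seed(1)
cases <- replicate(5, list(m = matrix(runif(12), nrow = 3)),
                   simplify = FALSE)
check.same(ref.colsum, new.colsum, cases)
```

For the cost function in the question, each case would be a list holding a `jpat` and a `tpat`.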

Oh, but first profile everything to make sure it's this bit that is slow - it certainly looks like a prime candidate.
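
One way to do that profiling is R's built-in sampling profiler, `Rprof()`. The sketch below is only illustrative: `slow.fn` is a hypothetical stand-in for the real cost function, and the matrix size and half-second run time are arbitrary assumptions.

```r
# Stand-in for the expensive cost function (hypothetical)
slow.fn <- function(m) sum(apply(m, 2, sum) > 0)

m <- matrix(runif(100 * 600), nrow = 100)

prof.file <- tempfile()
Rprof(prof.file)                 # start sampling the call stack
t0 <- proc.time()[["elapsed"]]
while (proc.time()[["elapsed"]] - t0 < 0.5) {
  slow.fn(m)                     # keep working for about half a second
}
Rprof(NULL)                      # stop sampling

prof <- summaryRprof(prof.file)
head(prof$by.total)              # functions ranked by time spent on the stack
```

Functions that dominate `by.total` are the ones worth rewriting first.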

Spacedman
Spacedman is correct here. If this is as R-like as you can make the code, then do it in C. However, perhaps you have quite a bit of code around this that's best expressed in R. In that case you need to seriously reconsider how you're doing things.
John
It certainly _seems_ like it could be more R-like, but I don't know R comprehensively so I don't know how to make it faster. I'm using the `rgenoud` package for the actual genetic optimization; I suppose I could make this a `.C()` subroutine but this was the way to get things going quickly.
Zack
++ Right on. R is used for ease of trying out ideas, not for speed of execution. To do the profiling part, it's very simple. Just hit the Escape key and display the call stack a few times. If that code is typically on the stack, then that's what needs to be recoded in C.
Mike Dunlavey
Nearly a week later, I messed around with this at length and did, in fact, end up rewriting it in C. So you get the check mark. (And now I have to start over, because now that it's not taking an hour per generation, `rgenoud` has a chance to eat all my RAM and crash. Need to find a representation of the function under optimization that doesn't involve a 1500-element matrix...)
Zack
+1  A: 

Maybe post a question about your pkt.matrix function by itself (seems like bad R code). That might be something that you can provide toy sample data for and give a simple description of. In fact, as near as I can tell, you'd be better off if that were a list. Do you really want it symmetric on every row? If packets are ragged then just make a list of packets. It's easier and will work faster.

Isn't jam just a vector? If so then "sum(apply(jam, 2, sum) > 0)" is sort of nonsense. It should just be sum(jam).

John
I've updated the question with an explanation of the data structures and enough additional code that it should be possible to play with. It really doesn't make sense to make the packets be a list, that would lose information about timing that the full version of this would need. And jam is a matrix, not a vector.
Zack
+2  A: 

One thing that will help is to replace:

runs <- rle(ifelse(apply(tpat, 2, sum) > 0, TRUE, FALSE))  # replace this
runs <- rle(colSums(tpat) > 0)  # with this

and generally replace apply(foo, 2, sum) with colSums(foo) and apply(foo, 1, sum) with rowSums(foo).
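
A quick check of that equivalence on a toy matrix (the name `foo` and its size are arbitrary). The `> 0` test works as an "any nonzero entry" test only because all entries are non-negative, as the question states.

```r
set.seed(42)
foo <- matrix(runif(20), nrow = 4)   # entries in [0,1], as in the question

# Same values, without calling sum() once per column or row from R
stopifnot(isTRUE(all.equal(apply(foo, 2, sum), colSums(foo))))
stopifnot(isTRUE(all.equal(apply(foo, 1, sum), rowSums(foo))))

# The "was anything active in this column?" test from the cost function
foo[, 3] <- 0
active <- colSums(foo) > 0
stopifnot(identical(active, apply(foo, 2, function(x) any(x != 0))))
```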

EDIT: Here's an updated version of pkt.matrix. Nothing stunning, but it's quite a bit faster.

pkt.matrix <- function(tpat) {
  runs <- rle(colSums(tpat) > 0);
  pkt  <- matrix(FALSE, nrow=sum(runs$values),
                 ncol=sum(runs$lengths));

  endpts <- cumsum(runs$lengths)[runs$values]
  begpts <- endpts-runs$lengths[runs$values]+1

  for(i in seq_len(NROW(pkt))) {  # seq_len() also handles zero packets
    #pkt[i,seq(begpts[i],endpts[i])] <- TRUE
    pkt[i,begpts[i]:endpts[i]] <- TRUE  # eyjo's suggestion
  }

  return(pkt);
}

> # Times on my machine:
> # Original
> system.time( for(i in 1:1e4) pktm <- pkt.matrix(sat.fhss) )
   user  system elapsed 
  68.21    0.23   68.50
> # Updated
> system.time( for(i in 1:1e4) pktm <- pkt.matrix(sat.fhss) )
   user  system elapsed 
   4.28    0.00    4.28
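
The remaining loop can also be avoided altogether. The following is a sketch only: it is not one of the versions timed above and has not been timed. `pkt.matrix.outer` is a hypothetical name, and because `outer()` builds two temporary packets-by-timeslots matrices it may not be faster for large inputs.

```r
pkt.matrix.outer <- function(tpat) {
  runs   <- rle(colSums(tpat) > 0)
  endpts <- cumsum(runs$lengths)[runs$values]
  begpts <- endpts - runs$lengths[runs$values] + 1
  slots  <- seq_len(ncol(tpat))
  # Row i is TRUE exactly where begpts[i] <= slot <= endpts[i]
  outer(begpts, slots, "<=") & outer(endpts, slots, ">=")
}

# Toy pattern: active in slots 1-2 and slot 5, so two packets
toy <- matrix(0, nrow = 2, ncol = 6)
toy[1, c(1, 2, 5)] <- 1
pktm <- pkt.matrix.outer(toy)
stopifnot(identical(dim(pktm), c(2L, 6L)))
```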
Joshua Ulrich
thanks, I'll try that (didn't know about *Sums).
Zack
Change the "seq(begpts[i],endpts[i])" into just "begpts[i]:endpts[i]": one less function call (and 2.5 times faster) ...
eyjo