Hi all,

I'm trying to execute an R script from Python, ideally displaying and saving the results. Using rpy2 has been a bit of a struggle, so I thought I'd just call R directly. I have a feeling that I'll need to use something like "os.system" or "subprocess.call", but I am having difficulty deciphering the module guides.

Here's the R script "MantelScript", which uses a particular stat test to compare two distance matrices at a time (distmatA1 and distmatB1). This works in R, though I haven't yet put in the iterating bits in order to read through and compare a bunch of files in a pairwise fashion (I really need some assistance with this, too btw!):

library(ade4)

M1<-read.table("C:\\pythonscripts\\distmatA1.csv", header = FALSE, sep = ",")
M2<-read.table("C:\\pythonscripts\\distmatB1.csv", header = FALSE, sep = ",")

mantel.rtest(dist(matrix(M1, 14, 14)), dist(matrix(M2, 14, 14)), nrepet = 999)

Here's the relevant bit of my Python script, which reads through some previously formulated lists and pulls out matrices in order to compare them via this Mantel Test (it should pull the first matrix from identityA and sequentially compare it to every matrix in identityB, then repeat with the second matrix from identityA, etc.). I want to save these files and then call on the R program to compare them:

# windownA and windownB are lists containing ascending sequences of integers
# identityA and identityB are lists where each field is a distance matrix.

z = 0
v = 0

import csv
import os
import subprocess

for i in windownA:                              

    M1 = identityA[i]                          

    z += 1
    filename = "C:/pythonscripts/distmatA"+str(z)+".csv"
    file = csv.writer(open(filename, 'w'))
    file.writerow(M1)


    for j in windownB:                          

        M2 = identityB[j]                     

        v += 1
        filename2 = "C:/pythonscripts/distmatB"+str(v)+".csv"
        file = csv.writer(open(filename2, 'w'))
        file.writerow(M2)

        ## result = os.system('R CMD BATCH C:/R/library/MantelScript.R') - maybe something like this??

        ## result = subprocess.call(['C:/R/library/MantelScript.txt'])  - or maybe this??

        print result
        print ' '
A: 

Stick with this.

process = subprocess.Popen(['R', 'CMD', 'BATCH', 'C:/R/library/MantelScript.R'])
process.wait()

When the wait() function returns a value, the .R file is finished.

Note that you should write your .R script to produce a file that your Python program can read.

with open( 'the_output_from_mantelscript', 'r' ) as result:
    for line in result:
        print( line )

Don't waste a lot of time trying to hook up a pipeline.

Invest time in getting a basic "Python spawns R" process working.

You can add to this later.
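
A minimal sketch of the "Python spawns R, then reads the file R wrote" pattern described above. The helper names (r_batch_command, run_and_wait, read_lines) are illustrative assumptions, not part of the original answer, and subprocess.run is used only as a shorthand that blocks the same way Popen(...).wait() does.

```python
import subprocess


def r_batch_command(script_path, rout_path):
    """Build the argument list for `R CMD BATCH [options] infile outfile`."""
    return ["R", "CMD", "BATCH", "--no-save", script_path, rout_path]


def run_and_wait(cmd):
    """Run cmd, block until it exits, and return its exit code (0 = success)."""
    return subprocess.run(cmd).returncode


def read_lines(path):
    """Return the lines of a text file without their trailing newlines."""
    with open(path, "r") as handle:
        return [line.rstrip("\n") for line in handle]
```

Something like run_and_wait(r_batch_command('C:/R/library/MantelScript.R', 'MantelScript.Rout')) would then be followed by read_lines on whatever file the R script writes; the .Rout name here is a placeholder. Checking the exit code before reading the file is a cheap way to notice that R failed.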

S.Lott
thanks for the reply. I chose for the time being to write a script up in R and just execute it after saving .csv files with python, given that I had a time constraint. still have one issue left with that code, but (as a "layman") found it easier to work with.
@vehicularlambslaughter.myopenid.c: "found it easier to work with" That's my opinion, also. Simple pipelines where one program writes a file and another program reads a file are best. Fooling around too much with `subprocess` doesn't often help.
S.Lott
+1  A: 

If your R script only has side effects that's fine, but if you want to process the results further with Python, you'll still be better off using rpy2.

import rpy2.robjects

# read the whole R script into a single string
with open("C:/R/library/MantelScript.R") as f:
    code = f.read()
result = rpy2.robjects.r(code)
# assume that MantelScript creates a variable "X" in the R GlobalEnv workspace
X = rpy2.robjects.globalenv['X']
lgautier
I think that rpy2 would have made things run more smoothly, but I decided yesterday to stick with the more cumbersome, yet straightforward approach of executing a script in R directly
@vehicularlambslaughter : this may limit the frustration of having to work with a deprecated package; you are on MSWindows, and rpy2 on that OS is one major release behind (2.0.x vs 2.1.x on UNIX-alikes). If you keep things modular you will always be able to switch back easily if the need arises (or a more recent rpy2 finally comes to MSWindows).
lgautier
A: 

Given what you're trying to do, a pure R solution might be neater:

file.pairs <- combn(dir(pattern="\\.csv$"), 2) # get every pair of csv files in the current dir (pattern is a regular expression, not a glob)

The pairs are columns in a 2xN matrix:

file.pairs[,1]
[1] "distmatrix1.csv" "distmatrix2.csv"

You can run a function on these columns by using apply (with option '2', meaning 'act over columns'):

my.func <- function(v) paste(v[1], v[2], sep="::")
apply(file.pairs, 2, my.func)

In this example my.func just glues the two file names together; you could replace this with a function that does the Mantel Test, something like (untested):

my.func <- function(v){
  M1<-read.table(v[1], header = FALSE, sep = ",")
  M2<-read.table(v[2], header = FALSE, sep = ",")
  mantel.rtest(dist(matrix(M1, 14, 14)), dist(matrix(M2, 14, 14)), nrepet = 999)
}
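
If the pairing is done on the Python side instead, as in the question, the "every A against every B" step can be sketched as follows. This assumes the distmatA*.csv / distmatB*.csv naming from the question; the helper names are hypothetical, and passing the two paths on the command line assumes the R script is changed to read them with commandArgs(trailingOnly=TRUE), which the MantelScript shown in the question does not do.

```python
import glob
import itertools
import os


def matrix_pairs(directory, prefix_a="distmatA", prefix_b="distmatB"):
    """Return every (A file, B file) combination found in directory."""
    files_a = sorted(glob.glob(os.path.join(directory, prefix_a + "*.csv")))
    files_b = sorted(glob.glob(os.path.join(directory, prefix_b + "*.csv")))
    # product() yields each A file paired with each B file exactly once
    return list(itertools.product(files_a, files_b))


def rscript_command(script_path, file_a, file_b):
    """Build an Rscript call that passes both CSV paths as arguments."""
    return ["Rscript", script_path, file_a, file_b]
```

Note that sorted() orders names as text, so distmatA10.csv comes before distmatA2.csv; that does not change which pairs are produced, only their order.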
Michael Dunn
Ah, thanks, this would have been much neater than the script I just wrote in R, found here: http://stackoverflow.com/questions/3354115/writing-a-rtest-output-to-file-using-the-r-program-ex-via-write-table . I need to read in every combination of files from groups "A" and "B".
You're welcome! My personal feeling on `rpy2` is that it's a specialist tool for people who are experts in *both* python and R. If you're a little bit uncertain in either of those two languages then rpy2 acts as a multiplier of your problems.
Michael Dunn
@Michael Dunn : Although rpy2 is by definition a specialist tool (as much as R is a specialist tool in fact), it should not require expertise in R (knowledge in programming with Python will be unavoidable, unfortunately). There is a growing documentation for rpy2, and if you have experiences that match your personal feeling please report them (rpy2 bug tracker, rpy mailing list, etc...) so they are better known to developers.
lgautier
@lgautier: rpy2 is brilliant, and I certainly don't have anything I'd want to submit as a bug report or complain about on the mailing list. But it's at its best when playing to the strengths of both tools e.g. to scrape data from the web and then produce statistical graphics from it. It's a heavyweight solution to basic file manipulation (as in this question); I think a beginner R user would find learning to do basic tasks purely in R to be both simpler and of more general value.
Michael Dunn
+2  A: 

In case you're interested in invoking an R subprocess from Python more generally, here is a way to pipe R code to Rscript and capture what it prints:

#!/usr/bin/env python3

from io import StringIO
from subprocess import PIPE, Popen

def rnorm(n):
    rscript = Popen(["Rscript", "-"], stdin=PIPE, stdout=PIPE, stderr=PIPE)
    with StringIO() as s:
        s.write("x <- rnorm({})\n".format(n))
        s.write("cat(x, \"\\n\")\n")
        return rscript.communicate(s.getvalue().encode())

if __name__ == '__main__':
    output, errmsg = rnorm(5)
    print("stdout:")
    print(output.decode('utf-8').strip())
    print("stderr:")
    print(errmsg.decode('utf-8').strip())

Better to do it through Rscript.
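
A slightly shorter variant of the same idea, as a sketch: subprocess.run with input= replaces the Popen/communicate pair, and text=True does the encoding and decoding. The function names are my own, and the timeout value is an arbitrary assumption.

```python
import subprocess


def run_with_stdin(cmd, source, timeout=60):
    """Feed source text to cmd on stdin; return (exit code, stdout, stderr)."""
    done = subprocess.run(
        cmd, input=source, capture_output=True, text=True, timeout=timeout
    )
    return done.returncode, done.stdout, done.stderr


def rnorm_source(n):
    """R code that draws n normal deviates and prints them on one line."""
    return 'x <- rnorm({})\ncat(x, "\\n")\n'.format(int(n))
```

With R installed, run_with_stdin(["Rscript", "-"], rnorm_source(5)) should behave like the rnorm function above; a non-zero exit code together with the stderr text is the first thing to look at when it does not.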

Alan Lue