I am having difficulty with:

  • Listing the R packages and functions available to PostgreSQL
  • Installing a package (such as Kendall) for use with PL/R
  • Calling an R function within PostgreSQL

Listing Available R Packages

Q.1. How do you find out what R modules have been loaded?

SELECT * FROM r_typenames();

That shows the types that are available, but what about checking if Kendall( X, Y ) is loaded? For example, the documentation shows:

CREATE TABLE plr_modules (
  modseq int4,
  modsrc text
);

That table seems to let you insert records that dictate what gets loaded, but the following example from the documentation doesn't show, syntactically, how to load a package such as Kendall:

INSERT INTO plr_modules
  VALUES (0, 'pg.test.module.load <-function(msg) {print(msg)}');

Q.2. What would the above line look like if you were trying to load Kendall?
Q.3. Is it applicable?

Installing R Packages

Using the "synaptic" package manager the following packages have been installed:

r-base
r-base-core
r-base-dev
r-base-html
r-base-latex
r-cran-acepack
r-cran-boot
r-cran-car
r-cran-chron
r-cran-cluster
r-cran-codetools
r-cran-design
r-cran-foreign
r-cran-hmisc
r-cran-kernsmooth
r-cran-lattice
r-cran-matrix
r-cran-mgcv
r-cran-nlme
r-cran-quadprog
r-cran-robustbase
r-cran-rpart
r-cran-survival
r-cran-vr
r-recommended

Q.4. How do I know if Kendall is in there?
Q.5. If it isn't, how do I find out what package it is in?
Q.6. If it isn't in a package suitable for installing with apt-get (aptitude, synaptic, dpkg, what have you), how do I go about installing it on Ubuntu?
Q.7. Where are the installation steps documented?

Calling R Functions

I have the following code:

EXECUTE 'SELECT '
  'regr_slope( amount, year_taken ),'
  'regr_intercept( amount, year_taken ),'
  'corr( amount, year_taken ),'
  'sum( measurements ) AS total_measurements '
'FROM temp_regression'
INTO STRICT slope, intercept, correlation, total_measurements;

This code calls the PostgreSQL function corr to calculate Pearson's correlation over the data. Ideally, I'd like to do the following (by switching corr for plr_kendall):

EXECUTE 'SELECT '
  'regr_slope( amount, year_taken ),'
  'regr_intercept( amount, year_taken ),'
  'plr_kendall( amount, year_taken ),'
  'sum( measurements ) AS total_measurements '
'FROM temp_regression'
INTO STRICT slope, intercept, correlation, total_measurements;

Q.8. Do I have to write plr_kendall myself?
Q.9. Where can I find a simple example that walks through:

  1. Loading an R module into PG.
  2. Writing a PG wrapper for the desired R function.
  3. Calling the PG wrapper from a SELECT.

For example, would the last two steps look like:

create or replace function plr_kendall( _float8, _float8 ) returns float as '
  agg_kendall(arg1, arg2)
' language 'plr';

CREATE AGGREGATE agg_kendall (
  sfunc = plr_array_accum,
  basetype = float8, -- ???
  stype = _float8, -- ???
  finalfunc = plr_kendall
);

And then the SELECT as above?

Thank you!

A: 

Overview

These steps show how to call an R function from PostgreSQL using PL/R.

Prerequisites

You must already have PostgreSQL, R, and PL/R installed.

Steps

  1. Find the name of the R package (e.g., Kendall).
  2. Change to the database user:
    sudo su - postgres
  3. Run R:
    R
  4. Install the R package (accept the personal library $HOME/R/x86_64-pc-linux-gnu-library/2.9/ when prompted):
    install.packages("Kendall", dependencies = TRUE)
  5. Choose a CRAN mirror when prompted.
  6. Create the following table:
    CREATE TABLE plr_modules (
    modseq int4,
    modsrc text
    );
  7. Insert into that table the directive to load the R Module in question:
    INSERT INTO plr_modules
    VALUES (0, 'library(Kendall)' );
  8. Restart the database (or run SELECT * FROM reload_plr_modules(); instead):
    sudo /etc/init.d/postgresql-8.4 restart
  9. Create a wrapper function in PostgreSQL:
    CREATE OR REPLACE FUNCTION climate.plr_corr_kendall(
    double precision[],
    double precision[] )
    RETURNS double precision AS
    $BODY$
    Kendall(arg1, arg2)
    $BODY$
    LANGUAGE 'plr' VOLATILE STRICT;
  10. Create a function that uses the wrapper function.
  11. Test the new function.
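
Before writing the larger function, it can help to confirm that PL/R actually sees the package and that the wrapper runs. The following is a minimal sketch, not a definitive recipe: r_installed_packages is a hypothetical helper name (it is not part of PL/R), and it assumes PL/R maps an R character vector to SETOF text in your version. The sample arrays are made-up values used only as a smoke test.

```sql
-- Hypothetical helper (not shipped with PL/R): lists the R packages visible
-- to the R interpreter embedded in PostgreSQL.
CREATE OR REPLACE FUNCTION r_installed_packages()
RETURNS SETOF text AS
$BODY$
  rownames(installed.packages())
$BODY$
LANGUAGE 'plr';

-- Kendall should be listed if the install succeeded for the postgres user.
SELECT * FROM r_installed_packages() AS pkg WHERE pkg = 'Kendall';

-- Confirm the load directive from step 7 is stored.
SELECT modseq, modsrc FROM plr_modules ORDER BY modseq;

-- Smoke-test the wrapper from step 9 with two short, made-up arrays.
SELECT climate.plr_corr_kendall(
  ARRAY[1, 2, 3, 4, 5]::double precision[],
  ARRAY[2, 1, 4, 3, 5]::double precision[] );
```

If the first query returns no rows, the package was probably installed into a library that the database server's R session does not search.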

Calling Function

This function gathers the data from the database into two arrays, then passes those arrays to the plr_corr_kendall wrapper function.

CREATE OR REPLACE FUNCTION climate.analysis_vector()
RETURNS double precision AS
$BODY$
DECLARE
  v_year_taken double precision[];
  v_amount double precision[];
  i RECORD;
BEGIN
  FOR i IN (
  SELECT
    extract(YEAR FROM m.taken) AS year_taken,
    avg( m.amount ) AS amount
  FROM
    climate.city c,
    climate.station s,
    climate.station_category sc,
    climate.measurement m
  WHERE 
    c.id = 5148 AND 
    earth_distance( 
      ll_to_earth(c.latitude_decimal,c.longitude_decimal), 
      ll_to_earth(s.latitude_decimal,s.longitude_decimal)) <= 30 AND 
    s.elevation BETWEEN 0  AND  3000  AND 
    s.applicable AND 
    sc.station_id = s.id AND 
    sc.category_id = 1 AND 
    extract(YEAR FROM sc.taken_start) >= 1900 AND 
    extract(YEAR FROM sc.taken_end) <= 2009 AND 
    m.station_id = s.id AND 
    m.taken BETWEEN sc.taken_start AND sc.taken_end AND 
    m.category_id = sc.category_id 
  GROUP BY 
    extract(YEAR FROM m.taken)
  ORDER BY
    extract(YEAR FROM m.taken)
  ) LOOP
    v_year_taken := array_append( v_year_taken, i.year_taken );
    v_amount := array_append( v_amount, i.amount::double precision );
  END LOOP;

  RAISE NOTICE '%', v_year_taken;
  RAISE NOTICE '%', v_amount;

  RETURN climate.plr_corr_kendall( v_year_taken, v_amount );
END;
$BODY$
LANGUAGE 'plpgsql' VOLATILE
COST 100;
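
The question asked for something that can be swapped in for corr inside a SELECT. One possible sketch, assuming the temp_regression table and its amount and year_taken columns from the question, is to let the built-in array_agg aggregate (available since PostgreSQL 8.4) build the two arrays instead of looping in PL/pgSQL:

```sql
-- Assumes temp_regression( amount, year_taken ) from the question and the
-- climate.plr_corr_kendall wrapper created in the steps above.
SELECT
  climate.plr_corr_kendall(
    array_agg( year_taken::double precision ),
    array_agg( amount::double precision )
  ) AS correlation
FROM
  temp_regression
WHERE
  -- array_agg keeps NULL elements, so drop incomplete pairs first.
  amount IS NOT NULL AND
  year_taken IS NOT NULL;
```

Both aggregates read the rows in the same order, so the pairs stay aligned. Because the wrapper is declared STRICT, an empty table (where array_agg yields NULL) produces NULL rather than an R error.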

Test

Test the function as follows:

SELECT
  *
FROM
  climate.analysis_vector();

Result

A number: -0.0578900910913944

Dave Jarvis