I am working with a large list of points (each point has three dimensions x,y,z).

I am fairly new to R, so I would like to know the best way to represent this kind of information. As far as I know, an array can represent any multidimensional data, so currently I am using:

> points<-array( c(1,2,0,1,3,0,2,4,0,2,5,0,2,7,0,3,8,0), dim=c(3,6) )
> points
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    1    2    2    2    3  -- x dim
[2,]    2    3    4    5    7    8  -- y dim
[3,]    0    0    0    0    0    0  -- z dim

The aim is to calculate the Euclidean distance between two sets of points such as:

points1<-array( c(1,2,0,1,3,0,2,4,0,2,5,0,2,7,0,3,8,0), dim=c(3,6) )
points2<-array( c(2,2,0,1,4,0,2,3,0,2,4,0,2,6,0,2,8,0), dim=c(3,6) )

(any hint in this sense would also be highly appreciated)

+2  A: 

You probably want to see what the CRAN Task View for Spatial Data Analysis has to offer -- there are a number of suitable packages.

Dirk Eddelbuettel
Thank you, your link is broken. Do you mean http://cran.r-project.org/web/views/Spatial.html ?
Guido
Dirk Eddelbuettel
+4  A: 

You can get the distance matrix using the function dist. It computes the distances between the rows of a data matrix, so I transposed your points array:

dist(t(points),method = "euclidean")
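
As a small illustration of the call above, here is a sketch using the six points from the question. The variable names d and dm are only for this example; as.matrix() is used to turn the compact "dist" object into a full square matrix that can be indexed by point number.

```r
# Points stored one per column, as in the question (rows are x, y, z).
points <- array(c(1,2,0, 1,3,0, 2,4,0, 2,5,0, 2,7,0, 3,8,0), dim = c(3, 6))

# dist() measures distances between rows, so transpose to get one point per row.
d <- dist(t(points), method = "euclidean")

# Expand the "dist" object into a full symmetric 6 x 6 matrix.
dm <- as.matrix(d)

# Distance between point 1 (1,2,0) and point 2 (1,3,0).
dm[1, 2]  # 1
```

Note that this gives the distances between the points within one array, not between two different arrays.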

A similar function for computing the distance matrix is Dist from the amap package, which provides even more distance measures: "euclidean", "maximum", "manhattan", "canberra", "binary", "pearson", "correlation", "spearman", "kendall".

gd047
Thank you. What I need is not the distance between the points within one array, but the distance between the two arrays as a whole. I plan to calculate it as the sum of the distances between each pair of corresponding points, sum(dist(point_i_in_array_1, point_i_in_array_2))
Guido
In that case consider also *intercluster* metrics. Take a look at http://finzi.psych.upenn.edu/R/library/clv/html/cluster_scatter.html for metric definitions and the functions that calculate them.
gd047
+1  A: 

I'd suggest working with your matrix transposed, or you'll probably end up calling the function t() more than you otherwise would.

Aside from that, this is probably the data structure you want. You could do it with a data frame of course, but I think you're better off not doing so in this situation.
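
To make the suggestion concrete, here is a minimal sketch of the transposed layout, with one point per row. The name pts and the column names x, y, z are assumptions chosen for this example, not something from the question.

```r
# One point per row; columns named x, y, z (names chosen for illustration).
pts <- matrix(c(1,2,0, 1,3,0, 2,4,0, 2,5,0, 2,7,0, 3,8,0),
              ncol = 3, byrow = TRUE,
              dimnames = list(NULL, c("x", "y", "z")))

pts[, "x"]   # all x coordinates
pts[2, ]     # the second point
dist(pts)    # row-oriented functions work directly, no t() needed
```

With this layout, functions that expect observations in rows can be applied without transposing first.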

Glen_b
+3  A: 

Calculating the Euclidean distance between two sets of points stored like this is easy:

sqrt(colSums((points1 - points2)^2))

That said, I'd second the recommendation to store dimensions in the columns. In that case the code becomes:

sqrt(rowSums((points1 - points2)^2))
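
For the two arrays given in the question, a sketch of the whole calculation might look like this, including the sum over pairs mentioned in the comments. The names pair_dist, total, p1 and p2 are only for this example.

```r
points1 <- array(c(1,2,0, 1,3,0, 2,4,0, 2,5,0, 2,7,0, 3,8,0), dim = c(3, 6))
points2 <- array(c(2,2,0, 1,4,0, 2,3,0, 2,4,0, 2,6,0, 2,8,0), dim = c(3, 6))

# Distance between point i of the first set and point i of the second
# (one point per column, so sum the squared differences down each column).
pair_dist <- sqrt(colSums((points1 - points2)^2))
pair_dist  # 1 1 1 1 1 1

# Sum of the pairwise distances, as one possible "distance between the sets".
total <- sum(pair_dist)
total      # 6

# The same result with one point per row.
p1 <- t(points1)
p2 <- t(points2)
sum(sqrt(rowSums((p1 - p2)^2)))  # 6
```

Both versions assume the two sets have the same number of points and that point i in one set corresponds to point i in the other.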
hadley
I like column orientation.
Dan