Hi,

I read the data in the following way:

data <- read.csv("userStats.csv", sep = ",", header = FALSE)

Then I tried to select an element at a specific position.

The first five rows of the data look like this (V2 is the date and V3 is the day of the week):

   V1               V2
1 00002781A2ADA816CDB0D138146BD63323CCDAB2                 2010-09-04
2 00002D2354C7080C0868CB0E18C46157CA9F0FD4                 2010-09-04
3 00002D2354C7080C0868CB0E18C46157CA9F0FD4                 2010-09-07
4 00002D2354C7080C0868CB0E18C46157CA9F0FD4                 2010-09-08
5 00002D2354C7080C0868CB0E18C46157CA9F0FD4                 2010-09-17
                              V3 V4 V5          V6 V7 V8          V9
1 Saturday                        2  2         615  1  1          47
2 Saturday                        2  2          77  1  1          43
3 Tuesday                         1  3         201  1  1         117
4 Wednesday                       1  1          44  1  1          74
5 Friday                          1  1           3  1  1          18

I tried to divide the 6th column by the 9th column in the first row as follows:

data[1,6]/data[1,9]

but it returned NA with a warning:

[1] NA
Warning message:
In Ops.factor(data[1, 6], data[1, 9]) : / not meaningful for factors

Then I tried to select just one element

> data[2,9]
[1]          43
11685 Levels:            0           1           2           3 ...       55311

but I don't know what these Levels are or what causes the warning. Does anyone know how to select an element at a specific position, data[row, column]?

Thank you!

+3  A: 

The standard modeling data structure in R is a data.frame.

A data.frame can hold columns of various types: numeric, character, factor, ...

Now, when reading data via read.csv() et al., you can get bitten by the default value of the stringsAsFactors option. I presume that at least one row in your data had text in that column, so R read the whole column as a factor, and presto! you can no longer do direct mathematical operations on the column.
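To make the effect concrete, here is a minimal sketch with a made-up three-row CSV (the values and the "n/a" entry are assumptions for illustration, not taken from userStats.csv). stringsAsFactors = TRUE is passed explicitly because that was the default of read.csv() before R 4.0.0; since then the default is FALSE and such a column would come back as character instead.

```r
# Hypothetical data: the third column contains one non-numeric entry ("n/a").
csv_text <- "615,1,47\n77,1,43\n201,1,n/a\n"

# stringsAsFactors = TRUE reproduces the pre-4.0.0 default of read.csv().
d <- read.csv(text = csv_text, header = FALSE, stringsAsFactors = TRUE)

class(d$V1)       # every entry parsed as a number, so the column is integer
class(d$V3)       # one text entry is enough to make the whole column a factor

# A factor stores the distinct strings once ("Levels") plus an integer code per row.
levels(d$V3)      # the distinct strings seen in the column
as.integer(d$V3)  # the internal level codes, NOT the original numbers
```

This is why printing a single element of such a column also prints a Levels line: the element is still a factor and carries the full set of levels with it.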

In short, run summary(data) and/or sweep class() over all the columns. Convert as necessary, set stringsAsFactors to a different value, or both.

Once your data is numeric, you can divide, slice, dice, ... as you please.
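The check-and-convert steps described above could look roughly like this. The small data frame is a made-up stand-in for the question's data (only V6 and V9, with the values shown in the question). The detail worth noting is the detour through as.character(): calling as.numeric() directly on a factor returns the level codes rather than the numbers.

```r
# Hypothetical stand-in for two of the columns from the question.
d <- data.frame(
  V6 = factor(c("615", "77", "201")),
  V9 = factor(c("47", "43", "117"))
)

# Inspect the class of every column.
sapply(d, class)

# Convert via character; as.numeric() on the factor alone would give level codes.
d$V6 <- as.numeric(as.character(d$V6))
d$V9 <- as.numeric(as.character(d$V9))

# Selecting single elements and dividing them now works.
ratio <- d[1, "V6"] / d[1, "V9"]
ratio
```

If a column really does contain a stray text entry, the conversion turns that entry into NA with a warning, which is usually a useful hint about which rows to look at. Re-reading the file with stringsAsFactors = FALSE avoids factors altogether, but a column containing text then arrives as character and still needs as.numeric().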

Dirk Eddelbuettel
+3  A: 

My favorite tool to check variable class is str().

What you have there is a data frame, and at least one of the columns you're trying to work with is a factor. See Dirk's answer on how to change the class of a column.

The command

data[1,6]/data[1,9]

selects the value in the first row of the sixth column and divides it by the value in the first row of the ninth column. Is this what you want? If you want to use the values from the entire column (and not just the first row), you would write

data[6] / data[9]

or

data[, 6] / data[, 9]

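Both forms select the same columns of a data.frame and give the same numbers; data[6] returns a one-column data.frame, while data[, 6] returns a plain vector.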
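A short sketch of the three selection forms, using a made-up numeric data frame (the column names a and b are hypothetical), with str() showing what each one returns:

```r
# Hypothetical numeric data frame with two columns.
d <- data.frame(a = c(615, 77, 201), b = c(47, 43, 117))

single <- d[1, 1] / d[1, 2]  # one element divided by one element
as_df  <- d[1] / d[2]        # one-column data frames in, data frame out
as_vec <- d[, 1] / d[, 2]    # plain vectors in, plain vector out

# str() shows the structure of each result.
str(single)
str(as_df)
str(as_vec)
```

The numbers are the same in the last two cases; only the container differs, which matters if the result is passed to a function that expects a vector.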

Roman Luštrik