Hi,

Below are the first five rows of the imported data in R:

data[1:5,]

    user event_date day_of_week
1 00002781A2ADA816CDB0D138146BD63323CCDAB2 2010-09-04    Saturday
2 00002D2354C7080C0868CB0E18C46157CA9F0FD4 2010-09-04    Saturday
3 00002D2354C7080C0868CB0E18C46157CA9F0FD4 2010-09-07     Tuesday
4 00002D2354C7080C0868CB0E18C46157CA9F0FD4 2010-09-08   Wednesday
5 00002D2354C7080C0868CB0E18C46157CA9F0FD4 2010-09-17      Friday
  distinct_events_a_count total_events_a_count
1                             2                          2
2                             2                          2
3                             1                          3
4                             1                          1
5                             1                          1
  events_a_duration distinct_events_b_count total_events_b_count
1                     615                       1                    1
2                      77                       1                    1
3                     201                       1                    1
4                      44                       1                    1
5                       3                       1                    1
  events_b_duration
1                      47
2                      43
3                     117
4                      74
5                      18

The problem is that columns 6 and 9 are read as factors rather than numerics, so I can't perform math operations on them. To convert the imported data to the appropriate format, I tried to build a new data frame, dataset, as follows:

dataset<-data.frame(events_a_duration=as.numeric(c(data[,6])), events_b_duration=as.numeric(c(data[,9])))

But when I checked the values, I noticed that the data frame doesn't contain the right ones:

 dataset[1,]


events_a_duration events_b_duration
1                   10217                    6184

The values should be 615 and 47.

What I don't know is how to create a data frame made up of the imported columns with their values intact. I would be very thankful if anyone could show me how to build the appropriate data structure.

+5  A: 

Your problem is that as.numeric() applied to a factor returns the internal integer codes of the levels, not the values those levels represent. You can check that the levels are numbered in ascending order of the values:

> as.numeric(factor(c(615,47,42)))
[1] 3 2 1
> as.numeric(factor(c(615,42,47)))
[1] 3 1 2
> as.numeric(factor(c(615,42,47,37)))
[1] 4 2 3 1
> as.numeric(factor(c(615,42,37,47)))
[1] 4 2 1 3

Use as.numeric(as.character(MyFactor)). See below for instance:

> as.numeric(as.character(factor(c(615,42,37,47))))
[1] 615  42  37  47
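
Applied to the data frame from the question, the same conversion might look like the sketch below. The small data frame here is a hypothetical stand-in built from the first five rows shown in the question, since the real file isn't available; only the two duration columns are reproduced.

```r
# Hypothetical stand-in for the imported data: the two duration columns,
# stored as factors the way read.csv delivered them in the question.
data <- data.frame(
  events_a_duration = factor(c("615", "77", "201", "44", "3")),
  events_b_duration = factor(c("47", "43", "117", "74", "18"))
)

# Go through the level labels (as.character) instead of the integer codes.
dataset <- data.frame(
  events_a_duration = as.numeric(as.character(data$events_a_duration)),
  events_b_duration = as.numeric(as.character(data$events_b_duration))
)

# Equivalent alternative: convert the levels once, then index by the factor.
alt_a <- as.numeric(levels(data$events_a_duration))[data$events_a_duration]

dataset[1, ]
```

With this stand-in, the first row holds 615 and 47, the values the question expected. The levels-based form converts each distinct level only once, which can matter on a column with many rows.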
wok
A: 

data <- read.csv("data.csv", stringsAsFactors = FALSE)
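
One caveat, sketched below under assumptions: stringsAsFactors = FALSE only prevents the factor conversion. If a column holds any non-numeric entry, it arrives as character rather than numeric. The inline CSV text and the "NULL" placeholder are made up for illustration; the question doesn't say what the offending entries in the real file are.

```r
# Hypothetical CSV content; "NULL" stands in for whatever non-numeric
# entry may be present in the real file.
csv_text <- "user,events_a_duration,events_b_duration
A,615,47
B,77,NULL
C,201,117"

# Without factors, the column containing "NULL" is read as character.
raw <- read.csv(text = csv_text, stringsAsFactors = FALSE)
str(raw)

# Declaring the placeholder as missing lets read.csv infer a numeric type.
clean <- read.csv(text = csv_text, stringsAsFactors = FALSE,
                  na.strings = c("NA", "NULL"))
str(clean)
```

Checking str() right after the import shows which columns still need attention before any arithmetic is attempted.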

newuser1