I'm trying to find a way to convert multiple lines of text into a data frame. Is there a way to use read.delim() to read in multiple lines of text and then create the following data frame with something akin to reshape()?

The data is structured as follows:

A: 1
B: 2
C: 10
A: 34
B: 20
C: 6.7
A: 2
B: 78
C: 35

I'd like to convert this data to something that looks like the following data frame:

A             B             C
1             2             10
34            20            6.7
2             78            35

Apologies if there is an obvious way to do this!

A: 

Here is one solution using reshape

s<-"A: 1
B: 2
C: 10
A: 34
B: 20
C: 6.7
A: 2
B: 78
C: 35
"
d<-read.delim(textConnection(s),header=FALSE,sep=":",strip.white=TRUE)
N<-nrow(d)%/%3
d$id<-rep(1:N,each=3)
reshape(d,dir="wide",timevar="V1",idvar="id")

Which produces

  id V2.A V2.B V2.C
1  1    1    2 10.0
4  2   34   20  6.7
7  3    2   78 35.0
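
The group size of 3 is hard-coded above, and the result keeps the `V2.` prefix on its column names. A minimal sketch of one way to derive the group size from the data and tidy the names, assuming every record is complete and lists its keys in the same order (the names `k` and `w` are illustrative, not from the original answer):

```r
s <- "A: 1
B: 2
C: 10
A: 34
B: 20
C: 6.7
A: 2
B: 78
C: 35"
d <- read.delim(text = s, header = FALSE, sep = ":", strip.white = TRUE)

# Assumption: each record contains every key exactly once
k <- length(unique(d$V1))                       # number of distinct keys
d$id <- rep(seq_len(nrow(d) %/% k), each = k)   # one id per record

w <- reshape(d, direction = "wide", timevar = "V1", idvar = "id")
names(w) <- sub("^V2\\.", "", names(w))         # drop the "V2." prefix
rownames(w) <- NULL                             # renumber rows 1..n
w
```

If records can be incomplete, the `rep()` step no longer lines up with the data and a different way of assigning `id` would be needed.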
Jyotirmoy Bhattacharya
+3  A: 

Here is how to do it with the plyr package:

require("plyr")
my.data <- "A: 1
            B: 2
            C: 10
            A: 34
            B: 20
            C: 6.7
            A: 2
            B: 78
            C: 35"   
df <- read.delim(textConnection(my.data),header=FALSE,sep=":",strip.white=TRUE)

as.data.frame(dlply(df,.(V1),function(x) x[[2]]))

You get

   A  B    C
1  1  2 10.0
2 34 20  6.7
3  2 78 35.0

You can see what magic plyr is doing just by playing with dlply(df,.(V1)) or dlply(df,.(V1),function(x) x)
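
For readers without plyr installed, a similar "split by key, then combine" step can be sketched in base R with `split()`. This is a swapped-in technique for illustration, not a description of plyr's internals, and the names `dat`, `groups` and `wide` are made up for this example:

```r
my.data <- "A: 1
B: 2
C: 10
A: 34
B: 20
C: 6.7
A: 2
B: 78
C: 35"
dat <- read.delim(text = my.data, header = FALSE, sep = ":", strip.white = TRUE)

# Split the values in V2 into one vector per key in V1
groups <- split(dat$V2, dat$V1)
str(groups)   # a named list with elements A, B and C

# Assumption: every key occurs the same number of times,
# otherwise as.data.frame() will fail on unequal lengths
wide <- as.data.frame(groups)
wide
```

Printing `groups` before combining shows the intermediate structure, much like inspecting the output of `dlply(df,.(V1))`.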

Leo Alekseyev
Thanks for the `plyr` suggestion. Definitely worth exploring further. I found an alternative to solving my question using `unstack`
andrewj
Ah, good call; in this case that's probably the way to go. plyr can be rather handy, though, for other "group by" type operations. If you'd like to explore further you might want to read http://had.co.nz/plyr/plyr-intro-090510.pdf
Leo Alekseyev
+3  A: 

How about:

s<-"A: 1
B: 2
C: 10
A: 34
B: 20
C: 6.7
A: 2
B: 78
C: 35
"
d<-read.delim(textConnection(s),header=FALSE,sep=":",strip.white=TRUE)
cols<-unique(as.character(d[,'V1']))  # levels() returns NULL when V1 is read as character (the default since R 4.0)
d<-data.frame(sapply(cols,function(x) {d['V2'][d['V1']==x]}, USE.NAMES=TRUE))

which yields:

   A  B    C
1  1  2 10.0
2 34 20  6.7
3  2 78 35.0
unutbu
That was a clever use of `sapply()`. I hadn't thought of using it that way before.
andrewj
Thanks. I'm just starting to learn R, so I had to try using the few tools at my disposal. :) I just noticed your solution using `unstack`. That looks like the best way to me.
unutbu
+2  A: 

I posted this question on R-help as well, and got a response from Phil Spector suggesting unstack.

This is a modification of Leo Alekseyev's response:

my.data <- "A: 1
            B: 2
            C: 10
            A: 34
            B: 20
            C: 6.7
            A: 2
            B: 78
            C: 35"   
df <- read.delim(textConnection(my.data),header=FALSE,sep=":",strip.white=TRUE)
unstack(df, V2 ~ V1)

This results in:

   A  B    C
1  1  2 10.0
2 34 20  6.7
3  2 78 35.0

One advantage of this approach over the other thoughtful answers is that you don't need to specify the number of columns ahead of time. Another is that it doesn't require any additional packages.
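
To illustrate the point about not fixing the number of columns, here is a small sketch that runs the same call on input with a fourth key. The key `D` and its values are invented for this example:

```r
# Hypothetical input with four keys instead of three
txt <- "A: 1
B: 2
C: 10
D: 5
A: 34
B: 20
C: 6.7
D: 8"
df4 <- read.delim(text = txt, header = FALSE, sep = ":", strip.white = TRUE)

# Same call as before; the columns are taken from the keys found in V1
wide4 <- unstack(df4, V2 ~ V1)
wide4
```

One caveat: if the keys do not all occur the same number of times, `unstack()` returns a list rather than a data frame.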

andrewj