Each datapoint pairs one value of A with one value of B, and there are multiple possible values in A and multiple possible values in B. For example, multiple syndromes and multiple diagnoses, although each datapoint has a single syndrome-diagnosis pair.

Examples, suggestions, or ideas are much appreciated.

Here's what the data looks like. I want to see the connections between values of A and B (how many GGs are linked to TTs, etc.). Both are nominal data types.

ID,A,B
1,GG,TT
2,AA,SS
3,BB,XX
4,DD,SS
5,DD,TT
6,CC,XX
7,HH,ZZ
8,AA,TT
9,CC,RR
10,DD,ZZ
11,AA,XX
12,AA,TT
13,DD,SS
14,DD,XX
15,AA,YY
16,CC,ZZ
17,FF,SS
18,FF,XX
19,BB,VV
20,GG,VV
21,GG,SS
22,AA,RR
23,AA,TT
24,AA,SS
25,CC,VV
26,CC,TT
27,FF,RR
28,GG,UU
29,CC,TT
30,BB,ZZ
31,II,TT
32,FF,RR
33,BB,SS
34,GG,YY
35,FF,RR
36,BB,VV
37,II,RR
38,CC,YY
39,FF,VV
40,AA,XX
41,AA,ZZ
42,GG,VV
43,BB,UU
44,II,UU
45,II,SS
46,DD,SS
47,AA,UU
48,BB,VV
49,GG,TT
50,BB,TT
+4  A: 

This is what I do. A darker colour indicates a more important combination of A and B.

dataset <- data.frame(A = sample(LETTERS[1:5], 200, prob = runif(5), replace = TRUE), B = sample(LETTERS[1:5], 200, prob = runif(5), replace = TRUE))
Counts <- as.data.frame(with(dataset, table(A, B)))
library(ggplot2)
ggplot(Counts, aes(x = A, y = B, fill = Freq)) + geom_tile() + scale_fill_gradient(low = "white", high = "black")
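
A minimal sketch of the same counting step applied to the 50-row example from the question rather than random data. The names `a_values`, `b_values`, and `Frequent` are introduced here for illustration only. The plot call is the one above, and it is skipped if ggplot2 isn't installed.

```r
# Example data from the question (rows 1-50), transcribed as two parallel vectors.
a_values <- strsplit(paste(
  "GG AA BB DD DD CC HH AA CC DD",
  "AA AA DD DD AA CC FF FF BB GG",
  "GG AA AA AA CC CC FF GG CC BB",
  "II FF BB GG FF BB II CC FF AA",
  "AA GG BB II II DD AA BB GG BB"
), " ")[[1]]
b_values <- strsplit(paste(
  "TT SS XX SS TT XX ZZ TT RR ZZ",
  "XX TT SS XX YY ZZ SS XX VV VV",
  "SS RR TT SS VV TT RR UU TT ZZ",
  "TT RR SS YY RR VV RR YY VV XX",
  "ZZ VV UU UU SS SS UU VV TT TT"
), " ")[[1]]

dataset <- data.frame(A = a_values, B = b_values, stringsAsFactors = TRUE)

# One row per (A, B) combination, with the number of datapoints in Freq.
Counts <- as.data.frame(with(dataset, table(A, B)))

# Pairs that occur more than once, most frequent first.
Frequent <- Counts[Counts$Freq > 1, ]
Frequent <- Frequent[order(-Frequent$Freq), ]
print(Frequent)

# Same tile plot as above; built only if ggplot2 is available.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(Counts, aes(x = A, y = B, fill = Freq)) +
    geom_tile() +
    scale_fill_gradient(low = "white", high = "black")
}
```

Printing `p` draws the plot. The `Frequent` table answers the "how many GGs are linked to TTs" kind of question directly, without a plot.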

Or, if you prefer lines:

library(ggplot2)
dataset <- data.frame(A = sample(letters[1:5], 200, prob = runif(5), replace = TRUE), B = sample(letters[1:5], 200, prob = runif(5), replace = TRUE))
Counts <- as.data.frame(with(dataset, table(A, B)))
Counts$X <- 0
Counts$Xend <- 1
Counts$Y <- as.numeric(Counts$A)
Counts$Yend <- as.numeric(Counts$B)
ggplot(Counts, aes(x = X, xend = Xend, y = Y, yend = Yend, size = Freq)) +
    geom_segment() +
    scale_x_continuous(breaks = 0:1, labels = c("A", "B")) +
    scale_y_continuous(breaks = 1:5, labels = letters[1:5])

This third option adds labels to the data points using geom_text().

library(ggplot2)
dataset <- data.frame(
    A = sample(letters[1:5], 200, prob = runif(5), replace = TRUE), 
    B = sample(LETTERS[20:26], 200, prob = runif(7), replace = TRUE)
)
Counts <- as.data.frame(with(dataset, table(A, B)))
Counts$X <- 0
Counts$Xend <- 1
Counts$Y <- as.numeric(Counts$A)
Counts$Yend <- as.numeric(Counts$B)
ggplot(Counts, aes(x = X, xend = Xend, y = Y, yend = Yend)) +
    geom_segment(aes(size = Freq)) +
    scale_x_continuous(breaks = 0:1, labels = c("A", "B")) +
    scale_y_continuous(breaks = -1) +
    geom_text(aes(x = X, y = Y, label = A), colour = "red", size = 7, hjust = 1, vjust = 1) +
    geom_text(aes(x = Xend, y = Yend, label = B), colour = "red", size = 7, hjust = 0, vjust = 0)
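
One thing to watch with the line plots: table() returns every A-B combination, including pairs that never occur (Freq of 0), and those rows are still passed to geom_segment(), so depending on the size scale they may show up as thin lines. A possible sketch, assuming you only want observed pairs drawn, is to filter before plotting. The small hand-made dataset and the name `Links` are made up for illustration.

```r
# Tiny illustrative dataset (hypothetical values, not the asker's data).
dataset <- data.frame(
  A = c("a", "a", "b", "c", "c", "c"),
  B = c("T", "U", "T", "V", "V", "T"),
  stringsAsFactors = TRUE
)
Counts <- as.data.frame(with(dataset, table(A, B)))

# Keep only the pairs that actually occur, so absent links are not drawn.
Links <- Counts[Counts$Freq > 0, ]
Links$X <- 0
Links$Xend <- 1
Links$Y <- as.numeric(Links$A)     # position on the A side
Links$Yend <- as.numeric(Links$B)  # position on the B side

# Built only if ggplot2 is available. Newer ggplot2 versions prefer
# `linewidth` over `size` for line thickness.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(Links, aes(x = X, xend = Xend, y = Y, yend = Yend)) +
    geom_segment(aes(size = Freq)) +
    scale_x_continuous(breaks = 0:1, labels = c("A", "B")) +
    scale_y_continuous(breaks = -1) +
    geom_text(aes(x = X, y = Y, label = A), colour = "red", hjust = 1) +
    geom_text(aes(x = Xend, y = Yend, label = B), colour = "red", hjust = 0)
}
```

Printing `p` draws the plot; only the five observed pairs get a segment.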
Thierry
Thanks Thierry. I would like some way to see the connections between each pair, so I can see which items in group A lead to which items in group B. I am going to attach an example dataset.
Here's the png for Thierry's second solution. http://i38.tinypic.com/2d0juw6.png
Christopher DuBois
+5  A: 
Jonathan Chang
Here's what you should get when you apply Jonathan's technique to the example dataset: http://i33.tinypic.com/5ey26o.png
Christopher DuBois
Nice. Looking at the output, I think it might be nice to color code on one of the factors.
Jonathan Chang
+2  A: 
Marek
+1  A: 

Thanks! I think the connectivity between elements in each class is best visualized by the link graph examples given by both Jonathan and Thierry. Thierry's second, which shows the magnitude, is definitely where I will start.

Update: thanks everyone for your ideas and tips!

I came across the bipartite package, which has functions to visualize this kind of data. I think it's a clean visualization of the relationships I am trying to show.

What I did:

    library(bipartite)
    dataset <- data.frame(
        A = sample(letters[1:5], 200, prob = runif(5), replace = TRUE),
        B = sample(LETTERS[20:26], 200, prob = runif(7), replace = TRUE)
    )
    datamat <- as.matrix(table(dataset$A, dataset$B))
    visweb(datamat, text = "interaction", textsize = .8)

giving: visweb result

Couldn't put the image in as a new user :(
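
For a view closer to the link graphs above, the same package also has plotweb(), which draws the two groups as rows of boxes joined by links. A rough sketch, assuming a plain count matrix is acceptable input and using a small made-up dataset; the call is skipped if bipartite isn't installed.

```r
# Tiny illustrative dataset (hypothetical values).
dataset <- data.frame(
  A = c("a", "a", "b", "c", "c", "c"),
  B = c("T", "U", "T", "V", "V", "T")
)

# Count matrix: rows are values of A, columns are values of B.
# unclass() drops the "table" class so a plain integer matrix is passed on.
datamat <- unclass(table(dataset$A, dataset$B))

# Draw the bipartite graph only if the package is available.
if (requireNamespace("bipartite", quietly = TRUE)) {
  bipartite::plotweb(datamat)
}
```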

colinr23
Although I now realize I need different y axes, which, I am finding out, is impossible in ggplot.
colinr23
You could use geom_text() to add the labels.
Thierry