Hello, I'm new to R and I'm trying to create a .R script that will open a .csv file of mine and compute some frequencies. The file has headers, and the values under them are either 1, 0, NA, or -4. What I want to do is go through each column and compute the frequency of each value in it. I'm sure this is an easy script, but I'm not sure how the syntax of R works yet. Can anyone get me started on this please?

A: 

The exact script is going to vary based on your input and what kind of output you'd like (just printed to the interactive console? Written to .csv?), but here's my attempt:

#Read the data from the .csv file - read.csv assumes a header row by default
dat <- read.csv(file = "yourfile.csv")

#For right now, use this fake data
dat <- data.frame(x = c(-4, 0, 1, 1, -4, NA, NA, 0), y = c(1, 1, 1, 0, -4, NA, 0, NA))

#Get the frequency of values for each column, assuming every column consists of data
apply(X = dat, MARGIN = 2, FUN = function(x) {summary(factor(x))})

The apply function applies the function you give it (FUN) over the margin (1 = rows, 2 = columns) of the data that you give it. You can give it any function you like. Passing FUN = summary will give you the mean, min, max, etc. of each column (because they're numeric). But the summary() method for factors returns frequencies, which is what you need. So instead of passing summary directly, trick R into seeing your numbers as a factor: define an anonymous function, function(x), where x stands for each column in turn as apply passes them in one at a time. This function first converts x to a factor (factor(x)) and then summarizes that factor. When every column contains the same set of distinct values, as in the fake data, this returns a matrix with the frequencies for each column; otherwise the counts have different lengths and apply returns a list instead.
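To make the factor trick concrete, here is a small sketch that runs each step on the fake data from above. The variable names `num_summary`, `fac_summary`, and `freqs` are just illustrative choices, not part of the original script:

```r
# The fake data from the answer
dat <- data.frame(x = c(-4, 0, 1, 1, -4, NA, NA, 0),
                  y = c(1, 1, 1, 0, -4, NA, 0, NA))

# On a numeric column, summary() reports quantiles, the mean, and the NA count
num_summary <- summary(dat$x)
num_summary

# On a factor, summary() reports how often each level occurs (plus the NA count)
fac_summary <- summary(factor(dat$x))
fac_summary

# apply() does the same for every column; both columns produce four counts
# here, so the results are bound together into a matrix
freqs <- apply(X = dat, MARGIN = 2, FUN = function(x) summary(factor(x)))
freqs
```

Comparing `num_summary` with `fac_summary` shows why the conversion to a factor matters: the same column gives descriptive statistics in one case and counts in the other.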

Not the most elegant code ever, but I think it'll get you what you need.

Matt Parker
For prettier code, you could do `apply(dat, 2, table, useNA = "always")`.
JoFrhwld
Very nice - I knew there had to be a better way than clobbing it into a factor.
Matt Parker
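
One caveat with the `table` variant is that it only gives a matrix when every column contains the same distinct values. The sketch below shows that case and one possible workaround, assuming the only values that can occur are -4, 0, 1, and NA, as described in the question. The names `dat2` and `count_values` are hypothetical, and fixing the factor levels is a choice made for this example, not something suggested in the thread:

```r
# Fake data in which column y never contains -4
dat2 <- data.frame(x = c(-4, 0, 1, 1, -4, NA, NA, 0),
                   y = c(1, 1, 1, 0, 0, NA, 0, NA))

# Plain table(): x yields four counts, y only three, so apply() returns a list
as_list <- apply(dat2, 2, table, useNA = "always")

# Hypothetical helper: fixing the levels keeps a zero count for absent values
count_values <- function(x) {
  table(factor(x, levels = c(-4, 0, 1)), useNA = "always")
}

# Every column now yields four counts, so the result is a matrix again
as_matrix <- apply(dat2, 2, count_values)
as_matrix
```

If your real file can contain other values, the `levels` vector would need to list them too; anything not listed is counted as NA.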