tags:

views:

434

answers:

2

I have been using surveygizmo which is an amazingly powerful online questionnaire presenter. The data can be exported as a csv file but alas it has two [not one] header rows. The first row specifies the question and the second row contains possible responses that the respondent could have checked off. This seems highly aberant in the data-read-and-write world but seems quite normal in the survey world. How does one read such a file into R?

Surveygizmo used to have an "old" export format put everything into one row but I have had problems lately that the website will not export it. Surveygizmo are not really interested in the "old" format since it is 2 generations ago and they do not want to support it.

In a simple survey an intern who was helping me was able to overcome the problem with the following code

#Read csv file with two rows of headers
#Append the second row to the first row
df <-read.csv(csvfile,skip=1,stringsAsFactors=FALSE) #Read csv without any header
hl=readLines(csvfile, 2)            #Read the two header lines as char strings
hl=strsplit(hl,',')                   #Split headers up by commas
colnames(df)=sub('_$','',paste(hl[[1]],hl[[2]],sep=""))  #join second row to first row  

However, with a longer survey with more questions and with the questions being longer (and thus the headers being longer) our brute force method above is not working.

At the end I want a data frame with column headings which I will then merge with another data frame coming from a follow up survey. Any online references that deal with this issue?

Here is an example of the csv file with two header rows. The 3rd and final row is the first line of data. I have changed everything that relates to private health information. The headers are very very long since survey gizmo used the whole question as the heading.

"","","","","","","","","","Inclusion Criteria I or my child is a patient with recurrent respiratory papillomatosis (RRP)How do you know that you or your child has RRP? Please check whatever is true.","","","Exclusion Criteria Do any of the following apply? Please put a check next to any condition that is present.In the unlikely event that one of the following conditions apply, then unfortunately we cannot enroll you in this study. You could stop or you could carry on telling us about yourself, whichever you prefer. ","","Confused or have questions?If you are confused about any items or if you want us to clarify something then here is the place that you can express yourself freely. Also, you can call us at (412) 567-7870 or at (888) 887-7729.You are encouraged to review the consent form. You do not have to sign it now but you will need to do so once we enroll you. ","Please tell us who you are - referring to you, the person completing the form. Different people feel differently about their privacy and about how they are contacted. We will do our utmost to protect your privacy. Please do not give us your e-mail address if you do not want us to use it. Remember that e-mail should be private but is not always so. The safest way to think about it is as if e-mail was similar to a post card. Please do not give us a telephone number you do not want us to contact you on.","","","","","","","","","","","Who are you? Are you the patient or a parent or someone else?","When was the person with RRP born?Enter the date as MM/DD/YYYY","Approximately when was RRP diagnosed? This can be very approximate. If you do not remember the date then please put down your best guess. We will use it to work out how old the patient was when he or she was diagnosed. Enter the date as MM/DD/YYYY.","Has the patient with RRP ever received Gardasil? Gardasil is a vaccine against HPV 6, 11, 16 and 18 that was approved by the Food and Drug Administration (FDA) for use in females to prevent gynecologic diseases. ","Please ignore this question. It is for our internal tracking. Are you?","gender","race","Has there been human contact? By e-mail or by telephone or by anything in which we discussed informed consent","What is the subject number?","Merck Research Laboratory Accession Number?","Second Merck  Accession Number?","FedEx Tracking Number","Date Shipped Out","Date EMSI Notified"
"Response ID","RespondantKey","Edit Link","IP","Date Started","Date Finished","Status","Linked From","Comments","histopathconfirm","surgeonseaid","other","cancer","none","","First Name","Last Name","Street Address","Apt/Suite/Office","City","State","Postal Code","Country","Email Address","Phone Number","Mobile Phone","","","","","","","","","","","","","",""
"6990181","4099941","http://s-gtzd7-14166.sgizmo.com/?edit=6770181&amp;cc=e246ecb7095b983xxxxx7ec0a9","1991.157.178.134","2009-04-30 07:57:24","2009-04-15 14:56:01","Submitted","","Spoke to her Thursday, 20 Apr 2009 20:26. No questions ready to go.09/11/2009 consent mailed..mrs accession number 304074333811wp, 01wp SFJB06123 Fedex tracking 865888887357 sent Tues April 29; called her Thurs, 10 May 2009 20:21 she will sign slip","histopathconfirm","surgeonseaid","","","none","","Jane","Doe","23 Hastings Rd","29th floor","Oranje","ny","27935","USA","[email protected]","728-850-7252","626-922-2239","Patient","02/21/1965","01/01/1976","No","Key Person","","","Yes","SFJB06123","304033385811wp","303334485801wp","865333807357","4/11/2007","4/11/2007"
+1  A: 

Why not just have "read.csv" read in the first header row (which are the actual headers as i understand your question) then skip the next row:

read.csv(file, header=T, skip=1)

Alternatively, if that second header line begins with an idiosyncratic character (not found in your data) then you can specify that is a comment line (if that line began w/ "#" for instance, it would be)::

read.csv(file, header=T, comment.char="#")
doug
One cannot skip the second row. The second row is sometimes the header. For several columns the header is in the second row. See the data that I edited into my original question.
Farrel
A: 

Hi Farrel,

Actually I think the easiest way to do this is use the SurveyGizmo SPSS export instead. Export your data to SPSS and then use a similar command in R to:

read.spss(file='mydata.sav')

That should actually work perfectly for you and bring all data description into R as well.

I'm sorry about the confusing phone call you had yesterday. It's true, we are trying hard not to go back and fix the old, old, old CSV export. However, it's not that we don't want to support single header exports. In version 3, the "Quick Export" that's active right now is a single column export and will remain that way -- officially supported.

Sadly the older version of this export is too far removed and unoptimized to be able to upgrade it to work with some of the more modern browsers. Especially as the new export is only weeks away from taking over. I'm sorry if the phone call was in any way unclear or unprofessional.

In this case, the SPSS Export is the way to go! If you don't have a plan level that gives you that export option, just email (or call) support and point them to this post. They'll add the export to your account.

Cheers,

-Christian Vanek
CTO & Co-Founder
SurveyGizmo

christian
I tried using the package foreign to access the command read.spssBut alas I hit errors> initialspss <-read.spss(file="20100222180259-InitialScreen.sav")Warning message:In read.spss(file = "20100222180259-InitialScreen.sav") : 20100222180259-InitialScreen.sav: Unrecognized record type 7, subtype 14 encountered in system fileI tried going to version 3. Am I correct in deducing that I can copy a survey to version 3 but its originally entered data stays in version 2 where there is no quick export. Please disavow me of any misconceptions.
Farrel