Hello all,

Is there an easy way to remove HTML tags from a character string in R?

I'm currently extracting survey data from an XML document, and the question titles contain HTML left over from the survey design, like this:

"Why did you give this performance question a low score?<br />"

Any way to easily remove the <br />?

Any help would be appreciated.

+1  A: 

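Take a look at ?gsub and ?regex. Here's some simple code to remove the <br />, but it won't work for all HTML tags: the pattern only matches tags that end in />, and because .* is greedy it removes everything from the first < to the last /> in the string.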

> string <- "Why did you give this performance question a low score?<br />"
> gsub("<.*/>","",string)
[1] "Why did you give this performance question a low score?"
Joshua Ulrich
Thanks for that, it got me on my way to finding 'gsub("<(.|\n)*?>","",string)'
Cam B
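
For anyone landing here later, a slightly more general version of the two patterns above might look like the sketch below. The helper name strip_tags is made up for illustration and is not part of base R or any package. It uses a negated character class, [^>], rather than a lazy quantifier, so each match stops at the first > it meets.

```r
# Hypothetical helper (not from the thread or any package):
# remove anything that looks like an HTML tag from a character vector.
strip_tags <- function(x) {
  # "<"     opening angle bracket
  # "[^>]*" any run of characters other than ">"
  # ">"     closing angle bracket
  gsub("<[^>]*>", "", x)
}

strip_tags("Why did you give this performance question a low score?<br />")
strip_tags("<p>Rate the <b>overall</b> service</p>")
```

This is still only a regular expression, not an HTML parser, so it will misfire on input such as a > inside an attribute value. For simple survey titles like the one in the question it should be enough.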