ansaurus

Question

Regular expression for matching words between <blockquote> & </blockquote>

Answer 1

+10 A:

Use an HTML parser and forget regular expressions. Regex is incapable of correctly handling HTML.

require 'nokogiri'

doc = Nokogiri::HTML(your_html)
doc.xpath("//blockquote").remove  # removes every <blockquote> element from the parsed document

From: Strip text from HTML document using Ruby

There are more examples of how to use Nokogiri and XPath, if you look around.

Tomalak 2010-04-19 07:44:31

Answer 2

A:

A rough example:

/<blockquote>([^<]*)<\/blockquote>/

oraz 2010-04-19 07:58:11

This fails for `<blockquote>Some <b>bold text</b></blockquote>`. As I said: Regex is *technically incapable* of correctly handling HTML.

Tomalak 2010-04-19 08:02:34
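The failure described in the comment above can be reproduced in plain Ruby. This is only a sketch: the constant `NO_ANGLE_PATTERN` and the helper `extract_simple` are hypothetical names introduced for illustration, and the pattern is the one from the answer.

```ruby
# Pattern from the answer: text between the tags, with no "<" allowed inside.
NO_ANGLE_PATTERN = /<blockquote>([^<]*)<\/blockquote>/

# Returns capture group 1, or nil when the pattern does not match.
def extract_simple(html)
  html[NO_ANGLE_PATTERN, 1]
end

plain  = "<blockquote>Hello world</blockquote>"
nested = "<blockquote>Some <b>bold text</b></blockquote>"

p extract_simple(plain)   # "Hello world"
p extract_simple(nested)  # nil, because the inner <b> tag contains "<"
```

The pattern works only while the quoted text contains no markup at all, which is rarely guaranteed for real HTML.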

@Tomalak: Yes, I see.

oraz 2010-04-19 08:18:37

Answer 3

A:

Sample string:

<blockquote>Hello world</blockquote>

Type the following regex into Rubular:

<blockquote>(.+?)<\/blockquote>

or for something more generic:

<.*?>(.+?)<\/.*?>

Hope it helps!

Paul 2010-04-19 08:02:45
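For readers who prefer to try the two patterns from this answer in Ruby instead of Rubular, a minimal sketch follows. The helper names `between_blockquote` and `between_any_tag` are hypothetical, and the input is the sample string from the answer.

```ruby
# Lazy match between <blockquote> tags, as given in the answer.
def between_blockquote(html)
  html[/<blockquote>(.+?)<\/blockquote>/, 1]
end

# The "more generic" variant: any opening tag, then any closing tag.
def between_any_tag(html)
  html[/<.*?>(.+?)<\/.*?>/, 1]
end

sample = "<blockquote>Hello world</blockquote>"

p between_blockquote(sample)  # "Hello world"
p between_any_tag(sample)     # "Hello world"
```

Both helpers return `nil` when nothing matches. They handle this flat sample only; nested or attribute-heavy markup is a different matter.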

This fails for `<blockquote>Some <blockquote>quoted text</blockquote> within a quote.</blockquote>`.

Tomalak 2010-04-19 12:16:30
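To see what goes wrong with the nested input from the comment above, here is a small sketch. `LAZY_QUOTE` and `first_quote` are hypothetical names; the pattern is the lazy one from the answer.

```ruby
LAZY_QUOTE = /<blockquote>(.+?)<\/blockquote>/

# Returns the first captured quote body, or nil when there is no match.
def first_quote(html)
  html[LAZY_QUOTE, 1]
end

nested = "<blockquote>Some <blockquote>quoted text</blockquote> within a quote.</blockquote>"

# The lazy quantifier stops at the first closing tag, which belongs to the
# inner quote, so the capture is cut short and still contains an opening tag.
p first_quote(nested)  # "Some <blockquote>quoted text"
```

Switching to a greedy quantifier does not help either: it would pair the first opening tag with the last closing tag in the whole document, merging separate quotes.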

If we are just talking about Ruby: `resultarray = htmlstring.split(/<.*?>/)`. The split() method discards the regex matches and keeps the text between them. FYI: the scan() method does the opposite and returns the matches themselves. If you're a newb, I suggest spending some time learning regexes; they are largely language-agnostic and will serve you well.

Paul 2010-04-19 17:28:57
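The split/scan contrast from the comment above, sketched on a simple input. `SIMPLE_TAG`, `text_pieces` and `tag_pieces` are hypothetical names used only for this illustration.

```ruby
SIMPLE_TAG = /<.*?>/

# split keeps the text between the matches.
def text_pieces(html)
  html.split(SIMPLE_TAG)
end

# scan keeps the matches themselves.
def tag_pieces(html)
  html.scan(SIMPLE_TAG)
end

simple = "<blockquote>Hello world</blockquote>"

p text_pieces(simple)  # ["", "Hello world"]
p tag_pieces(simple)   # ["<blockquote>", "</blockquote>"]
```

Note the leading empty string from split: the input starts with a match, so the piece before it is empty. Chaining `.reject(&:empty?)` drops it.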

If this comment was for me: No, I'm not a "newb" as far as regular expressions go. ;) And `htmlstring.split(/<.*?>/)` fails for `<b title="HTML is > than RegEx">Don't do HTML with RegEx</b>`.

Tomalak 2010-04-19 18:56:50
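The attribute case from the comment above can be checked the same way. This is a sketch with a hypothetical helper name, `strip_tags_naively`, wrapping the split call under discussion.

```ruby
# Naive tag stripping: split the string on anything that looks like <...>.
def strip_tags_naively(html)
  html.split(/<.*?>/)
end

tricky = %q{<b title="HTML is > than RegEx">Don't do HTML with RegEx</b>}

# The ">" inside the attribute value ends the first match too early,
# so the rest of the attribute leaks into the "text".
p strip_tags_naively(tricky)
# ["", " than RegEx\">Don't do HTML with RegEx"]
```

An HTML parser tokenizes attribute values before looking for the end of the tag, which is why it does not trip over this input.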
