Update: for the record, here's the implementation I ended up using.

Here's a trimmed-down version of the parser I'm working on. There's still a fair amount of code, but the basic concepts should be easy to grasp.

class Markup
  def initialize(markup)
    @markup = markup
  end

  def to_html
    # Non-capturing group for the CRLF alternative: a capturing group in
    # String#split would leak the matched "\r\n" separators back into the
    # results as stray paragraphs.
    @html ||= @markup.split(/(?:\r\n){2,}|\n{2,}/).map {|p| Paragraph.new(p).to_html }.join("\n")
  end

  class Paragraph
    def initialize(paragraph)
      @p = paragraph
    end

    def to_html
      # Inline markup first; bold must run before emphasis, or the
      # two-quote pattern would eat part of the three-quote one.
      @p.gsub!(/'{3}([^']+)'{3}/, "<strong>\\1</strong>")
      @p.gsub!(/'{2}([^']+)'{2}/, "<em>\\1</em>")
      @p.gsub!(/`([^`]+)`/, "<code>\\1</code>")

      case @p
      when /^=/
        # "= Title =" has two "=" chars and becomes h2, "== Title ==" h3, etc.
        level = (@p.count("=") / 2) + 1 # Starting on h2
        @p.gsub!(/^[= ]+|[= ]+$/, "")
        "<h#{level}>" + @p + "</h#{level}>"
      when /^(\*|\#)/
        # I'm parsing lists here. Quite a lot of code, and not relevant, so
        # I'm leaving it out.
      else
        @p.gsub!("\n", "\n<br/>")
        "<p>" + @p + "</p>"
      end
    end
  end
end

p Markup.new("Here is `code` and ''emphasis'' and '''bold'''!

Baz").to_html

# => "<p>Here is <code>code</code> and <em>emphasis</em> and <strong>bold</strong>!</p>\n<p>Baz</p>"

So, as you can see, I'm breaking the text into paragraphs, and each paragraph is either a header, a list or a regular paragraph.

Is it feasible to add support for nowiki tags (where everything between <nowiki></nowiki> is left unparsed) to a parser like this? Feel free to answer "no" and suggest alternative ways of building a parser :)

As a side note, the actual parser code is on GitHub: markup.rb and paragraph.rb

Answer (+3):

This sort of thing is much easier to manage if you use a simple tokenizer. One approach is to build a single regular expression that captures your entire grammar, but that quickly becomes unwieldy. An alternative, and likely the easier approach here, is to split the document into sections that need to be rewritten and sections that should be skipped.

Here's a simple framework you can extend as required:

def wiki_subst(string)
  buffer = string.dup
  result = ''

  # Consume the buffer one <nowiki>...</nowiki> span at a time: transform
  # the text before the span, copy the span through untouched, then
  # continue scanning from just past it.
  while (m = buffer.match(/<\s*nowiki\s*>.*?<\s*\/\s*nowiki\s*>/i))
    result << yield(m.pre_match)
    result << m.to_s
    buffer = m.post_match
  end

  # Transform whatever remains after the last span.
  result << yield(buffer)

  result
end

example = "replace me<nowiki>but not me</nowiki>replace me too<NOWIKI>but not me either</nowiki>and me"

puts wiki_subst(example) { |s| s.upcase }
# => REPLACE ME<nowiki>but not me</nowiki>REPLACE ME TOO<NOWIKI>but not me either</nowiki>AND ME
tadman
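
For illustration, here's one way wiki_subst could be combined with the Markup class from the question; this sketch is mine, not part of the answer. It parses everything outside the nowiki spans and then strips the tags themselves. One caveat: because each chunk is parsed independently, a nowiki span in the middle of a paragraph splits it into two <p> elements, so a real integration would want to protect spans before the paragraph split.

# Sketch only: assumes wiki_subst and the Markup class above are loaded.
source = "Parse ''this''<nowiki> but not ''this''</nowiki> please."

html = wiki_subst(source) { |chunk| Markup.new(chunk).to_html }
html = html.gsub(/<\s*\/?\s*nowiki\s*>/i, "") # drop the literal tags

puts html
# => <p>Parse <em>this</em></p> but not ''this''<p> please.</p>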
Is splitting the text into paragraphs, like my parser does, a form of tokenizing?
August Lilleaas
Using a very loose definition, perhaps. Generally a tokenizer splits an input stream into components that can be operated on individually, at the finest level of granularity required. Splitting into paragraphs, and then later splitting those into other parts, is a kind of two-pass tokenizer. Generally you can only get so far with a roll-your-own approach to parsing; at some point it's more efficient to go with a proper parser framework, but that's another subject.
tadman
Tagged as answer. Thanks!
August Lilleaas
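
Following up on the tokenizer discussion: here's a minimal single-pass sketch using StringScanner from Ruby's standard library, in case the concept is unfamiliar. The token names are invented for illustration, and it is deliberately naive; consecutive :text tokens would normally be coalesced.

require 'strscan'

# Minimal single-pass tokenizer sketch. It turns markup into a flat
# stream of [type, text] tokens; a second pass would assemble HTML.
def tokenize(markup)
  scanner = StringScanner.new(markup)
  tokens = []

  until scanner.eos?
    if scanner.scan(/<\s*nowiki\s*>(.*?)<\s*\/\s*nowiki\s*>/im)
      tokens << [:nowiki, scanner[1]] # protected: emit verbatim later
    elsif scanner.scan(/'{3}([^']+)'{3}/)
      tokens << [:bold, scanner[1]]
    elsif scanner.scan(/'{2}([^']+)'{2}/)
      tokens << [:em, scanner[1]]
    elsif scanner.scan(/`([^`]+)`/)
      tokens << [:code, scanner[1]]
    else
      tokens << [:text, scanner.scan(/./m)] # fall through one char at a time
    end
  end

  tokens
end

p tokenize("''hi'' <nowiki>''raw''</nowiki>")
# => [[:em, "hi"], [:text, " "], [:nowiki, "''raw''"]]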