My implementation of markdown turns double hyphens into en dashes. E.g., a -- b becomes a – b.

But sometimes users write a - b when they mean a -- b. I'd like a regular expression to fix this.

Obviously body.gsub(/ - /, " -- ") comes to mind, but this messes up markdown's unordered lists – i.e., if a line starts - list item, it will become -- list item. So the solution must only swap out hyphens when there is a word character somewhere to their left.

+1  A: 

You can match a word character to the hyphen's left and use a backreference in the replacement string to put it back:

body.gsub(/(\w) - /, '\1 -- ')
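A quick sketch of how this behaves on a few sample strings. The helper name fix_dashes and the inputs are made up for illustration; only the gsub call comes from the answer.

```ruby
# Hypothetical helper wrapping the substitution above; the name is illustrative.
def fix_dashes(body)
  # (\w) captures the word character before " - " so that '\1' can put it back.
  body.gsub(/(\w) - /, '\1 -- ')
end

puts fix_dashes("a - b")        # a -- b
puts fix_dashes("- list item")  # - list item (no word character before the hyphen)
puts fix_dashes("a -- b")       # a -- b (already a double hyphen, left alone)
```

Note that the pattern only matches a single literal space on each side of the hyphen, so tabs or runs of spaces are not touched.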
yjerem
You could also slap a \s? in there to handle the space being present or not: (\w)\s?-
Jed Schneider
+1  A: 

Perhaps, if you want to be a little more accepting ...

gsub(/\b([ \t]+)-(?=[ \t]+)/, '\1--')

\b[ \t] forces a word character before the whitespace through a word-boundary condition. I don't use \s, to avoid matching across line breaks. I also use only one capture, to preserve the preceding whitespace (does Ruby 1.8.x have a ?<= ?).
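As a rough check of the pattern above, here is a small sketch. The helper name fix_dashes_keep_ws and the sample inputs are assumptions for illustration; the regex is the one from this answer.

```ruby
# Hypothetical helper; the name is illustrative, the pattern is the one above.
def fix_dashes_keep_ws(body)
  # \b before [ \t]+ requires a word character right before the whitespace.
  # (?=[ \t]+) checks for trailing whitespace without consuming it.
  body.gsub(/\b([ \t]+)-(?=[ \t]+)/, '\1--')
end

p fix_dashes_keep_ws("Hello\t\t- world!")  # => "Hello\t\t-- world!"
p fix_dashes_keep_ws("- list item")        # => "- list item"
p fix_dashes_keep_ws("end!  - next")       # => "end!  - next"
```

The last case is unchanged because "!" is not a word character, so there is no word boundary between it and the following spaces.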

pst
What advantage does this regex have over @yjerem's (simpler) answer?
Horace Loeb
@Horace It accepts more and preserves input white-space (spaces or tabs). For instance: "Hello\t\t- world!" will be turned into "Hello\t\t-- world!", which may or may not be desirable.
pst