Hi guys, I've got this wiki formatting algorithm which I am using at Stacked to create HTML out of "wiki syntax", and I'm not really sure whether the current version is good enough, optimal, or contains bugs, since I am not really a "Regex Guru". Here is what I am currently using:
// Body is wiki content...
string tmp = Body.Replace("&", "&amp;").Replace("<", "&lt;").Replace(">", "&gt;");
// Sanitizing carriage returns...
tmp = tmp.Replace("\r\n", "\n");
// Replacing dummy links...
tmp = Regex.Replace(
" " + tmp,
"(?<spaceChar>\\s+)(?<linkType>http://|https://)(?<link>\\S+)",
"${spaceChar}<a href=\"${linkType}${link}\"" + nofollow + ">${link}</a>",
RegexOptions.Compiled).Trim();
// Replacing wiki links
tmp = Regex.Replace(tmp,
"(?<begin>\\[{1})(?<linkType>http://|https://)(?<link>\\S+)\\s+(?<content>[^\\]]+)(?<end>[\\]]{1})",
"<a href=\"${linkType}${link}\"" + nofollow + ">${content}</a>",
RegexOptions.Compiled);
// Replacing bolds
tmp = Regex.Replace(tmp,
"(?<begin>\\*{1})(?<content>.+?)(?<end>\\*{1})",
"<strong>${content}</strong>",
RegexOptions.Compiled);
// Replacing italics
tmp = Regex.Replace(tmp,
"(?<begin>_{1})(?<content>.+?)(?<end>_{1})",
"<em>${content}</em>",
RegexOptions.Compiled);
// Replacing lists
tmp = Regex.Replace(tmp,
"(?<begin>\\*{1}[ ]{1})(?<content>.+)(?<end>[^*])",
"<li>${content}</li>",
RegexOptions.Compiled);
tmp = Regex.Replace(tmp,
"(?<content>\\<li\\>{1}.+\\<\\/li\\>)",
"<ul>${content}</ul>",
RegexOptions.Compiled);
// Quoting
tmp = Regex.Replace(tmp,
"(?<content>^&gt;.+$)",
"<blockquote>${content}</blockquote>",
RegexOptions.Compiled | RegexOptions.Multiline).Replace("</blockquote>\n<blockquote>", "\n");
// Paragraphs
tmp = Regex.Replace(tmp,
"(?<content>)\\n{2}",
"${content}</p><p>",
RegexOptions.Compiled);
// Breaks
tmp = Regex.Replace(tmp,
"(?<content>)\\n{1}",
"${content}<br />",
RegexOptions.Compiled);
// Code
tmp = Regex.Replace(tmp,
"(?<begin>\\[code\\])(?<content>[^$]+)(?<end>\\[/code\\])",
"<pre class=\"code\">${content}</pre>",
RegexOptions.Compiled);
// Now hopefully tmp will contain perfect HTML
For those who think the code is difficult to read here, you can also check it out here...
Here is the complete "wiki syntax":
Syntax here:
Link: [http://x.com text]
*bold* (asterisk on both sides)
_italic_ (underscores on both sides)
* Listitem 1
* Listitem 2
* Listitem 3
(the above are asterisks, but so.com also creates lists from them)
2 x Carriage Return opens a new paragraph
1 x Carriage Return is a break (br)
[code]
if( YouDoThis )
YouCanWriteCode();
[/code]
> quote (greater-than sign)
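To make the inline part of the syntax concrete, here is a minimal sketch that applies only the escaping step and the link, bold and italic rules to a single line. The names `InlineWikiDemo` and `Format` are made up for this illustration, the patterns are simplified, and the `nofollow` attribute is left out, so treat it as an assumption about how the rules fit together rather than the real code.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not part of the original code.
public static class InlineWikiDemo
{
    public static string Format(string body)
    {
        // Escape HTML-significant characters; the ampersand must go first
        // so the entities produced by the later replacements are not re-escaped.
        string tmp = body.Replace("&", "&amp;").Replace("<", "&lt;").Replace(">", "&gt;");

        // [http://x.com text] becomes <a href="http://x.com">text</a>
        tmp = Regex.Replace(
            tmp,
            @"\[(?<url>https?://\S+)\s+(?<content>[^\]]+)\]",
            "<a href=\"${url}\">${content}</a>");

        // *bold* becomes <strong>bold</strong> (lazy match, single line only)
        tmp = Regex.Replace(tmp, @"\*(?<content>.+?)\*", "<strong>${content}</strong>");

        // _italic_ becomes <em>italic</em>
        tmp = Regex.Replace(tmp, @"_(?<content>.+?)_", "<em>${content}</em>");

        return tmp;
    }
}
```

Calling `InlineWikiDemo.Format("A *bold* and _italic_ [http://x.com link]")` should give `A <strong>bold</strong> and <em>italic</em> <a href="http://x.com">link</a>`. Note that the link rule runs before the italic rule here, so a URL containing underscores would get mangled by the italic rule; the sketch does not try to handle that.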
If there are some "Regex gurus" who would like to review this Regex logic I'd appreciate it a lot :)