I'd like to experiment with a popular HTML templating language to see if I can solve XSS problems in it. What is a popular, open-source templating language that I could try to tackle?
By templating language, I mean a language used to generate an output language by combining static content in that output language with dynamic data from another source. E.g. PHP is commonly used as a templating language for HTML/CSS/JS, and XSLT is a templating language for XML.
The ideal template language would be
- Widely used
- Open source
- Without an existing solution to XSS
- Simple in syntax; the simpler the better
The idea is to
- parse each template so that I end up with a tree of chunks of raw HTML, expressions that produce dynamic values that need to be encoded, and conditional (switch/if) and loop constructs.
- walk the tree inferring context. Possible contexts might include `HTML_PCDATA`, `IN_JS_DBL_QUOTED_STR`, etc. So if I see a chunk of raw HTML, `<a href="`, in an HTML PCDATA context, then I move to a context where I am expecting part of a URL. When I reach a branch or loop, follow each branch independently, and join the contexts afterwards.
- if the language has templates, try to determine a static call graph so I can clone templates and rewrite calls where a given template is called in multiple contexts.
- wrap the expressions that produce dynamic values with calls into a library I implement that includes functions like `expectHtml(...)`, `expectJsValue(...)` that encode the dynamic value appropriately. E.g. `expectHtml(...)` converts `<` to `&lt;`.
- provide some convenience functions so that the code that provides data to templates can use RTTI to specify the language of dynamic values to avoid overescaping. So `expectHtml(...)` would not escape a value of type `Html`, since it is assumed to come from a safe source like `knownSafeHtml(...)` or `stripBadTags(...)`.
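
To make the steps above concrete, here is a rough Python sketch of what I have in mind. It is only a toy under several assumptions: the template is already parsed into a flat list of raw HTML strings and `("expr", name)` tuples (a representation I made up for illustration), the scanner only knows about tags and double-quoted `href` attributes, and the names `Context`, `context_after`, `join_contexts`, `expect_url_part` and `render` are hypothetical. `expect_html` and `known_safe_html` are snake_case stand-ins for `expectHtml(...)` and `knownSafeHtml(...)`. JS contexts, loops, and the call-graph rewriting are left out.

```python
import html
from enum import Enum, auto
from urllib.parse import quote


class Context(Enum):
    HTML_PCDATA = auto()     # between tags
    HTML_TAG = auto()        # inside a tag, outside any attribute value
    URL_DBL_QUOTED = auto()  # inside href="..."


class Html(str):
    """Marks a string as already-safe HTML so it is not escaped again."""


def known_safe_html(markup):
    # The caller vouches for this markup; nothing is checked here.
    return Html(markup)


def expect_html(value):
    # Typed values pass through; everything else is entity-encoded.
    if isinstance(value, Html):
        return value
    return Html(html.escape(str(value), quote=True))


def expect_url_part(value):
    # Percent-encode every character outside the unreserved set.
    return quote(str(value), safe="")


def context_after(chunk, ctx):
    """Advance the context across one chunk of raw template HTML."""
    i = 0
    while i < len(chunk):
        if ctx is Context.HTML_PCDATA:
            if chunk[i] == "<":
                ctx = Context.HTML_TAG
            i += 1
        elif ctx is Context.HTML_TAG:
            if chunk.startswith('href="', i):
                ctx = Context.URL_DBL_QUOTED
                i += len('href="')
            else:
                if chunk[i] == ">":
                    ctx = Context.HTML_PCDATA
                i += 1
        else:  # URL_DBL_QUOTED
            if chunk[i] == '"':
                ctx = Context.HTML_TAG
            i += 1
    return ctx


def join_contexts(contexts):
    """Join the end contexts of several branches; they must agree."""
    distinct = set(contexts)
    if len(distinct) != 1:
        raise ValueError(f"branches end in different contexts: {distinct}")
    return distinct.pop()


# Which encoder to wrap around an expression found in each context.
ESCAPERS = {
    Context.HTML_PCDATA: expect_html,
    Context.URL_DBL_QUOTED: expect_url_part,
}


def render(parts, data):
    """parts: raw HTML strings, or ("expr", name) tuples for dynamic values."""
    ctx, out = Context.HTML_PCDATA, []
    for part in parts:
        if isinstance(part, str):
            out.append(part)
            ctx = context_after(part, ctx)
        else:
            _, name = part
            if ctx not in ESCAPERS:
                raise ValueError(f"no escaper for {ctx}")
            out.append(ESCAPERS[ctx](data[name]))
    return "".join(out)


template = ['<a href="/search?q=', ("expr", "query"), '">', ("expr", "label"), "</a>"]
print(render(template, {"query": "a b&c", "label": "<b>hi</b>"}))
# <a href="/search?q=a%20b%26c">&lt;b&gt;hi&lt;/b&gt;</a>
print(render(template, {"query": "x", "label": known_safe_html("<b>hi</b>")}))
# <a href="/search?q=x"><b>hi</b></a>
```

In the real thing the context inference would happen once, at template compile time, and the chosen encoder call would be written into the template source instead of being looked up on every render. This sketch does both at render time only to keep it short.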