Hello, I need to implement a simple and efficient XSS filter in C++ for CppCMS. I can't use the existing high-quality filters written in PHP, because CppCMS is a high-performance framework that uses C++.
The basic idea is to provide a filter that has a white list of HTML tags and a white list of options (attributes) for these tags. For example, typical HTML input can consist of <b> and <i> tags and an <a> tag with href. But a straightforward implementation is not good enough, because even simple allowed links may include XSS:
<a href="javascript:alert('XSS')">Click On Me</a>
Many other examples can be found there. So I also thought about creating a white list of prefixes for attributes like href/src, so that I always check whether the value starts with (https?|ftp)://
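To make the idea concrete, here is a minimal sketch of the two checks described above: a white list of tags with their permitted attributes, and a prefix check for href/src values. All names (`TagWhiteList`, `is_allowed_attribute`, `has_allowed_url_prefix`) are hypothetical and not part of CppCMS. It assumes a parser has already split the input into tag names, attribute names and attribute values, and it is not a complete filter.

```cpp
#include <cctype>
#include <map>
#include <set>
#include <string>

// Hypothetical white list: tag name -> attributes permitted on that tag.
using TagWhiteList = std::map<std::string, std::set<std::string>>;

// Example policy from the question: <b>, <i>, and <a> with href only.
const TagWhiteList& default_white_list() {
    static const TagWhiteList wl = {
        {"b", {}},
        {"i", {}},
        {"a", {"href"}},
    };
    return wl;
}

// ASCII lower-casing, since HTML tag and attribute names are case-insensitive.
std::string to_lower(std::string s) {
    for (char& c : s) {
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    return s;
}

// True if the tag itself is on the white list.
bool is_allowed_tag(const std::string& tag) {
    return default_white_list().count(to_lower(tag)) > 0;
}

// True only if the tag is white-listed AND the attribute is listed for it.
bool is_allowed_attribute(const std::string& tag, const std::string& attr) {
    const TagWhiteList& wl = default_white_list();
    auto it = wl.find(to_lower(tag));
    return it != wl.end() && it->second.count(to_lower(attr)) > 0;
}

// Equivalent to matching ^(https?|ftp):// case-insensitively.
// Anything else (javascript:, data:, leading whitespace, ...) is rejected.
bool has_allowed_url_prefix(const std::string& url) {
    static const std::string prefixes[] = {"http://", "https://", "ftp://"};
    const std::string lower = to_lower(url);
    for (const std::string& p : prefixes) {
        if (lower.compare(0, p.size(), p) == 0) return true;
    }
    return false;
}
```

With this policy, `has_allowed_url_prefix("javascript:alert('XSS')")` is false, so the link in the example above would be dropped, while `is_allowed_attribute("a", "href")` is true and `is_allowed_attribute("a", "onclick")` is false. Rejecting everything that does not match, rather than trying to list the bad cases, is what keeps the check simple.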
Questions:
- Are these assumptions good enough for most purposes? Meaning, if I do not give any options for style tags and check src/href using a white list of prefixes, does it solve the XSS problems? Are there problems that can't be fixed this way?
- Is there a good reference for a formal grammar of HTML/XHTML, so that I can write a simple parser that cleans up all incorrect or forbidden tags like <script>?