I have an input field whose value is used to build an XPath query. Which characters in the input string should I check for to minimise the possibility of XML injection?
I would start by considering what counts as valid input for your particular use case, then look at ways to restrict everything else. If you have a fixed range of entry values, I would limit entry to just those values. Otherwise, if your use case requires you to take the future into account, you will probably want to check for axis modifiers and path separators such as : and \.
It depends on what you mean by 'XML injection'. Are there parts of the document that are sensitive and that the user must not be allowed to see? Or are you opening it in a writable state and allowing the user to update the document, when they should only be allowed to update certain parts?
At a basic level, to answer your question, you need to look for XPath axis operations (e.g. //, /, ::) and wildcards (@*, *) as a bare minimum. But my feeling is that using user input to build XPath directly may not be the optimal solution. Maybe if you give us more context about what you're trying to achieve, we could suggest alternative approaches?
This document describes in detail the concept of "Blind XPath Injection".
It provides concrete examples of XPath injections and discusses ways of preventing them.
In the section "Defending against XPath Injection" it is said:
"Defending against XPath Injection is essentially similar to defending against SQL injection. The application must sanitize user input. Specifically, the single and double quote characters should be disallowed. This can be done either in the application itself, or in a third party product (e.g. application firewall.) Testing application susceptibility to XPath Injection can be easily performed by injecting a single quote or a double quote, and inspecting the response. If an error has occurred, then it’s likely that an XPath Injection is possible."
As others have said, one should also pay attention to the use of axes and the // abbreviation. If XPath 2.0 is being used, then the doc() function should not be allowed, as it gives access to any document with a known URI (or filename).
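To make the single-quote test from the quoted passage concrete, here is a minimal sketch of a deliberately vulnerable lookup. The sample XML, the class name QuoteInjectionDemo and the method countMatches are all hypothetical and exist only for illustration; the XPath calls are the standard javax.xml.xpath API.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class QuoteInjectionDemo {
    // Hypothetical sample data, not taken from the quoted document.
    static final String XML =
        "<users>"
      + "<user><name>alice</name><role>admin</role></user>"
      + "<user><name>bob</name><role>guest</role></user>"
      + "</users>";

    // Deliberately vulnerable: the input is concatenated into the expression.
    // Returns the number of matching <user> nodes, or -1 if the resulting
    // expression is rejected as malformed.
    static int countMatches(String userInput) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
        } catch (Exception e) {
            throw new IllegalStateException("sample XML failed to parse", e);
        }
        XPath xpath = XPathFactory.newInstance().newXPath();
        String expr = "//user[name='" + userInput + "']";
        try {
            NodeList hits = (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);
            return hits.getLength();
        } catch (XPathExpressionException e) {
            return -1; // the error signal the quoted test looks for
        }
    }
}
```

With this sample data, countMatches("bob") matches one user, countMatches("' or '1'='1") matches both users because the predicate becomes always true, and countMatches("'") returns -1 because the lone quote leaves the expression malformed.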
It is advisable to use an API that precompiles the XPath expression while still letting it work with dynamically defined parameters or variables. The user input then only supplies the values of these parameters and is never treated as a modification of the already compiled expression.
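One way to get this behaviour, sketched here with the standard javax.xml.xpath API, is to compile an expression that refers to a variable and supply the value through an XPathVariableResolver. The sample XML, the class name ParameterizedXPath, the method countByName and the variable name $name are assumptions made for the example.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class ParameterizedXPath {
    // Hypothetical sample data for illustration only.
    static final String XML =
        "<users>"
      + "<user><name>alice</name><role>admin</role></user>"
      + "<user><name>bob</name><role>guest</role></user>"
      + "</users>";

    // Counts <user> nodes whose <name> equals the input, treated purely as data.
    static int countByName(String userInput) {
        Map<String, Object> vars = new HashMap<>();
        vars.put("name", userInput);
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Install the resolver before compile(); it supplies the value of $name.
            xpath.setXPathVariableResolver(qname -> vars.get(qname.getLocalPart()));
            // The expression text is fixed; user input never becomes part of it.
            XPathExpression compiled = xpath.compile("//user[name=$name]");
            NodeList hits = (NodeList) compiled.evaluate(doc, XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Here countByName("bob") matches one user, while countByName("' or '1'='1") matches none, since the whole string is compared against the name text instead of being parsed as XPath.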
Turn your tactics upside down.
Don't try to filter out unacceptable characters - a policy of "Assume it's OK unless I know it's bad"
Instead, filter in acceptable characters - a policy of "This stuff is OK, I'll assume everything else is bad".
In security terms, adopt a policy of "Default Deny" instead of "Default Accept".
For example ...
... if you're asking someone for a search term, say a person's first name, limit the input to only the characters you expect to find in names.
One way would be to limit input to A-Z and then ensure that your search technique is accent-aware (e.g. i = ì = í = î = ï and so on), though this falls down on non-European naming.
... if you're asking for a number, limit to just digits and reject everything else.
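A "Default Deny" check along these lines could look like the following sketch. The class name InputAllowlist, the method names and the length limits (40 and 10 characters) are assumptions chosen for the example, not values from this thread.

```java
import java.util.regex.Pattern;

final class InputAllowlist {
    // Assumed policy: ASCII letters only, 1 to 40 characters.
    private static final Pattern NAME = Pattern.compile("[A-Za-z]{1,40}");
    // Assumed policy: decimal digits only, 1 to 10 characters.
    private static final Pattern NUMBER = Pattern.compile("[0-9]{1,10}");

    // matches() requires the whole string to fit the pattern,
    // so anything outside the allowlist is rejected by default.
    static boolean isValidName(String input) {
        return input != null && NAME.matcher(input).matches();
    }

    static boolean isValidNumber(String input) {
        return input != null && NUMBER.matcher(input).matches();
    }
}
```

If names outside A-Z have to be accepted, the name pattern could be widened to Unicode letters with \p{L}, at the cost of a larger allowed set.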
Closing this vulnerability is just a hotfix, so applying a "Default Deny" policy is too risky right now. I decided to check the input for the following symbols: [,",',*,=,{,\,.,space. I think this should prevent the most common attacks. Thank you all for the answers!
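For reference, the check described above might be sketched as follows. The class name DenylistCheck and the method containsRejected are hypothetical, and the sketch assumes the commas in the list are separators, not rejected characters themselves.

```java
final class DenylistCheck {
    // The characters listed above: [ " ' * = { \ . and space.
    private static final String REJECTED = "[\"'*={\\. ";

    // True if the input contains at least one rejected character.
    static boolean containsRejected(String input) {
        for (int i = 0; i < input.length(); i++) {
            if (REJECTED.indexOf(input.charAt(i)) >= 0) {
                return true;
            }
        }
        return false;
    }
}
```

Note that this list lets / and : through, so the // and :: operators mentioned in the other answers are not caught by it.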
Validating the input string will help, perhaps with a regular expression (something like ^\w+$, anchored at both ends) so that no special characters are allowed.