What does canonical representation mean, and how can it make websites vulnerable?

Canonicalisation is the process by which you take an input, such as a file name, or a string, and turn it into a standard representation.

For example, if your web application only allows access to files under C:\websites\mydomain, then any input referring to a file name is typically canonicalised to an absolute physical path rather than a relative one. Suppose you wanted to open C:\websites\mydomain\example\example.txt; one input to that function might be example\example.txt. It's hard to tell from the relative form whether it goes outside the boundaries of your web site, so the canonicalisation function resolves it against the application directory and produces the physical path C:\websites\mydomain\example\example.txt. That is easier to check, because you can simply compare the start of the file path against the allowed directory. An input such as ..\otherdomain\secret.txt, for instance, would canonicalise to C:\websites\otherdomain\secret.txt and fail that check.
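As a rough sketch of that check in Python: the function names below are made up for illustration, and `ntpath` is used only so the Windows-style paths behave the same on any platform. It assumes the root from the example above and does not deal with symbolic links or other aliases for the same file, so treat it as an outline rather than a complete defence.

```python
import ntpath  # Windows path rules, importable on any platform

WEB_ROOT = r"C:\websites\mydomain"  # the application directory from the example


def canonicalise(user_path, root=WEB_ROOT):
    """Join a user-supplied path onto root and collapse '.' and '..' segments."""
    return ntpath.normpath(ntpath.join(root, user_path))


def is_inside_root(user_path, root=WEB_ROOT):
    """Return True if the canonical path is root itself or lies beneath it."""
    candidate = ntpath.normcase(canonicalise(user_path, root))
    base = ntpath.normcase(ntpath.normpath(root))
    # Compare against root plus a separator so that a sibling directory
    # such as C:\websites\mydomain2 does not pass the prefix test.
    return candidate == base or candidate.startswith(base + "\\")


print(canonicalise(r"example\example.txt"))
print(is_inside_root(r"example\example.txt"))
print(is_inside_root(r"..\otherdomain\secret.txt"))
```

The comparison appends a path separator to the root before the prefix test; a bare string compare on C:\websites\mydomain would also accept C:\websites\mydomain2.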

For URL-encoded input you take sequences like %20 and canonicalise them by decoding, so %20 becomes a space. This is a good idea because there are numerous ways to encode the same character; canonicalising means you check only the decoded string, rather than trying to cover every encoding variation.
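A minimal Python sketch of the same idea, using the standard library's `urllib.parse.unquote` to do the decoding. The helper names `canonicalise` and `mentions` are hypothetical, and the substring test only stands in for whatever check your application really performs.

```python
from urllib.parse import unquote


def canonicalise(raw):
    """Percent-decode the input once, e.g. '%20' becomes a space."""
    return unquote(raw)


def mentions(raw, needle):
    """Look for needle in the decoded form, ignoring letter case."""
    return needle.lower() in canonicalise(raw).lower()


print(canonicalise("my%20file.txt"))
print(mentions("%3Cscript%3E", "<script"))
print(mentions("%3cSCR%69PT%3e", "<script"))
```

Both encoded inputs are caught by one check on the decoded string, with no need to list every way of spelling the same characters.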

Basically, you take inputs which are logically equivalent and convert them to a standard form which you can then act upon.

But how does that apply to websites and being vulnerable?

predhme 2009-07-22 18:59:31

See blowdart's answer for a definition of canonical

Michael Donohue 2009-07-22 19:08:17

My answer doesn't conflict at all with blowdart's answer, except that I could explain it in one sentence. For programmers, efficiency is key.

djangofan 2009-08-01 17:35:52

So potentially, in an input field, I could try an SQL injection attack or possibly XSS to bypass normal string sanitisation?

predhme 2009-07-23 13:09:53

Sanitisation is different. Generally a SQL injection attack isn't going to use encoding, so it's not a canonicalisation issue. XSS may be; it depends on what you do. If you're encoding all input before outputting it then no, it's not. However, if you're attempting to whitelist, or worse, blacklist certain parts of a string, then you would canonicalise the string first because, for example, <script> can also be represented by %3Cscript%3E or &lt;script&gt; and so on.

blowdart 2009-07-23 14:24:53

Ahhh I see. Thanks a lot that clears up everything I was looking for. Thanks!

predhme 2009-07-23 14:32:08
