The principal rule: attributes have no standalone identity in the tree. They are not children of their element; you reach them only as side bits attached to a node you already have. It helps to think of them as non-existent until you have a node first, or as second-class citizens of the XPath and XSLT world. Every time you use them in selection conditions, it is like switching from a join to a cursor in SQL, and the same happens every time you use xsl:for-each ("for") instead of xsl:apply-templates ("apply").
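To make the "apply" versus "for" contrast concrete, here is a minimal XSLT 1.0 sketch. The input shape (an `orders` root containing `order` elements with `item` children that carry a `sku` attribute) is made up for illustration. The sketch only shows the two styles side by side; it does not measure anything, and how much they differ in cost depends on the processor.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- "apply" style: select elements by path and let templates do the work.
       orders/order/item/@sku are hypothetical names. -->
  <xsl:template match="/orders">
    <xsl:apply-templates select="order/item"/>
  </xsl:template>

  <!-- Each item is handled on its own; the attribute is read only
       after the node has been matched. -->
  <xsl:template match="item">
    <xsl:value-of select="@sku"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <!-- "for" style with the same output, shown only for contrast:
  <xsl:template match="/orders">
    <xsl:for-each select="order/item">
      <xsl:value-of select="@sku"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
  -->
</xsl:stylesheet>
```

Both variants print one `sku` per line. The difference is that the template version leaves the per-node work in independent rules, while the loop version hard-wires it into one body.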
Another way to put it: the only real, efficient "index" you have is the one over all the element paths in a document (as far as I know, .NET builds a Hashtable keyed by XPath, which gives constant-time matches). The reason "apply" is privileged is that it allows purely functional processing: in principle, everything matched by apply could run on separate threads with no synchronization and no shared memory, and you would just concatenate the results in order.
A third way to look at it, which is a stretch: imagine that your tags are SQL tables with only surrogate PKs and FKs. There is nothing you can really select except "all from T1 and everything related to them from T2". For any decent SQL engine that is close to zero-cost: it reads one good index item by item, because the structure of the index maps 1:1 onto your query. Everything else costs much more.
Once you have tags selected and templates matched and running, it is cheap to grab attribute values, as long as you just transform or render them. Attribute tests at the end of an XPath are reasonably cheap as well, since the final node is already selected and the test is just a small filter on top of it.
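A small sketch of the two "cheap" uses described above, again in XSLT 1.0 and again with made-up names (`orders`, `order`, `item`, `sku`, `qty`, `status`). It assumes each `item` carries a numeric `qty` attribute. Treat it as an illustration of where the attribute test sits in the path, not as a benchmark.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/orders">
    <!-- Attribute test on the final step: the path picks the items,
         then the predicate acts as a small filter on top.
         For contrast, order[@status='open']/item would put the
         attribute test in the middle of the path. -->
    <xsl:apply-templates select="order/item[@qty &gt; 1]"/>
  </xsl:template>

  <xsl:template match="item">
    <!-- The node is already in hand here; reading its attributes
         just renders values. -->
    <xsl:value-of select="concat(@sku, ' x ', @qty, '&#10;')"/>
  </xsl:template>
</xsl:stylesheet>
```

For an item with `sku="A1"` and `qty="3"` this writes the line `A1 x 3`; items with a `qty` of 1 or less are skipped by the trailing predicate.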
So the XSLT engine, and XPath selection in general, has a very good reason to ignore attributes during selection: performance.