There are key self-contained algorithms - particularly cryptography-related such as AES, RSA, SHA1 etc - which you can find many implementations of for free on the internet.
Some are written to be nice and portable clean C.
Some are written to be fast - often with macros, and explicit unrolling.
As far as I can tell, none are trying to be especially super-small - so I'm resigned to writing my own - explicitly AES128 decryption and SHA1 for ARM THUMB2. (I've verified by compiling all I can find for my target machine with GCC with -Os and -mthumb and such)
What patterns and tricks can I use to do so?
Are there compilers/tools that can roll-up code?