"Ridiculously easy" is a very relative term. "Getting started" is just that. Writing robust extensions in C requires very careful attention to things like reference counting, memory allocation/freeing, and error handling. Cython does much of that for you.
A non-unicode string in Cython is either a Python str object, or it's an array of char, as in C. What Cython-specific documentation do you think that you need?
I recommend that you try Cython for yourself. BUT before you do that, I strongly recommend that you examine your Python code for inefficiencies. Sometimes you can get big speedups ridiculously easily.
For example, compressing runs of space characters ... using
re.sub(' +', ' ', s) # one space in pattern
means that in the presumably not uncommon case where a run has a length of 1, it will be replacing a space with a space. If all the runs have length 1, it will create a new replacement string when it could easily just increment (or not decrement, or whatever) the reference count of the input string and pass that back.
re.sub(' +', ' ', s) # two spaces in pattern
produces exactly the same results and may run faster ... let's see:
All runs length 1: It runs at 3.4 times the speed. Not shown: the longer the input string, the better it gets.
\python26\python -mtimeit -s"s='now is the winter of our discontent'; import re; x = re.compile(' +').sub" "x(' ', s)"
100000 loops, best of 3: 8.26 usec per loop
\python26\python -mtimeit -s"s='now is the winter of our discontent'; import re; x = re.compile(' +').sub" "x(' ', s)"
100000 loops, best of 3: 2.41 usec per loop
With one run having length 2, the speed ratio is 2.5. With all runs having length 2, the speed ratio is 1.2. All things considered, not a bad return on an investment of 1 keystroke.