Well, even if others are trying to discourage you from doing this, it would probably not be that hard.
On the client side, you define a div that floats over the image, resizable and semi-transparent, which the user scales to mark the crop area.
Move: I assume it applies only to the text, so you dynamically create draggable spans on the client side. Still easy.
Scale: I have no idea of a simple UI to do it.
When you want to update your image, you serialize your data (the position and size of your cropping div, and the position and scaling of your text spans, all relative to the position of the image). Then, using JSON or anything similar you'd like, you transfer the data to the server.
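To make the serialization step a bit more concrete, here is a rough sketch of what the server could receive and how it might decode it. The payload layout and every field name (crop, texts, x, y, width, height, content, scale) are made up for illustration, and I'm assuming all coordinates are pixels measured from the image's top-left corner.

```python
import json

# Hypothetical payload sent by the client; all field names are illustrative.
payload = """
{
  "crop": {"x": 40, "y": 30, "width": 200, "height": 150},
  "texts": [
    {"content": "Hello", "x": 60, "y": 50, "scale": 1.5}
  ]
}
"""


def parse_edit(raw):
    """Decode the JSON string into a crop box and a list of text tuples."""
    data = json.loads(raw)
    c = data["crop"]
    # Box in (left, upper, right, lower) order, which is what PIL's crop expects.
    box = (c["x"], c["y"], c["x"] + c["width"], c["y"] + c["height"])
    # One (content, x, y, scale) tuple per text span; scale defaults to 1.0.
    texts = [
        (t["content"], t["x"], t["y"], t.get("scale", 1.0))
        for t in data.get("texts", [])
    ]
    return box, texts


box, texts = parse_edit(payload)
```

Keeping everything relative to the image (rather than to the page) means the server doesn't need to know anything about the page layout.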
Then, on the server, using Python/PIL, you reproduce the transformations that you have serialized.
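A minimal sketch of that server-side step, assuming PIL (or Pillow) is installed. The function name apply_edit, its parameters, and the tuple layout are my own invention for the example: box is (left, upper, right, lower) and each text is (content, x, y, scale), all relative to the original image. Text scaling only takes effect if you pass the path to a TrueType font file; otherwise PIL's built-in default font is used.

```python
from PIL import Image, ImageDraw, ImageFont


def apply_edit(img, box, texts, font_path=None, base_size=16):
    """Draw the texts on a copy of img, then crop the copy to box.

    Hypothetical helper: box is (left, upper, right, lower) and texts is a
    list of (content, x, y, scale) tuples in original-image pixel coordinates.
    """
    out = img.convert("RGB")  # convert() returns a new image; img is untouched
    draw = ImageDraw.Draw(out)
    for content, x, y, scale in texts:
        if font_path:
            # Scale the text by choosing a proportionally larger font size.
            font = ImageFont.truetype(font_path, int(base_size * scale))
        else:
            # Without a font file, fall back to the built-in default font.
            font = ImageFont.load_default()
        draw.text((x, y), content, fill="white", font=font)
    # Crop last, so the text positions stay relative to the original image.
    return out.crop(box)
```

You would call it as something like apply_edit(Image.open(path), box, texts).save(out_path), where path and out_path are whatever your application uses.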