I suspect the way you are using SURF may need some adjustment.
Here is a link to an MIT paper on using SURF for augmented reality applications on mobile devices.
Excerpt:
In this section, we present our implementation of the SURF algorithm and its adaptation to the mobile phone. Next, we discuss the impact that accuracy has on the speed of the nearest-neighbor search and show that we can achieve an order of magnitude speed-up with minimal impact on matching accuracy. Finally, we discuss the details of the phone implementation of the image matching pipeline. We study the performance, memory use, and bandwidth consumption on the phone.
You might also want to look into OpenCV's algorithms, because they are tried and tested.
Depending on the constraints of your application, you may be able to make those algorithms less general and have them look only for known POIs and markers within the image.
Part of tracking a POI is estimating its motion vector from one position in the 2D image to the next, and then optionally confirming that it is still there (by checking pixel characteristics). The same approach lets you track changes in perspective and rotation for a POI, or for a group of POIs that make up an object, without re-scanning the entire image.
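To make the idea concrete, here is a minimal sketch in plain Python, not a production tracker. It assumes grayscale frames stored as lists of rows, a small template patch cut out around the POI, and a constant-velocity guess for where the POI will be next. The names `patch_ssd` and `track_poi`, the search `radius`, and the `max_ssd` threshold are all made up for illustration; a real application would more likely use OpenCV's tracking routines.

```python
def patch_ssd(image, template, top, left):
    """Sum of squared differences between the template and the image patch at (top, left)."""
    total = 0
    for r, row in enumerate(template):
        for c, value in enumerate(row):
            diff = image[top + r][left + c] - value
            total += diff * diff
    return total


def track_poi(image, template, last_pos, velocity, radius=2, max_ssd=100):
    """Look for the POI in a small window around its predicted position.

    last_pos and velocity are (row, col) tuples. Returns (new_pos, new_velocity),
    or None when no patch in the window is close enough to the template,
    which means the POI is considered lost and a full re-scan is needed.
    """
    height, width = len(image), len(image[0])
    t_h, t_w = len(template), len(template[0])

    # Predict the new position from the last known motion vector.
    pred_r = last_pos[0] + velocity[0]
    pred_c = last_pos[1] + velocity[1]

    best = None  # (score, (row, col)) of the best candidate so far
    for r in range(pred_r - radius, pred_r + radius + 1):
        for c in range(pred_c - radius, pred_c + radius + 1):
            # Skip candidates whose patch would fall outside the frame.
            if r < 0 or c < 0 or r + t_h > height or c + t_w > width:
                continue
            score = patch_ssd(image, template, r, c)
            if best is None or score < best[0]:
                best = (score, (r, c))

    # Confirm the POI still exists there by checking its pixel characteristics.
    if best is None or best[0] > max_ssd:
        return None

    new_pos = best[1]
    new_velocity = (new_pos[0] - last_pos[0], new_pos[1] - last_pos[1])
    return new_pos, new_velocity
```

Each frame then costs (2 * radius + 1)^2 patch comparisons per POI rather than a scan of the whole image. When `track_poi` returns `None`, you fall back to a full detection pass (for example with SURF) to find the POI again.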
There are tons of papers online on tracking objects in a 2D projection (up to a severe skew in many cases).
Good Luck!