The Twitter API takes a page number parameter. The Atom results contain link elements whose rel attributes are next and previous. These are your best indicator of whether you should go looking for a second page, and so on. The href attribute of each link even tells you the exact URL to request.
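Following the next link can be sketched with the standard library's Atom-aware XML parsing. This is a minimal illustration, not Twitter's client code: the feed below is a hypothetical sample with a placeholder `example.invalid` URL, and `find_link`, `iterate_pages`, `fetch` and the `max_pages` cap are names invented here. The HTTP fetch is passed in as a function so the loop can be exercised without a network.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, Optional

ATOM_NS = "http://www.w3.org/2005/Atom"


def find_link(atom_xml: str, rel: str) -> Optional[str]:
    """Return the href of the feed-level <link> with the given rel, or None."""
    root = ET.fromstring(atom_xml)
    # Only direct children of <feed>, so per-entry links are not picked up.
    for link in root.findall(f"{{{ATOM_NS}}}link"):
        if link.get("rel") == rel:
            return link.get("href")
    return None


def iterate_pages(fetch: Callable[[str], str], first_url: str,
                  max_pages: int = 15) -> Iterator[str]:
    """Yield each page's feed text, following rel="next" until there is none.

    max_pages is a hypothetical safety cap, not an API limit.
    """
    url: Optional[str] = first_url
    fetched = 0
    while url is not None and fetched < max_pages:
        feed = fetch(url)
        yield feed
        fetched += 1
        url = find_link(feed, "next")


# Hypothetical sample feed; a real response would come from fetch(url).
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>search results (hypothetical)</title>
  <link rel="self" href="https://example.invalid/search.atom?q=python&amp;page=1"/>
  <link rel="next" href="https://example.invalid/search.atom?q=python&amp;page=2"/>
  <entry>
    <id>tag:example.invalid,2005:101</id>
    <title>first result</title>
  </entry>
</feed>"""
```

Using the href as-is, rather than rebuilding the URL with an incremented page number, means you never have to guess when the last page has been reached: the loop simply stops when no next link is present.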
The query you create also takes a since_id parameter. You'll want to store the largest ID you see in your responses and pass it as since_id in subsequent requests, so that you don't have to filter out duplicates.
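A small tracker for the largest ID might look like the sketch below. Assumptions: `q` as the name of the search-term parameter, the `SinceIdTracker` class, and the `id_from_atom` helper (which assumes the numeric ID follows the last colon of an Atom `<id>`) are all hypothetical. The one detail worth copying is that IDs are compared as integers, because as strings "99" sorts after "100".

```python
from typing import Dict, Iterable, Optional, Union
from urllib.parse import urlencode


def id_from_atom(entry_id: str) -> int:
    """Hypothetical helper: parse the digits after the last ':' of an Atom <id>."""
    return int(entry_id.rsplit(":", 1)[1])


class SinceIdTracker:
    """Remembers the largest status ID seen so far."""

    def __init__(self) -> None:
        self.max_id: Optional[int] = None

    def update(self, ids: Iterable[int]) -> None:
        for status_id in ids:
            # Integer comparison; string comparison would misorder IDs.
            if self.max_id is None or status_id > self.max_id:
                self.max_id = status_id

    def query_string(self, query: str, page: int = 1) -> str:
        """Build the query string; 'q' is an assumed parameter name."""
        params: Dict[str, Union[str, int]] = {"q": query, "page": page}
        if self.max_id is not None:
            params["since_id"] = self.max_id
        return urlencode(params)
```

For example, after `update` sees IDs 99 and 101, `query_string("python")` yields `q=python&page=1&since_id=101`. Persisting `max_id` between runs (in the same store as the results) is what makes the next run skip everything already collected.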
As for data storage, your choice is probably best guided by what you plan to do with the results. If you're going to be doing any querying, you should probably file them away in a database, e.g. MySQL. If you're just logging, a flat file should do fine.
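Both options can be sketched briefly. SQLite is swapped in for MySQL here only because it ships with Python and needs no server; the table layout and the `store_in_db` and `format_log_line` names are illustrative assumptions. Making the ID the primary key and using SQLite's `INSERT OR IGNORE` also guards against storing the same tweet twice (MySQL's rough equivalent is `INSERT IGNORE`).

```python
import json
import sqlite3
from typing import Iterable, Tuple

Row = Tuple[int, str, str]  # (status id, author, text): an assumed shape


def store_in_db(conn: sqlite3.Connection, rows: Iterable[Row]) -> None:
    """Insert rows into a queryable table, skipping IDs already stored."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets ("
        " id INTEGER PRIMARY KEY, author TEXT, text TEXT)"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO tweets (id, author, text) VALUES (?, ?, ?)", rows
    )
    conn.commit()


def format_log_line(row: Row) -> str:
    """One JSON object per line, suitable for appending to a flat log file."""
    status_id, author, text = row
    return json.dumps({"id": status_id, "author": author, "text": text}) + "\n"


def append_to_log(path: str, rows: Iterable[Row]) -> None:
    """Append rows to a flat file; no indexing, so only good for logging."""
    with open(path, "a", encoding="utf-8") as f:
        for row in rows:
            f.write(format_log_line(row))
```

With the database version you can later run queries such as `SELECT text FROM tweets WHERE author = ?`; with the flat file you would have to read and scan every line yourself.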