views:

51

answers:

3

Excuse the title of this post, but I can't really think of a more creative title.

I am calling a 3rd party web service where the authors have ordered the transaction results from most recent to oldest. The total transaction count is greater than 100,000. To make matters more interesting, the web service sends down complex objects representing each transaction, so if I ask for all 100,000 at once, a timeout occurs. Calls to this web service therefore need to be batched to return only 1000 records at a time, which means 100 individual calls to the service.

So far, so good, except the transactions need to be processed from oldest to newest. I need a place to temporarily hold JUST the IDs of these transactions, so that later I can recall them in the correct order (oldest to newest) after I have sorted them.
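
For concreteness, the retrieval I have in mind looks roughly like this (the client, GetTransactionBatch, Transaction and totalCount are just placeholders for the real 3rd party API, not its actual names):

    // Sketch only - everything named here stands in for the real service proxy.
    const int batchSize = 1000;
    var ids = new List<string>();

    for (int offset = 0; offset < totalCount; offset += batchSize)
    {
        // Each call returns at most 1000 complex transaction objects, newest first.
        Transaction[] batch = client.GetTransactionBatch(offset, batchSize);

        foreach (Transaction t in batch)
        {
            ids.Add(t.Id); // keep only the ID for the later, ordered processing pass
        }
    }
    // ids now holds every transaction ID, newest to oldest.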

What I am missing in this solution is an RDBMS, so I am thinking of using a text file to store the values.

Excuse the long intro; if you're still awake, here are the considerations:

  1. If I just store the values in a text file, I'll end up with over 100,000 lines in the text file in the wrong order, meaning I have to implement a way to read the file from bottom to top.
  2. I am not sure, but there may be a way to append to the beginning of an existing text file without any performance penalty; that way, once the file is created, I could use the built-in .NET methods to read the file from top to bottom.
  3. I could hook up a text ODBC driver and perhaps use an SQL ORDER BY clause, but I've never done this before, and I don't want to add any more deployment steps to my app.
  4. Perhaps using a text file is not the way to go; maybe there is a better solution out there for this problem that I am not aware of.

This is an architecture/logistics question; any assistance would be appreciated. Thanks.

+2  A: 

If they're just IDs, do you definitely need to use a file in the first place?

Suppose they're 32-byte IDs... 100,000 of them is still only just over 3MB. Are you really that pushed for memory?

I would definitely try for an in-memory solution to start with - make sure it's going to be okay in the worst conceivable case (e.g. double your expected volume) but then go for it.

The basic moral is not to be too scared of numbers which sound big: 100,000 items may be a lot in human terms, but unless there's quite a lot of data per item, it's peanuts for a modern computer.
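
For instance, a rough sketch of what I mean (ProcessTransaction is a placeholder for whatever you actually do with each record):

    // All 100,000 IDs held in a plain list - only a few MB even for long string IDs.
    List<string> ids = new List<string>();

    // ... add each ID as the batches arrive from the service, newest first ...

    // The service hands them back newest first, so reversing gives oldest first.
    // (If the IDs themselves carry no ordering, keep a timestamp alongside and sort on that.)
    ids.Reverse();

    foreach (string id in ids)
    {
        ProcessTransaction(id); // placeholder for your processing, oldest to newest
    }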

Jon Skeet
Yay! First time Jon Skeet said essentially the same thing I did, even if he did beat me by one minute :-)
Eric J.
Thanks Jon, that's very interesting. Maybe I am getting worried over nothing here.
JL
JL, test the worst case scenario and see how much it's straining the system. My guess is that it won't be too bad.
Giovanni Galbo
+4  A: 

If you're running on a typical PC/server class machine, the memory needed to store 100,000 IDs and associated timestamps is not a large volume. Consider using an in-memory sorted list.

If you really want to write to a file, you could use File.ReadAllLines and iterate through the resulting string array backwards.
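
For example (a sketch only; the file path is made up, and it assumes you wrote one ID per line, newest first, as the batches arrived):

    // Needs System.IO. Reads the whole file into memory, which is fine at this size.
    string[] lines = File.ReadAllLines(@"C:\temp\transaction-ids.txt");

    // Walk the array backwards so the IDs come out oldest to newest.
    for (int i = lines.Length - 1; i >= 0; i--)
    {
        string id = lines[i];
        // process the transaction for this ID here
    }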

Eric J.
Accepted, because you beat Jon to the draw :)
JL
A: 

You might try storing the information in a DataSet / DataTable combination, and using a DataView attached to the DataTable to change the sort order when you pull the data back out.

Depending on the structure of the XML you are getting back from the Web service, you might be able to read it directly into the DataSet and let it parse it into the DataTables for you (if that works, I'd go for it for the simplicity factor).

This method would involve the least code - but you would have to evaluate the performance of the DataSet with the 100,000 items in it.

I should note that I'm suggesting you store the entire transaction this way (including the ID); that way you will have all the data you need to process, and you can loop through it in any sorted order you specify.

I get the impression that you were originally going to store just the IDs, sort them, and then re-query the Web service for each ID in your sorted order, but that would mean hitting the service twice for the same data. I'd avoid that if possible.
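
Roughly like this (the table and column names below are guesses at what the service's XML might infer to; substitute whatever ReadXml actually produces for your response):

    // Needs System.Data. ReadXml infers the DataTables from the XML structure.
    DataSet ds = new DataSet();
    ds.ReadXml("transactions.xml");

    // A DataView over the inferred table lets you pick the sort order on the way out.
    DataView view = new DataView(ds.Tables["Transaction"]);
    view.Sort = "TransactionDate ASC"; // oldest to newest

    foreach (DataRowView row in view)
    {
        string id = (string)row["Id"];
        // process the full transaction row here - no second call to the service needed
    }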

Ron Savage