I'm using VB.Net, and I have a set of data that I need to be able to filter through fairly quickly. Basically, the program is like Google Suggest, but instead of a drop-down menu I'm using a listbox. When a user enters a word, I compare it using LINQ and keep the entries that contain the user's input. The data are all strings of variable length (from 0 to 200 characters, most around the 150-character mark), and I have 240,000+ of these strings and counting, all stored in an XML file.
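For reference, the filtering described above amounts to a linear scan over every string on each keystroke. A minimal sketch, written in Python for brevity (the logic maps directly onto a LINQ `Where(...Contains...)` query in VB.Net); case-insensitive matching and the result cap are assumptions, and `suggest`/`limit` are illustrative names, not part of the original program:

```python
from typing import Iterable, List


def suggest(items: Iterable[str], query: str, limit: int = 20) -> List[str]:
    """Return up to `limit` items that contain `query`, ignoring case.

    Mirrors a LINQ-style Where(s => s.Contains(q)) filter: every call scans
    the whole collection, so the cost grows linearly with its size.
    """
    q = query.casefold()
    if not q:
        return []  # nothing typed yet: show no suggestions
    results: List[str] = []
    for item in items:
        if q in item.casefold():
            results.append(item)
            if len(results) >= limit:
                break  # a listbox only needs the first screenful
    return results


if __name__ == "__main__":
    data = ["apple pie", "Pineapple", "banana split", "grape"]
    print(suggest(data, "apple"))  # ['apple pie', 'Pineapple']
```

Stopping at `limit` helps common queries, but a query that matches little or nothing still touches all 240,000 strings.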

A colleague of mine told me that loading all of that into memory (using VB.Net's XML serializer plus collections of strings/objects) is not practical and would slow the program's startup time. I haven't finished building the program yet, and I'm having second thoughts about continuing down this path.

So, my question is: should I continue with my current approach (loading everything into memory on startup), or is there a better way of solving my dilemma?

A:

If you want to avoid a slow startup, and keeping the data in memory isn't a performance issue, then load it asynchronously. That said, loading 240,000+ strings from an XML file and keeping them in memory doesn't sound like the greatest idea. A database would probably be the better approach, or at least a format like JSON that's faster to parse.
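A minimal sketch of the database route, assuming SQLite via Python's built-in `sqlite3` module (the answer doesn't name a database; the table and function names are hypothetical). A `LIKE '%term%'` pattern with a leading wildcard still scans every row, so the main gain here is that nothing has to be deserialized into the application at startup:

```python
import sqlite3
from typing import Iterable, List


def build_db(strings: Iterable[str]) -> sqlite3.Connection:
    # An in-memory database keeps the sketch self-contained; a real app would
    # build the table once into a database file and just open it at startup.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE suggestions (text TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO suggestions (text) VALUES (?)", ((s,) for s in strings)
    )
    conn.commit()
    return conn


def query(conn: sqlite3.Connection, term: str, limit: int = 20) -> List[str]:
    # Escape LIKE wildcards so a user typing "%" or "_" matches it literally.
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    rows = conn.execute(
        "SELECT text FROM suggestions WHERE text LIKE ? ESCAPE '\\' "
        "ORDER BY rowid LIMIT ?",
        (f"%{escaped}%", limit),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = build_db(["apple pie", "Pineapple", "banana split"])
    print(query(conn, "apple"))  # ['apple pie', 'Pineapple']
```

SQLite's `LIKE` is case-insensitive for ASCII characters by default. A full-text index (for example SQLite's FTS5, where the build includes it) can speed up word and prefix matches, though not arbitrary mid-word substrings.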

Mircea Grelus
+1 for using a database. Also, this is a great candidate for the `BackgroundWorker` component in .NET -- it really should be done async.
Daniel Pryden
Doing this sort of thing async may be overrated. If there's nothing the user can do with the application *until* the whole data set is loaded and searchable, loading it asynchronously is pointless. The database is a good idea, though.
MusiGenesis
@MusiGenesis Well, if that's the scenario, of course. But if there is a page in between, or certain other actions are required (like filling in other fields before getting to that specific one), then you spare the user from staring at a starting screen while the app loads.
Mircea Grelus
Enter the notorious splash screen :)
RobS
The OP describes a listbox that suggests completed strings (i.e. autocomplete). It seems perfectly reasonable that the user could begin typing a string while the suggestions were still loading asynchronously. On the other hand, making the user wait for all the suggestions to be loaded before they can type the first letter seems silly. So it seems like a very good case for asynchronous loading to me.
Daniel Pryden
A: 

Depends on a number of things:

If 
((you know the strings will not hugely increase in number) && 
(you know the spec of the machines that will run your app) && 
(you are able to test that the load time is *good enough* on the above spec))
{
**don't bother changing approach.** 
}
else
{
**change approach.**
}

The alternative approach is obviously some kind of asynchronous lazy loading.
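The "lazy" half of that idea can be sketched as deferring the load until the data is first needed. This is a minimal Python illustration; `LazyData` and the loader callback are hypothetical names, and the load happens synchronously on first access (it could be combined with a background thread for the "async" half):

```python
import threading
from typing import Callable, List, Optional


class LazyData:
    """Defers loading until the first call to get(); safe across threads."""

    def __init__(self, loader: Callable[[], List[str]]):
        self._loader = loader
        self._data: Optional[List[str]] = None
        self._lock = threading.Lock()

    def get(self) -> List[str]:
        if self._data is None:
            with self._lock:
                # Double-checked: only the first caller actually runs the loader.
                if self._data is None:
                    self._data = self._loader()
        return self._data


if __name__ == "__main__":
    data = LazyData(lambda: ["apple pie", "banana split"])  # nothing loaded yet
    print(data.get())  # the first call triggers the load
```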

JohnIdol
-1 (virtual) for your Snake Plissken avatar.
MusiGenesis
I take it you're not a fan :)
JohnIdol
A: 

You're talking about loading roughly 36MB of strings (240,000 × ~150 characters; since .NET strings are UTF-16, expect closer to double that in memory). While this isn't a daunting amount by any means, it's also a non-trivial one (though you could probably load it faster by reading the XML yourself; I wouldn't go with the serialization engine if I were worried about performance). You're looking at adding a couple of seconds to your startup time, assuming you don't do it asynchronously as Mircea suggests.
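"Reading the XML yourself" usually means a streaming parse rather than deserializing the whole document into objects. A minimal Python sketch using the standard library's `xml.etree.ElementTree.iterparse` (in .NET the comparable tool would be `XmlReader`). The flat `<items><item>...</item></items>` layout and the default tag name `item` are assumptions, since the question doesn't show the file's schema:

```python
import io
import xml.etree.ElementTree as ET
from typing import IO, List, Union


def load_strings(source: Union[str, IO[bytes]], tag: str = "item") -> List[str]:
    """Stream-parse `source`, collecting the text of every <tag> element.

    Assumes a flat layout such as <items><item>...</item></items>;
    `tag` is a placeholder for whatever the real file uses.
    """
    result: List[str] = []
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            result.append(elem.text or "")  # an empty <item/> has text None
            elem.clear()  # drop the element's content once it has been read
    return result


if __name__ == "__main__":
    sample = b"<items><item>apple pie</item><item>Pineapple</item></items>"
    print(load_strings(io.BytesIO(sample)))  # ['apple pie', 'Pineapple']
```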

If you do do it asynchronously, you'll have to ensure that any UI process that relies on the data doesn't occur until after it has loaded. That may be a difficult thing to ensure.
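One way to make that guarantee explicit is to have the search itself report "not ready yet" until the worker finishes. A minimal Python sketch, using a thread and `threading.Event` as a rough stand-in for .NET's `BackgroundWorker`; `BackgroundLoader` and returning `None` for "still loading" are illustrative choices, and error handling in the loader is omitted:

```python
import threading
from typing import Callable, List, Optional


class BackgroundLoader:
    """Loads data on a worker thread; searches are refused until it finishes."""

    def __init__(self, loader: Callable[[], List[str]]):
        self._data: List[str] = []
        self._ready = threading.Event()
        self._thread = threading.Thread(target=self._run, args=(loader,), daemon=True)
        self._thread.start()

    def _run(self, loader: Callable[[], List[str]]) -> None:
        self._data = loader()
        self._ready.set()  # signal only after the list is fully built

    @property
    def ready(self) -> bool:
        return self._ready.is_set()

    def wait(self, timeout: Optional[float] = None) -> bool:
        return self._ready.wait(timeout)

    def suggest(self, query: str, limit: int = 20) -> Optional[List[str]]:
        # None means "still loading": the UI can show a "loading..." message.
        if not self._ready.is_set():
            return None
        q = query.casefold()
        return [s for s in self._data if q in s.casefold()][:limit]


if __name__ == "__main__":
    loader = BackgroundLoader(lambda: ["apple pie", "banana split"])
    loader.wait()  # a real UI would check `ready` instead of blocking
    print(loader.suggest("apple"))  # ['apple pie']
```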

Adam Robinson
A: 

It may not be a bad idea to load the XML into memory when the app starts up. But if you go this route, I'd look into using the BackgroundWorker component. The idea would be to load the XML into memory asynchronously so the UI stays responsive while this is going on. As far as the user is concerned, the app shouldn't appear to start any slower, and once loading is done, the Google-suggest-like feature should be significantly faster.

I must say that, even in memory, this is an inherently inefficient operation, since you get no benefit from an index when querying an XML file this way. This is something that could easily be 10X faster in SQL with full-text searching.
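To show what an index buys for "contains" queries, here is a minimal in-memory Python sketch of a trigram inverted index. This is a swapped-in technique, not the SQL full-text search the answer recommends: every 3-character substring maps to the items containing it, a query intersects those sets to get candidates, and each candidate is then verified, because trigram hits alone can be false positives. `TrigramIndex` is a hypothetical name, and the index costs considerably more memory than the raw strings:

```python
from collections import defaultdict
from typing import Dict, List, Set


class TrigramIndex:
    """Inverted index from lowercase 3-character substrings to item positions."""

    def __init__(self, items: List[str]):
        self.items = items
        self.postings: Dict[str, Set[int]] = defaultdict(set)
        for i, s in enumerate(items):
            t = s.casefold()
            for j in range(len(t) - 2):
                self.postings[t[j:j + 3]].add(i)

    def search(self, query: str, limit: int = 20) -> List[str]:
        q = query.casefold()
        if not q:
            return []
        if len(q) < 3:
            # Too short to form a trigram: fall back to a full scan.
            candidates = range(len(self.items))
        else:
            grams = [q[j:j + 3] for j in range(len(q) - 2)]
            sets = [self.postings.get(g, set()) for g in grams]
            candidates = sorted(set.intersection(*sets))
        out: List[str] = []
        for i in candidates:
            if q in self.items[i].casefold():  # verify: trigrams can overmatch
                out.append(self.items[i])
                if len(out) >= limit:
                    break
        return out


if __name__ == "__main__":
    idx = TrigramIndex(["apple pie", "Pineapple", "banana split", "grape"])
    print(idx.search("apple"))  # ['apple pie', 'Pineapple']
```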

Of course XML has the advantage of being self-contained and requiring no additional components. And that makes it a decent choice for small desktop apps that query small amounts of data. Otherwise I would consider using a database for better performance.

Steve Wortham
A: 

The question seems to imply an online application. A few suggestions if that is the case:

  • The data could / should be zipped. I suspect it would compress very nicely.
  • Maybe the data could be cached across multiple sessions, possibly delivered as HTML content with an appropriate cache expiry date. This would avoid reloading the data every time, and may be feasible if the data isn't updated frequently.
  • The suggestion feature could be initially disabled (e.g. showing a "loading..." message while the application initializes the cache asynchronously). This way the application would be available quickly upon startup, even though the suggest feature may lag by up to, say, 30 seconds or so.
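To get a feel for how well a list like this compresses, here is a minimal Python sketch that gzips the strings as a JSON array (JSON rather than one-per-line text, so empty strings and embedded newlines survive the round trip). `pack`/`unpack` are illustrative names; in a web setting, the server would more typically apply gzip via HTTP `Content-Encoding` and let the browser decompress transparently:

```python
import gzip
import json
from typing import List


def pack(strings: List[str]) -> bytes:
    # JSON keeps empty strings and embedded newlines intact; gzip shrinks it.
    return gzip.compress(json.dumps(strings).encode("utf-8"))


def unpack(blob: bytes) -> List[str]:
    return json.loads(gzip.decompress(blob).decode("utf-8"))


if __name__ == "__main__":
    data = ["apple pie", "Pineapple", ""] * 10000
    blob = pack(data)
    assert unpack(blob) == data
    print(len(json.dumps(data)), "->", len(blob), "bytes")
```

Actual ratios depend entirely on the real data, so it's worth measuring on the full 240,000-string file.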

Edit: Independently of how the data gets downloaded and cached, I second Mircea Grelus's opinion that an XML file of this size is a poor substitute for a database.

mjv
A: 

You might be better served by using binary serialization rather than XML serialization to persist the data that your app reads on startup, particularly if you end up implementing a data structure that's faster to search than a `StringCollection`. You'd still maintain the XML version of the data somewhere, of course.
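As an illustration of why a binary format loads quickly, here is a minimal Python sketch of a hand-rolled length-prefixed file: a 4-byte count, then each string as a 4-byte byte length followed by its UTF-8 bytes. This is an assumed format for illustration, not any particular .NET serializer; reading it involves no parsing beyond fixed-size headers:

```python
import io
import struct
from typing import BinaryIO, List

# "<I" = little-endian unsigned 32-bit integer, used for the count and lengths.
_U32 = struct.Struct("<I")


def write_strings(f: BinaryIO, strings: List[str]) -> None:
    f.write(_U32.pack(len(strings)))
    for s in strings:
        encoded = s.encode("utf-8")
        f.write(_U32.pack(len(encoded)))  # byte length prefix
        f.write(encoded)


def _read_exact(f: BinaryIO, n: int) -> bytes:
    data = f.read(n)
    if len(data) != n:
        raise EOFError("truncated string file")
    return data


def read_strings(f: BinaryIO) -> List[str]:
    (count,) = _U32.unpack(_read_exact(f, _U32.size))
    result: List[str] = []
    for _ in range(count):
        (length,) = _U32.unpack(_read_exact(f, _U32.size))
        result.append(_read_exact(f, length).decode("utf-8"))
    return result


if __name__ == "__main__":
    buf = io.BytesIO()
    write_strings(buf, ["apple pie", "", "Pineapple"])
    buf.seek(0)
    print(read_strings(buf))  # ['apple pie', '', 'Pineapple']
```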

And by all means, use a BackgroundWorker to load the data asynchronously if that'll make your application feel more responsive.

Robert Rossney