database vs flat file, which is a faster structure for "regex" matching with many simultaneous requests


Hi, which structure returns results faster and/or is less taxing on the host server: a flat file or a database (MySQL)?

Assume many users (about 100) are querying the file/DB simultaneously. Searches involve pattern matching against a static file/DB. The file has 50,000 unique lines (all the same data type). There could be many matches. There is no writing to the file/DB, only reading.

Is it possible to keep a duplicate of the file/DB and write a logic switch that uses the backup copy if the main one is in use?

Which language is best for each type of structure? Perl for the flat file and PHP for the DB?

Additional info:

Say I want to find all the cities that have the pattern "cis" in their names. Which is better/faster: regex or string functions?

Please recommend a strategy.

TIA

+1 A:

Hi Jamex,

I am a huge fan of simple solutions, and thus prefer flat file storage for simple tasks. A relational DB with its indexing capabilities won't help you much with arbitrary regex patterns, and the filesystem's caching ensures that this rather small file is in memory anyway. I would go the flat file + Perl route.

Edit (taking your new information into account): If it's really just about finding a substring in one known attribute, then using a fulltext index (which a DB provides) will help you a bit (depending on the type of index applied) and might provide an easy and reasonably fast solution that fits your requirements. Of course, you could implement an index yourself on the file system, e.g. using a variation of a suffix tree, which is hard to beat speed-wise.
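To make the "index it yourself" idea concrete, here is a minimal sketch. Note that it is not a suffix tree: it swaps in a simpler sorted list of suffixes (a naive suffix array over all lines) searched with binary search. The function names `build_suffix_index` and `find_substring` are made up for illustration, and the matching is case-insensitive by assumption.

```python
import bisect

def build_suffix_index(lines):
    """Return a sorted list of (lowercased suffix, line number) pairs."""
    entries = [
        (line[i:].lower(), n)
        for n, line in enumerate(lines)
        for i in range(len(line))
    ]
    entries.sort()
    return entries

def find_substring(index, lines, needle):
    """Return all lines containing needle, in original file order."""
    needle = needle.lower()
    # A 1-tuple sorts before any (suffix, n) pair whose suffix starts with needle,
    # so this finds the first candidate entry.
    pos = bisect.bisect_left(index, (needle,))
    hits = set()
    while pos < len(index) and index[pos][0].startswith(needle):
        hits.add(index[pos][1])
        pos += 1
    return [lines[n] for n in sorted(hits)]

if __name__ == "__main__":
    cities = ["San Francisco", "Cisco", "Paris", "Narcisse", "Boston"]
    idx = build_suffix_index(cities)
    print(find_substring(idx, cities, "cis"))
```

The index costs memory proportional to the total number of characters squared in the worst case per line, which is harmless for short city names but would need a real suffix tree or suffix array for long lines. It also only answers plain substring queries, not arbitrary regexes.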

Still, I would go the flat file route (and if it fits your purpose, have a look at awk), because if you had started implementing it, you'd be finished already ;) Further, I suspect that with the number of users you talk about, the system won't feel the difference (your CPU will be bored most of the time anyway).

If you are uncertain, just try it! Implement that regex + Perl solution (it takes a few minutes if you know Perl), loop it 100 times, and measure with time. If it is sufficiently fast, use it; if not, consider another solution. Keep in mind that 50,000 unique lines is really a low number in terms of modern computing. (Compare with this: http://stackoverflow.com/questions/546829/optimizing-mysql-table-indexing-for-substring-queries )
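The "just try it" measurement could look roughly like the sketch below. It is written in Python rather than Perl, uses synthetic data as a stand-in for the real 50,000-line file (the `make_lines` helper and its city names are invented), and compares a compiled regex against a plain substring test over 100 repetitions. Timings will vary by machine, so none are shown.

```python
import re
import time

def make_lines(n=50_000):
    # Hypothetical stand-in data: every 1000th "city" contains "cis".
    return [f"city{i}" + ("cisco" if i % 1000 == 0 else "ville") for i in range(n)]

def search_regex(lines, pattern):
    rx = re.compile(pattern)
    return [ln for ln in lines if rx.search(ln)]

def search_substring(lines, needle):
    return [ln for ln in lines if needle in ln]

def time_it(fn, *args, repeats=100):
    """Run fn(*args) repeats times; return (elapsed seconds, last result)."""
    result = None
    start = time.perf_counter()
    for _ in range(repeats):
        result = fn(*args)
    return time.perf_counter() - start, result

if __name__ == "__main__":
    lines = make_lines()
    t_rx, hits_rx = time_it(search_regex, lines, "cis")
    t_sub, hits_sub = time_it(search_substring, lines, "cis")
    print(f"regex:     {t_rx:.3f}s for 100 scans, {len(hits_rx)} matches")
    print(f"substring: {t_sub:.3f}s for 100 scans, {len(hits_sub)} matches")
```

This runs the 100 scans one after another, so it measures total work rather than true concurrency; it is only meant to show whether a full scan is cheap enough before reaching for anything heavier.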

HTH,
alexander

Alexander Feder 2010-05-22 09:07:35

A database *adds* features to flat files. Flat files are **always** faster than a database for every kind of operation. However, flat files require so much more programming that they're not always the best choice for every problem. But for simple, bulk processing, they're faster.

S.Lott 2010-05-22 11:27:47

thanks for the advice

Jamex 2010-05-22 16:09:15

@S.Lott: Depending on the data and queries involved, a database can be much faster due to the use of indices, especially for searching.

Florian Diesch 2010-05-22 19:31:53

Depending on what your queries and your data look like, a full text search engine like Lucene or Sphinx could be a good idea.

Florian Diesch 2010-05-22 19:11:27
