Hello,

In your experience, what is the best web programming language for sorting and comparing very large lists (i.e., tens of thousands of email addresses)?

I am most familiar with PHP. I think it could get the job done, but I'm unsure about other languages and whether one of them might be a better fit.

Thanks!

+1  A: 

Language usually doesn't matter TOO much. Pick the one you are most comfortable with.

The final product is shaped by the builder, not the tools.

samoz
+16  A: 

Is it possible to do the sorting inside of a database? They are designed to do dynamic sorting and comparison. I would suggest you move to a model that lets the DB handle this sort of activity.

If you really, really can't use a DB for some reason, then you should focus on algorithms over languages. Pick a language based on other criteria (personal familiarity, whether it supports your other tasks, whether it has an active support community, etc.) and figure out the best algorithm given that language's quirks. For instance, according to some of the discussion in http://stackoverflow.com/questions/309300/defend-php-convince-me-it-isnt-horrible, PHP has relatively poor recursion performance.

But seriously, use a database for this.
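
To illustrate the idea of letting the database do the sorting, here is a minimal sketch using Python's built-in sqlite3 module. The table name `subscribers`, the index name, and the sample addresses are made up for the example, and the case-insensitive ordering is an assumption about what "sorted" should mean for your data.

```python
import sqlite3

# Hypothetical sample data; in practice the rows would already live in the database.
emails = ["zoe@example.com", "adam@example.org", "Mia@example.com", "bob@example.net"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subscribers (email TEXT NOT NULL)")
conn.executemany("INSERT INTO subscribers (email) VALUES (?)", [(e,) for e in emails])

# An index on the sort column can let the database return rows in order
# without sorting the whole table on every query.
conn.execute("CREATE INDEX idx_subscribers_email ON subscribers (email COLLATE NOCASE)")

# The database does the sorting; the application just reads the rows.
sorted_emails = [
    row[0]
    for row in conn.execute("SELECT email FROM subscribers ORDER BY email COLLATE NOCASE")
]
print(sorted_emails)
```

The same query shape should carry over to MySQL or PostgreSQL from PHP, although the collation syntax differs between databases.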

slifty
Will use the database, thanks!
behrk2
+2  A: 

This doesn't depend on the programming language; it depends on the logic, such as indexes, table schemas, and caching mechanisms.

Srinivas Reddy Thatiparthy
A: 

Your fastest option would be a compiled CGI program.

Babiker
Upvoting this. Writing efficient algorithms in C would yield the best result. However, it isn't much of a language for programming on the web. But for searching and sorting, it is by far the best (assuming proper algorithms). That said, using a database with any other real web programming language in front would be much easier and would most likely solve the problem sufficiently.
Kibbee
+14  A: 

I would store the emails in a database, and use SQL to perform sorts and searches. That is what databases were designed for, and they will have intelligent solutions that will outperform anything most people could write in code.
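
For the comparison side of the question, a rough sketch of how two lists could be compared in SQL follows. It uses Python's sqlite3 module only so that it runs on its own; the table names `list_a` and `list_b` and the addresses are hypothetical, and it assumes addresses are compared as exact strings.

```python
import sqlite3

# Hypothetical lists to compare.
list_a = ["ann@example.com", "bob@example.com", "cat@example.com"]
list_b = ["bob@example.com", "dan@example.com"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE list_a (email TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE list_b (email TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO list_a VALUES (?)", [(e,) for e in list_a])
conn.executemany("INSERT INTO list_b VALUES (?)", [(e,) for e in list_b])

# Addresses that appear in list_a but not in list_b.
only_in_a = [
    row[0]
    for row in conn.execute(
        "SELECT email FROM list_a EXCEPT SELECT email FROM list_b ORDER BY email"
    )
]

# Addresses that appear in both lists.
in_both = [
    row[0]
    for row in conn.execute(
        "SELECT email FROM list_a INTERSECT SELECT email FROM list_b ORDER BY email"
    )
]

print(only_in_a)
print(in_both)
```

Not every database supports `EXCEPT` and `INTERSECT`; where they are missing, a `LEFT JOIN ... WHERE ... IS NULL` or `NOT EXISTS` query is the usual substitute.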

Tom Gullen
Also, if you are going to program a solution, most performance issues are going to come down to the algorithms you use to search and sort. That will have a far bigger impact on performance than which language you use.
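
As a small illustration of the algorithm mattering more than the language, here are two ways to find addresses missing from a known list. The function names and sample data are invented for the example. Both return the same result, but the first rescans the known list for every candidate, while the second builds a hash set once and does constant-time lookups on average.

```python
def missing_quadratic(candidates, known):
    # "e not in known" scans the whole list each time: roughly O(n * m) comparisons.
    return [e for e in candidates if e not in known]


def missing_hashed(candidates, known):
    # Build a hash set once, then each membership test is O(1) on average: roughly O(n + m).
    known_set = set(known)
    return [e for e in candidates if e not in known_set]


# Hypothetical sample data.
known = ["ann@example.com", "bob@example.com"]
candidates = ["bob@example.com", "cat@example.com", "dan@example.com"]

slow_result = missing_quadratic(candidates, known)
fast_result = missing_hashed(candidates, known)
print(fast_result)
```

With tens of thousands of addresses on each side, the first version does on the order of hundreds of millions of string comparisons, which is the kind of gap no language choice will make up for.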
Tom Gullen
Makes sense. I will use the database to perform most of the operations. Thanks!
behrk2
+1  A: 

You can also use a trie, which is a prefix-tree data structure, to sort in memory.

Email addresses have a restrictive character set (a-z, 0-9, _, ., etc.), so each trie node would only need children for those characters. This TopCoder tutorial on tries is a good starting point if you don't already know about them.

You have to go through all the strings to construct the trie.

Searching or comparison takes O(l) time, where l is the length of the string you are comparing.

Sorting requires traversing all the nodes of the trie using DFS (depth-first search), which takes O(|V| + |E|) time.
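
The steps above can be sketched roughly as follows. This is only an illustrative version: the class and method names are my own, children are kept in a dict rather than a fixed-size array, and duplicate addresses collapse into a single entry because each node only records whether a string ends there.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # maps a character to the child TrieNode
        self.is_end = False  # True if an inserted string ends at this node


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        # Walk down the trie, creating nodes as needed: O(l) for a string of length l.
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_end = True

    def contains(self, word):
        # Lookup follows one path from the root: O(l).
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.is_end

    def sorted_words(self):
        # Iterative pre-order DFS; children are pushed in reverse order so the
        # smallest character is popped (visited) first.
        result = []
        stack = [(self.root, "")]
        while stack:
            node, prefix = stack.pop()
            if node.is_end:
                result.append(prefix)
            for ch in sorted(node.children, reverse=True):
                stack.append((node.children[ch], prefix + ch))
        return result


# Hypothetical sample data.
emails = ["bob@example.com", "al@example.com", "alice@example.com", "bo@example.com"]
trie = Trie()
for address in emails:
    trie.insert(address)

print(trie.sorted_words())
print(trie.contains("bo@example.com"), trie.contains("b@example.com"))
```

The order produced is by character code, so it matches a plain byte-wise string sort rather than any locale-aware collation.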

hIpPy
Interesting. I will take a look, thanks!
behrk2
Regarding email addresses and character sets, see http://www.regular-expressions.info/email.html
RCIX