I found a Perl script that handles randomizing Wikipedia articles here. The code looks as if it was partly computer-generated. Given my current interest in MySQL, I thought the links and related data could be kept in a database.

I know that MySQL is good at maintaining relations between tables, while many things seem easy to implement in Perl. I find it hard to draw a line between their specialties. So:

How can you randomize Wikipedia articles with MySQL and Perl?

+1  A: 
SELECT id FROM articles ORDER BY RAND() LIMIT 1

You could, of course, just link to http://en.wikipedia.org/wiki/Special:Random

ceejayoz
ceejayoz: Yes, but I want to choose which pages get randomized, for example those in some subcategory. So I need to store at least those pages in a database.
Masi
Was just about to post the same thing. However, the ORDER BY RAND() query is not going to work here because of the scale of the problem. ORDER BY RAND() first takes ALL rows, sorts them, and only then applies the LIMIT, so on a table the size of Wikipedia's it would be painfully slow.
Artem Russakovskii
@Artem: Good point!
Masi
If you used Oracle as the database, I guess the problem would not be that severe.
Masi
I heard that in SQL Server 2005 the problem is not severe either. It seems like everyone but MySQL has figured out how to select at random.
Artem Russakovskii
+2  A: 

If you really want to know how THEY (Wikipedia) do it, have a look at this code directly from MediaWiki:

http://svn.wikimedia.org/svnroot/mediawiki/trunk/phase3/includes/specials/SpecialRandompage.php

It is open source software after all ;), and that's the beauty of it.

Edit: From a quick glance at the code, I am pretty sure they use a field called page_random, set to a random value when the row is created. Since it is an indexed field, ordering by it with LIMIT 1, starting from a randomly chosen offset value, is effectively instant (which is good enough for this application, of course).

This is a very standard way to make random selection quick, since ORDER BY RAND() is extremely slow, as I mentioned under the other answer.
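
To make the idea concrete, here is a minimal sketch of the indexed random-column technique. It is not MediaWiki's actual code: it uses Python with an in-memory SQLite database so it runs anywhere, and apart from page_random (which comes from the code linked above) the table name `page`, the column `page_title`, and the helper names `build_demo_db` and `random_page` are made up for illustration. The same query shape should carry over to MySQL, but I have not measured it there.

```python
import random
import sqlite3


def build_demo_db(titles, seed=0):
    """Create a small in-memory table with an indexed page_random column.

    The schema is a hypothetical stand-in, not MediaWiki's real one.
    """
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE page ("
        " page_id INTEGER PRIMARY KEY,"
        " page_title TEXT NOT NULL,"
        " page_random REAL NOT NULL)"
    )
    # The index is what lets the lookup below avoid sorting the whole table.
    conn.execute("CREATE INDEX idx_page_random ON page (page_random)")
    # page_random is assigned once, when the row is inserted.
    conn.executemany(
        "INSERT INTO page (page_title, page_random) VALUES (?, ?)",
        [(title, rng.random()) for title in titles],
    )
    return conn


def random_page(conn, rng=random):
    """Return one title: the first row at or above a random offset."""
    offset = rng.random()
    row = conn.execute(
        "SELECT page_title FROM page"
        " WHERE page_random >= ?"
        " ORDER BY page_random LIMIT 1",
        (offset,),
    ).fetchone()
    if row is None:
        # The offset landed above every stored value: wrap around to the
        # row with the smallest page_random.
        row = conn.execute(
            "SELECT page_title FROM page ORDER BY page_random LIMIT 1"
        ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    demo = build_demo_db(["Perl", "MySQL", "PHP", "Wikipedia"])
    print(random_page(demo))
```

One caveat with this approach: a row is chosen with probability proportional to the gap between its page_random and the next lower value, so the selection is only approximately uniform. To restrict the pick to a subset of pages, such as one category, you could add a further condition to the WHERE clause, assuming the table stores that information.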

Edit #2: I love how clean and properly object-oriented MediaWiki's code is. Definitely bookmarking it to show PHP newbies what good PHP code looks like (and to remind myself).

Artem Russakovskii
+1 for your personal perspective.
Masi
I'm personally underwhelmed by the PHP code, particularly the lack of braces. It's not Python, so there's no excuse.
George Jempty
Aw c'mon, that's silly. It's a small class split into small, very specific methods, and it's not like they were trying to cut the LOC count. I personally use braces everywhere, but I don't mind clear, concise code that skips a brace or two.
Leonardo Herrera