Hi,

I am working on a project for which I need the names of all Wikipedia articles (I don't need the content). Is there a place where I can download this data?

Thank you
Bala

+8  A: 

Check out this page on Wikipedia: there is an option to download just an archive with the names of the articles. Here's the actual path to the download page:

  • All Titles (gzipped) - 32+ MB at the time of posting.
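
Once the archive is downloaded, reading the titles out of it could look something like the sketch below. It assumes the file is a gzipped UTF-8 text file with one title per line, underscores in place of spaces, and possibly a `page_title` header on the first line; the function name `iter_titles` is made up for illustration.

```python
import gzip
from typing import Iterator


def iter_titles(path: str) -> Iterator[str]:
    """Yield article titles from a gzipped titles file, one title per line."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line_number, line in enumerate(fh):
            title = line.rstrip("\n")
            # Assumption: the first line may be a "page_title" column header.
            if line_number == 0 and title == "page_title":
                continue
            if not title:
                continue  # skip blank lines
            # Titles in the file use underscores where the site shows spaces.
            yield title.replace("_", " ")
```

Calling `iter_titles("enwiki-latest-all-titles-in-ns0.gz")` streams the titles lazily, so the whole list never has to fit in memory at once.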

Edit:

You may notice non-English titles (and some profanity, be advised) in the list contained in enwiki-latest-all-titles-in-ns0.gz. This is because most people create content on the main English wiki (language code en) by default, whatever language the title is written in. If you look at the dumps for other languages, you will see that they contain different sets of articles.

The main download page mentions that the Wikipedia API can be used to perform some types of queries on Wikipedia, but I'm not sure this will solve your problem: the taxonomy of the pages doesn't seem to provide a simple way to differentiate "English" content from "content on the English wiki".
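
For what it's worth, paging through titles with the API might look roughly like this. It is only a sketch: it uses the MediaWiki `list=allpages` query and follows the `continue` block returned with each batch, and the names `fetch_json` and `iter_api_titles` and the User-Agent string are placeholders of mine. It returns the same mix of titles as the dump, so it does not solve the language question either.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, Iterator, Optional

API_URL = "https://en.wikipedia.org/w/api.php"


def fetch_json(params: Dict[str, str]) -> dict:
    """Send one GET request to the API and decode the JSON reply."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    # Placeholder User-Agent; replace it with something identifying your project.
    request = urllib.request.Request(
        url, headers={"User-Agent": "title-lister-sketch/0.1 (example)"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def iter_api_titles(
    fetch: Callable[[Dict[str, str]], dict] = fetch_json,
    limit: Optional[int] = None,
) -> Iterator[str]:
    """Yield main-namespace titles, following the API's continuation tokens."""
    params = {
        "action": "query",
        "list": "allpages",
        "apnamespace": "0",  # namespace 0 = articles
        "aplimit": "max",
        "format": "json",
    }
    seen = 0
    while True:
        data = fetch(params)
        for page in data["query"]["allpages"]:
            yield page["title"]
            seen += 1
            if limit is not None and seen >= limit:
                return
        continuation = data.get("continue")
        if not continuation:
            return  # no more batches
        # Merge the continuation values into the next request.
        params = {**params, **continuation}
```

Passing `limit=20` is a cheap way to try it out without walking the whole wiki; for a complete list the dump is the better option.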

AJ
This contains only the English articles; use the first link if you want to find article titles (and abstracts / content) for other languages.
AJ
Thank you very much @AJ
Algorist
I noticed that the titles actually include other languages. Is there a way to get English-language titles only?
Algorist
What you're seeing are only those pages loaded to the English Wikipedia, which may include non-English titles because people post content on the `en` site by default. I'll update my answer to add some more detail.
AJ
A: 

I'm not aware of any central list of articles. If you just need a large number of them rather than a complete list (bearing in mind that any complete list will always be out of date anyway), you could probably put something together with wget to recursively follow links within Wikipedia from the main page and store the URLs you get.

Vicky
If you really wanted to take this type of approach, you could page through the indexes, like [the alphabetical listing](http://en.wikipedia.org/wiki/Wikipedia:Quick_index).
AJ
Be aware, however, that Wikipedia specifically asks that, if you *must* take this type of approach (which should not really be necessary), you limit the rate of page accesses to avoid overloading their servers.
AJ
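
To illustrate the rate-limiting point, a minimal throttle could look like the sketch below. The `Throttle` class is a hypothetical helper, and the one-second interval in the usage note is only an assumed value, not a figure published by Wikipedia.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Ensure at least `min_interval` seconds pass between successive calls to wait()."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time of the previous request, if any

    def wait(self) -> None:
        """Block until the next request is allowed, then record its time."""
        now = self._clock()
        if self._last is not None:
            remaining = self._min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Creating `throttle = Throttle(1.0)` once and calling `throttle.wait()` before every page request keeps the crawl to at most one request per second.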