wikipedia

Extracting data from Wikipedia JSON or XML with PHP

I want to use PHP (possibly with cURL/XPath?) to extract data from Wikipedia pages. What would be the best way to go about this? I'll be using CakePHP for this project, although I just need to figure out how to get this working first. ...
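
One way to avoid scraping raw HTML is to call the MediaWiki web API directly. Below is a minimal sketch (shown in Python only for brevity; the same HTTP request works from PHP with cURL and json_decode). The page title "Football" and the User-Agent string are placeholders.

    import json
    import urllib.request, urllib.parse

    # Ask api.php to parse a page and return its rendered HTML as JSON.
    params = urllib.parse.urlencode({
        "action": "parse",
        "page": "Football",        # placeholder title
        "format": "json",
    })
    req = urllib.request.Request(
        "https://en.wikipedia.org/w/api.php?" + params,
        headers={"User-Agent": "my-data-extractor/0.1"},   # Wikipedia asks for a descriptive UA
    )
    data = json.load(urllib.request.urlopen(req))
    html = data["parse"]["text"]["*"]    # rendered page body, ready for an XPath pass
    print(html[:200])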

YQL Open Data Table for Wikipedia

Has anyone written a YQL open data table for accessing Wikipedia? I've had a hunt around the internet and found mention of people using YQL to extract various bits of information from Wikipedia pages, such as microformats, links or content, but I haven't been able to find an open data table that ties it all together. ...
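
I haven't seen a single table that ties it all together either, but as a rough sketch the built-in html table plus an XPath already covers a lot. Below is a hedged example of hitting the public YQL endpoint from Python; the endpoint URL and XPath are from memory, so treat the details as assumptions to verify.

    import json
    import urllib.request, urllib.parse

    # YQL query: pull part of a Wikipedia page through the built-in "html" table.
    yql = ('select * from html where '
           'url="http://en.wikipedia.org/wiki/Football" '
           'and xpath="//div[@id=\'bodyContent\']//p"')
    params = urllib.parse.urlencode({"q": yql, "format": "json"})
    url = "https://query.yahooapis.com/v1/public/yql?" + params
    data = json.load(urllib.request.urlopen(url))
    print(json.dumps(data["query"]["results"], indent=2)[:500])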

Parsing Wiki XML Dumps ver0.4 just got tough

Hello, I am trying to parse a Wikipedia XML dump using "Parse-MediaWikiDump-1.0.4" along with the "Wikiprep.pl" script. I guess this script works fine with ver0.3 Wiki XML dumps but not with the latest ver0.4 dumps. I get the following error: Can't locate object method "page" via package "Parse::MediaWikiDump::Pages" at wikiprep.pl line 390. ...

Wikipedia API: list=alllinks confusion

I'm doing a research project for the summer and I've got to get some data from Wikipedia, store it and then do some analysis on it. I'm using the Wikipedia API to gather the data and I've got that down pretty well. My question is in regards to the list=alllinks option in the API docs here. After reading the description, both th...
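
For reference, this is roughly what a list=alllinks request looks like; a minimal sketch (the al* parameter names are from the API docs, the User-Agent and starting point are placeholders):

    import json
    import urllib.request, urllib.parse

    # Enumerate link targets across the wiki, 500 at a time.
    params = urllib.parse.urlencode({
        "action": "query",
        "list": "alllinks",
        "alfrom": "A",          # start enumerating at titles >= "A"
        "allimit": "500",
        "alunique": "1",        # each link target once, instead of one row per linking page
        "format": "json",
    })
    req = urllib.request.Request(
        "https://en.wikipedia.org/w/api.php?" + params,
        headers={"User-Agent": "summer-research-bot/0.1"},
    )
    data = json.load(urllib.request.urlopen(req))
    for link in data["query"]["alllinks"]:
        print(link["title"])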

How can we get Wikipedia data through the Yahoo API?

How can I get Wikipedia data using the Yahoo Query API? ...

How to get top tourist destinations

How can I get the nearest top tourist destinations by giving latitude and longitude values, using the Wikipedia API? ...
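
If the GeoData extension's geosearch module is available on the wiki (it is on current Wikipedia), a nearby-articles query looks roughly like the sketch below; filtering down to "tourist destinations" would still be up to you, e.g. by checking categories afterwards. Coordinates and limits are placeholders.

    import json
    import urllib.request, urllib.parse

    # Articles with coordinates within 10 km of a given point.
    params = urllib.parse.urlencode({
        "action": "query",
        "list": "geosearch",
        "gscoord": "48.8583|2.2944",   # latitude|longitude (placeholder: Eiffel Tower)
        "gsradius": "10000",           # metres
        "gslimit": "50",
        "format": "json",
    })
    req = urllib.request.Request(
        "https://en.wikipedia.org/w/api.php?" + params,
        headers={"User-Agent": "nearby-places-demo/0.1"},
    )
    data = json.load(urllib.request.urlopen(req))
    for place in data["query"]["geosearch"]:
        print(place["dist"], place["title"])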

Read a Wikipedia URL's content using jQuery (cross-domain network call)

    jQuery.ajax({
        url: 'http://en.wikipedia.org/wiki/Football',
        type: 'get',
        dataType: 'jsonp',
        success: function(data) { alert(data); }
    });

I want to read a Wikipedia page from my domain using jQuery, and I am doing it as above. As expected, Wikipedia is sending the data as pure HTML, but when we use $.ajax to get cross doma...

Sync 2 MediaWiki installations weekly?

Hello, we have two MediaWiki (like Wikipedia) installations: one is public and one is internal. We do the normal work on the internal one, adding/changing/deleting text articles and pictures, and we want to sync and update the external one on a weekly basis. What is the best approach? Note: we are using 2 Windows servers (but willing to change to...

Shunting-yard: missing argument to operator

I'm implementing the shunting-yard algorithm. I'm having trouble detecting when there are missing arguments to operators. The Wikipedia entry is very bad on this topic, and their code also crashes for the example below. For instance, 3 - (5 + ) is incorrect because the + is missing an argument. Just before the algorithm reaches the ), t...
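
One way to catch this, independent of which shunting-yard write-up you follow, is to track whether the scanner is currently expecting an operand and complain when an operator or ')' shows up while one is still expected. A minimal sketch in Python (binary operators only; unary minus would need an extra case):

    OPERATORS = {'+', '-', '*', '/'}

    def check_arguments(tokens):
        expecting_operand = True          # an expression must start with an operand or '('
        for tok in tokens:
            if tok == '(':
                if not expecting_operand:
                    raise ValueError("missing operator before '('")
                expecting_operand = True
            elif tok == ')':
                if expecting_operand:
                    raise ValueError("missing argument before ')'")
                expecting_operand = False
            elif tok in OPERATORS:
                if expecting_operand:
                    raise ValueError("missing argument before %r" % tok)
                expecting_operand = True
            else:                         # a number or identifier
                if not expecting_operand:
                    raise ValueError("missing operator before %r" % tok)
                expecting_operand = False
        if expecting_operand:
            raise ValueError("expression ends with a dangling operator")

    check_arguments("3 - ( 5 + 2 )".split())       # passes
    try:
        check_arguments("3 - ( 5 + )".split())
    except ValueError as exc:
        print(exc)                                  # missing argument before ')'

This check can run as a separate pass over the token stream, or be folded into the main shunting-yard loop, since both walk the tokens left to right.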

Getting the infobox section of wikipedia

If I have the URL to the page, how would I obtain the infobox on the right using MediaWiki web services? ...
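
One common trick is to fetch just section 0 of the article's wikitext through the API (the infobox template call usually lives there) and then pull out the {{Infobox ...}} block yourself. A rough sketch, with the title "Paris" as a placeholder:

    import json
    import urllib.request, urllib.parse

    # Wikitext of the lead section, which normally contains the infobox template.
    params = urllib.parse.urlencode({
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "rvsection": "0",
        "titles": "Paris",            # placeholder title
        "format": "json",
    })
    req = urllib.request.Request(
        "https://en.wikipedia.org/w/api.php?" + params,
        headers={"User-Agent": "infobox-demo/0.1"},
    )
    data = json.load(urllib.request.urlopen(req))
    page = next(iter(data["query"]["pages"].values()))
    wikitext = page["revisions"][0]["*"]      # old-style JSON keeps the text under "*"
    print(wikitext[:300])                     # the {{Infobox ...}} call is near the top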

How can I get the full change history for an article on Wikipedia?

I'd like a way to download the content of every page in the history of a popular article on Wikipedia. In other words, I want to get the full contents of every edit for a single article. How would I go about doing this? Is there a simple way to do this using the Wikipedia API? I looked and didn't find anything that popped out as a si...
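
A sketch of doing this through the API with prop=revisions, following the continuation token until the history is exhausted (Special:Export can be another route). The article title and User-Agent are placeholders.

    import json
    import urllib.request, urllib.parse

    def full_history(title):
        """Yield (timestamp, user, wikitext) for every revision of one article."""
        base = {
            "action": "query",
            "prop": "revisions",
            "rvprop": "timestamp|user|content",
            "rvlimit": "50",
            "titles": title,
            "format": "json",
        }
        cont = {}
        while True:
            params = urllib.parse.urlencode({**base, **cont})
            req = urllib.request.Request(
                "https://en.wikipedia.org/w/api.php?" + params,
                headers={"User-Agent": "history-dump-demo/0.1"},
            )
            data = json.load(urllib.request.urlopen(req))
            page = next(iter(data["query"]["pages"].values()))
            for rev in page.get("revisions", []):
                yield rev["timestamp"], rev.get("user", ""), rev.get("*", "")
            if "continue" not in data:
                break
            cont = data["continue"]          # carries rvcontinue for the next batch

    for ts, user, text in full_history("Coffee"):    # placeholder article
        print(ts, user, len(text))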

Where can I get a dump of raw text on the web?

I am looking to do some text analysis in a program I am writing. I am looking for alternate sources of text in its raw form, similar to what is provided in the Wikipedia dumps (download.wikimedia.com). I'd rather not have to go through the trouble of crawling websites, trying to parse the HTML, extracting text, etc. ...

Wikipedia with Python

Hello, I have this very simple Python code to read XML from the Wikipedia API:

    import urllib
    from xml.dom import minidom

    usock = urllib.urlopen("http://en.wikipedia.org/w/api.php?action=query&titles=Fractal&prop=links&pllimit=500")
    xmldoc = minidom.parse(usock)
    usock.close()
    print xmldoc.toxml()

But this code returns with ...

Business Intelligence (BI) on Wikipedia data

Intro: I am a BI addict and would like to develop a project to drill down into Wikipedia's data. I would write scripts to extract data from DBpedia (probably beginning with people articles) and load it into a people table. My question is: has anyone done this before? Even better, is there a community dedicated to this? If the scripts are so...
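
I don't know of a dedicated community beyond DBpedia's own mailing lists, but as a sketch of the extraction step: DBpedia exposes a public SPARQL endpoint, so a first "people" load can be a paged SELECT like the one below. The endpoint URL and ontology class are the usual DBpedia ones, but treat the details as assumptions to verify.

    import json
    import urllib.request, urllib.parse

    # Pull a page of Person resources with English labels from the DBpedia endpoint.
    sparql = """
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?person ?name WHERE {
        ?person a dbo:Person ;
                rdfs:label ?name .
        FILTER (lang(?name) = "en")
    } LIMIT 100
    """
    params = urllib.parse.urlencode({
        "query": sparql,
        "format": "application/sparql-results+json",
    })
    data = json.load(urllib.request.urlopen("http://dbpedia.org/sparql?" + params))
    for row in data["results"]["bindings"]:
        print(row["person"]["value"], "|", row["name"]["value"])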

Wikimedia page to Text in Python

Hi everyone, I would like to convert Wikipedia content extracted with the API to plain text. Any tips? ...
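
If the TextExtracts extension is available on the wiki (it is on today's Wikipedia), prop=extracts with explaintext hands back plain text directly, which avoids stripping markup yourself. A rough sketch, with a placeholder title:

    import json
    import urllib.request, urllib.parse

    # Plain-text extract of an article, no wiki markup or HTML to strip.
    params = urllib.parse.urlencode({
        "action": "query",
        "prop": "extracts",
        "explaintext": "1",
        "titles": "Fractal",          # placeholder title
        "format": "json",
    })
    req = urllib.request.Request(
        "https://en.wikipedia.org/w/api.php?" + params,
        headers={"User-Agent": "plaintext-demo/0.1"},
    )
    data = json.load(urllib.request.urlopen(req))
    page = next(iter(data["query"]["pages"].values()))
    print(page["extract"][:500])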

Want the most frequent words in English

Hi, I want the most frequent words in English. Basically, I am processing Wikipedia text and am stuck with a lot of words even after removing stop words. I tried googling for frequent words, but only got the link below: http://en.wiktionary.org/wiki/Wiktionary:Frequency_lists#English I would have to manually scrape the data from this link. Is there a...
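
If scraping that Wiktionary page turns out to be the only option, another route is to build the frequency list directly from the Wikipedia text you are already processing. A tiny sketch; the corpus filename and the stop-word set are placeholders:

    import re
    from collections import Counter

    STOPWORDS = {"the", "of", "and", "a", "in", "to", "is"}    # placeholder stop-word set

    def top_words(text, n=50):
        """Return the n most common lower-cased words, minus stop words."""
        words = re.findall(r"[a-z']+", text.lower())
        return Counter(w for w in words if w not in STOPWORDS).most_common(n)

    with open("wikipedia_corpus.txt", encoding="utf-8") as fh:    # placeholder corpus file
        for word, count in top_words(fh.read(), n=20):
            print(count, word)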

What is the license of code snippets on Wikipedia?

I'm looking to include code from Wikipedia articles in production software, and also quote it in a publication. Will this affect the licensing of either the code or the publication? My suspicion is that the Creative Commons Attribution/Share-Alike License may apply (confusingly) to the code, but not to the publication, as here the code is ...

PCA: What's wrong with this algorithm?

Can someone please either confirm or correct this Wikipedia algorithm for computing the first principal component? I want a simple implementation of PCA in D, which doesn't have any existing libraries for PCA AFAIK. I've tried implementing this, and it doesn't seem like my results on simple examples match stuff I get from R or Octave. ...
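
For comparison, here is the iterative (power-method) estimate of the first principal component written out in Python/NumPy rather than D. If a port of this doesn't match R or Octave, the usual culprits are forgetting to mean-center the data or normalising at the wrong point; also note the sign of the component is arbitrary, so compare up to sign.

    import numpy as np

    def first_principal_component(X, iterations=100, seed=0):
        """Power-iteration estimate of the first principal component.

        X is an (n_samples, n_features) array; rows are observations.
        """
        rng = np.random.default_rng(seed)
        Xc = X - X.mean(axis=0)                  # center each column
        r = rng.standard_normal(X.shape[1])
        r /= np.linalg.norm(r)
        for _ in range(iterations):
            s = Xc.T @ (Xc @ r)                  # sum over rows of (x . r) * x
            r = s / np.linalg.norm(s)
        return r

    # Sanity check against a full eigendecomposition of the covariance matrix.
    X = np.random.default_rng(1).standard_normal((200, 5)) @ np.diag([5, 2, 1, 0.5, 0.1])
    pc = first_principal_component(X)
    w, V = np.linalg.eigh(np.cov(X, rowvar=False))
    print(np.abs(pc @ V[:, -1]))                 # close to 1.0: same direction up to sign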

Wikipedia: pages across multiple languages

Hi, I want to use the Wikipedia dump for my project, and the information below is required for it. For a Wikipedia entry, I want to know which other languages contain the page. I want downloadable data in CSV or another common format. Is there a way to get this data? Thanks, Bala ...
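
Besides the langlinks table in the dumps, the API exposes the same information per page via prop=langlinks, which is easy to flatten into CSV. A sketch, with a placeholder title:

    import csv
    import json
    import sys
    import urllib.request, urllib.parse

    # Which other languages have an article corresponding to this one?
    params = urllib.parse.urlencode({
        "action": "query",
        "prop": "langlinks",
        "lllimit": "500",
        "titles": "Fractal",          # placeholder title
        "format": "json",
    })
    req = urllib.request.Request(
        "https://en.wikipedia.org/w/api.php?" + params,
        headers={"User-Agent": "langlinks-csv-demo/0.1"},
    )
    data = json.load(urllib.request.urlopen(req))
    page = next(iter(data["query"]["pages"].values()))
    writer = csv.writer(sys.stdout)
    writer.writerow(["lang", "title"])
    for link in page.get("langlinks", []):
        writer.writerow([link["lang"], link["*"]])    # old-style JSON: target title under "*"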

Wikipedia categories

Hi, I want to get a list of all the Wikipedia categories. I can find them here: http://en.wikipedia.org/wiki/Special:Categories Is there a way to download all of them in XML/CSV format? Thank you, Bala ...
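
I am not aware of a ready-made XML/CSV download of that special page, but list=allcategories in the API enumerates the same data and is easy to write out yourself. A sketch of one batch; continuation works the same way as for the other list modules:

    import json
    import urllib.request, urllib.parse

    # One batch of category names; follow data["continue"] for the rest.
    params = urllib.parse.urlencode({
        "action": "query",
        "list": "allcategories",
        "aclimit": "500",
        "format": "json",
    })
    req = urllib.request.Request(
        "https://en.wikipedia.org/w/api.php?" + params,
        headers={"User-Agent": "category-list-demo/0.1"},
    )
    data = json.load(urllib.request.urlopen(req))
    for cat in data["query"]["allcategories"]:
        print(cat["*"])               # old-style JSON: the category name is under "*"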