I want to use PHP (possibly with cURL/XPath?) to extract data from Wikipedia pages. What would be the best way to go about this? I'll be using CakePHP for this project, although I just need to figure out how to get this working first.
...
Has anyone written a YQL open data table for accessing Wikipedia? I've had a hunt around the internet and found mention of people using YQL for extracting various bits of information from Wikipedia pages such as microformats, links or content but I haven't been able to find an open data table that ties it all together.
...
Hello, I am trying to parse a Wikipedia XML dump using "Parse-MediaWikiDump-1.0.4" along with the "Wikiprep.pl" script. I gather this script works fine with v0.3 Wiki XML dumps but not with the latest v0.4 dumps. I get the following error:
Can't locate object method "page" via package "Parse::MediaWikiDump::Pages" at wikiprep.pl line 390.
...
I'm doing a research project for the summer and I need to get some data from Wikipedia, store it, and then do some analysis on it. I'm using the Wikipedia API to gather the data and I've got that down pretty well.
My question is in regard to the links/alllinks options in the API docs here
After reading the description, both th...
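For what it's worth, the two options fetch different things: prop=links lists the links appearing *on* a named page, while list=alllinks enumerates link targets across the whole wiki. A small Python sketch building both kinds of request (helper names are mine, and the parameter names are the API's as I understand them):

```python
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"

def links_on_page(title):
    """prop=links: the links appearing on one named page."""
    return API + "?" + urlencode({
        "action": "query", "titles": title,
        "prop": "links", "pllimit": 500, "format": "json"})

def all_links(alcontinue=None):
    """list=alllinks: link targets enumerated wiki-wide, paged via
    the response's continue.alcontinue value."""
    params = {"action": "query", "list": "alllinks",
              "allimit": 500, "format": "json"}
    if alcontinue:
        params["alcontinue"] = alcontinue
    return API + "?" + urlencode(params)
```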
How can I get Wikipedia data using the Yahoo Query (YQL) API?
...
How can I get the nearest top tourist destinations, given a latitude and longitude value, using the Wikipedia API?
...
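One possible starting point is the GeoSearch endpoint (provided by the GeoData extension, which English Wikipedia runs): it returns geotagged articles near a coordinate, which you would then have to filter down to tourist attractions yourself (e.g. by category). A minimal sketch; the function name is mine:

```python
from urllib.parse import urlencode

def geosearch_url(lat, lon, radius=10000, limit=20):
    """list=geosearch: geotagged articles near a coordinate.
    gsradius is in metres, capped at 10000 by the API."""
    params = {
        "action": "query",
        "list": "geosearch",
        "gscoord": f"{lat}|{lon}",
        "gsradius": radius,
        "gslimit": limit,
        "format": "json",
    }
    return "https://en.wikipedia.org/w/api.php?" + urlencode(params)
```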
jQuery.ajax({
    url: 'http://en.wikipedia.org/wiki/Football',
    type: 'get',
    dataType: 'jsonp',
    success: function (data) { alert(data); }
});
I want to read a Wikipedia page from my domain using jQuery, and I'm doing it as above.
As expected, Wikipedia sends the data as pure HTML, but when we use $.ajax to get cross-doma...
Hello,
We have two MediaWiki (Wikipedia-like) installations: one is public and one is internal.
We do the normal work on the internal one, adding/changing/deleting text articles and pictures.
We want to sync and update the external one on a weekly basis; what is the best approach?
Note: we are using 2 Windows servers (but willing to change to...
I'm implementing the shunting-yard algorithm. I'm having trouble detecting when there are missing arguments to operators. The Wikipedia entry is very vague on this topic, and the code there also crashes for the example below.
For instance 3 - (5 + ) is incorrect because the + is missing an argument.
Just before the algorithm reaches the ), t...
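One way to catch the missing-argument case, independent of any particular shunting-yard implementation, is to track whether the next token should be an operand or an operator while scanning. A minimal Python sketch (names and the token set are mine):

```python
def check_expression(tokens):
    """Pre-pass for shunting-yard: return False on a missing operand,
    a missing operator, or unbalanced parentheses."""
    expect_operand = True   # does the next token need to be an operand?
    depth = 0               # open-parenthesis nesting
    for tok in tokens:
        if tok == '(':
            if not expect_operand:
                return False          # e.g. "3 ("
            depth += 1
        elif tok == ')':
            if expect_operand:
                return False          # e.g. "(5 + )": '+' missing its argument
            if depth == 0:
                return False          # ')' with no matching '('
            depth -= 1
        elif tok in ('+', '-', '*', '/'):
            if expect_operand:
                return False          # two operators in a row, or leading operator
            expect_operand = True
        else:                          # a number or variable
            if not expect_operand:
                return False          # two operands in a row
            expect_operand = False
    return depth == 0 and not expect_operand
```

For the example above, check_expression("3 - ( 5 + )".split()) returns False, because the ')' arrives while an operand is still expected.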
If I have the URL to the page, how would I obtain the infobox on the right using the MediaWiki web services?
...
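One approach (a sketch, not the only route): infobox templates normally sit in the lead section's wikitext, which action=parse can return on its own with section=0. The function name is mine; the parameters are the API's as I understand them:

```python
from urllib.parse import urlencode

def section0_wikitext_url(page):
    """Fetch the wikitext of the lead section (section=0), which is
    where {{Infobox ...}} templates normally live; the template
    parameters can then be picked out of the returned markup."""
    params = {
        "action": "parse",
        "page": page,
        "prop": "wikitext",
        "section": 0,
        "format": "json",
    }
    return "https://en.wikipedia.org/w/api.php?" + urlencode(params)
```

You still have to parse the {{Infobox ...}} parameters out of the markup yourself; alternatively, DBpedia publishes infobox data already extracted.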
I'd like a way to download the content of every page in the history of a popular article on Wikipedia. In other words, I want to get the full contents of every edit for a single article. How would I go about doing this?
Is there a simple way to do this using the Wikipedia API? I looked and didn't find anything that popped out as a si...
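The API's prop=revisions can page through an article's entire history, a batch at a time. A sketch of the request builder (function name is mine; parameter names are the current API's as I understand them, including the newer rvslots):

```python
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"

def revisions_url(title, rvcontinue=None, limit=50):
    """Build one request for full revision content of a single
    article.  Each JSON response carries a continue.rvcontinue
    value; feed it back in to walk through the whole history."""
    params = {
        "action": "query",
        "titles": title,
        "prop": "revisions",
        "rvprop": "ids|timestamp|user|content",
        "rvslots": "main",
        "rvlimit": limit,
        "format": "json",
    }
    if rvcontinue:
        params["rvcontinue"] = rvcontinue
    return API + "?" + urlencode(params)
```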
I am looking to do some text analysis in a program I am writing. I am looking for alternate sources of text in raw form, similar to what is provided in the Wikipedia dumps (download.wikimedia.com).
I'd rather not have to go through the trouble of crawling websites, trying to parse the HTML, extracting text, etc.
...
Hello,
I have this very simple Python code to read XML from the Wikipedia API:
import urllib
from xml.dom import minidom

usock = urllib.urlopen("http://en.wikipedia.org/w/api.php?action=query&titles=Fractal&prop=links&pllimit=500")
xmldoc = minidom.parse(usock)
usock.close()
print xmldoc.toxml()
But this code returns with ...
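One likely cause: without an explicit format parameter, api.php answers with a pretty-printed, HTML-wrapped rendering meant for browsers, which minidom cannot parse; adding format=xml requests the raw XML. A Python 3 sketch of the same request (the User-Agent header is my own addition, since Wikipedia may reject generic clients):

```python
from urllib.request import urlopen, Request
from xml.dom import minidom

# format=xml asks api.php for raw XML rather than the HTML-wrapped
# help/pretty-printed output it returns by default.
URL = ("https://en.wikipedia.org/w/api.php"
       "?action=query&titles=Fractal&prop=links&pllimit=500&format=xml")

def fetch_links_xml(url=URL):
    # Send a descriptive User-Agent; Wikipedia may block generic ones.
    req = Request(url, headers={"User-Agent": "example-script/0.1"})
    with urlopen(req) as usock:
        return minidom.parse(usock)
```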
Intro:
I am a BI addict and would like to develop a project to drill down into Wikipedia's data.
I would write scripts to extract data from DBpedia (probably beginning with people articles) and load it into a people table.
My question is:
Has anyone done this before?
Even better, is there a community dedicated to this?
If the scripts are so...
Hi Everyone,
I would like to convert Wikipedia content extracted with the API to plain text.
Any tips?
...
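One option that avoids client-side markup stripping entirely: the TextExtracts extension (prop=extracts with explaintext) returns plain text server-side. A sketch of the request builder; the function name is mine:

```python
from urllib.parse import urlencode

def extract_url(title):
    """Request plain-text article content via the TextExtracts
    extension: prop=extracts with explaintext=1 strips the wiki
    markup on the server."""
    params = {
        "action": "query",
        "titles": title,
        "prop": "extracts",
        "explaintext": 1,
        "format": "json",
    }
    return "https://en.wikipedia.org/w/api.php?" + urlencode(params)
```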
Hi,
I want the most frequent words in English. Basically, I am processing Wikipedia text and am stuck with a lot of words even after removing stop words. I tried googling for frequent words, but only got the link below.
http://en.wiktionary.org/wiki/Wiktionary:Frequency_lists#English
I would have to manually scrape the data from this link. Is there a...
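As a stopgap while looking for a published list, the frequencies can also be computed directly from the Wikipedia text already being processed. A small Python sketch (the stop-word set here is a toy placeholder; substitute your real list):

```python
import re
from collections import Counter

STOP_WORDS = {"the", "of", "and", "a", "to", "in", "is"}  # toy placeholder

def frequent_words(text, n=5):
    """Return the n most common non-stop-words in text as
    (word, count) pairs."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)
```

For example, frequent_words("the cat and the cat sat", 1) returns [("cat", 2)].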
I'm looking to include code from Wikipedia articles in production software, and also quote it in a publication. Will this affect the licensing of either the code or the publication?
My suspicion is that Creative Commons Attribution/Share-Alike License may apply (confusingly) to the code, but not to the publication as here the code is ...
Can someone please either confirm or correct this Wikipedia algorithm for computing the first principal component? I want a simple implementation of PCA in D, which doesn't have any existing libraries for PCA, AFAIK. I've tried implementing this, and my results on simple examples don't seem to match what I get from R or Octave. ...
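For comparison, here is a minimal Python sketch of the power-iteration scheme the Wikipedia article describes (center the data, then repeatedly set r to the sum of (x·r)x over the points and renormalise until r converges to the dominant eigenvector of the covariance matrix); it should port to D straightforwardly. The function name and fixed iteration count are my own choices, and the result is defined only up to sign:

```python
import math

def first_pc(data, iterations=200):
    """First principal component of a list of equal-length rows,
    via power iteration on the centered data."""
    dims = len(data[0])
    # Center each dimension on its mean.
    means = [sum(row[d] for row in data) / len(data) for d in range(dims)]
    centered = [[row[d] - means[d] for d in range(dims)] for row in data]
    r = [1.0] * dims                      # arbitrary starting vector
    for _ in range(iterations):
        s = [0.0] * dims
        for x in centered:                # accumulate sum of (x . r) x
            dot = sum(x[d] * r[d] for d in range(dims))
            for d in range(dims):
                s[d] += dot * x[d]
        norm = math.sqrt(sum(v * v for v in s))
        r = [v / norm for v in s]         # renormalise each step
    return r
```

On points lying along the line y = x, e.g. [[1,1],[2,2],[3,3]], it converges to roughly (0.7071, 0.7071), i.e. the unit vector along that line, which is a quick sanity check against R or Octave.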
Hi,
I want to use a Wikipedia dump for my project. The information below is required for my project.
For a Wikipedia entry, I want to know which other languages contain the page.
I want downloadable data in CSV or another common format.
Is there a way to get this data?
Thanks
Bala
...
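Two routes come to mind. Per page, the API's prop=langlinks returns each language code plus the title in that language, which maps straight onto CSV rows; a sketch (function name is mine):

```python
from urllib.parse import urlencode

def langlinks_url(title, lllimit=500):
    """Ask which other language editions have this page
    (prop=langlinks); each result row is a language code plus the
    page's title in that language."""
    params = {
        "action": "query",
        "titles": title,
        "prop": "langlinks",
        "lllimit": lllimit,
        "format": "json",
    }
    return "https://en.wikipedia.org/w/api.php?" + urlencode(params)
```

For bulk data, the per-wiki langlinks table is also published with the dumps (e.g. enwiki-latest-langlinks.sql.gz), which may be the easier route to a full CSV export.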
Hi,
I want to get a list of all the Wikipedia categories. I can find them here: http://en.wikipedia.org/wiki/Special:Categories Is there a way to download all of them in XML/CSV format?
Thank you
Bala
...
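The API equivalent of that special page is list=allcategories, which can be paged through to enumerate every category name; writing the batches out as CSV is then one loop per response. A sketch (function name is mine):

```python
from urllib.parse import urlencode

def allcategories_url(accontinue=None, limit=500):
    """Page through every category name via list=allcategories; the
    response's continue.accontinue value is passed back in on the
    next call until it disappears."""
    params = {
        "action": "query",
        "list": "allcategories",
        "aclimit": limit,
        "format": "json",
    }
    if accontinue:
        params["accontinue"] = accontinue
    return "https://en.wikipedia.org/w/api.php?" + urlencode(params)
```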