Hello,
I am trying to get the summary of an article and download it as a string. This works great with some articles, but the markup of the Wikipedia website is inconsistent, so my NSScanner approach fails fairly often.
Here's my NSScanner implementation:
NSString *separatorString = @"<table id=\"toc\" class=\"toc\">"; ...
I have a problem with the Wikipedia API.
I use this PHP script:
<?php
$xmlDoc = new DOMDocument();
$xmlDoc->load("http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=New_York_Yankees&rvprop=content&format=xml");
print $xmlDoc->saveXML();
?>
and I get this result in the browser. Why?
Warning:
DOMDocum...
Hi All,
I am trying to split a large XML file into smaller files using Java's SAXParser (specifically the Wikipedia dump, which is about 28GB uncompressed).
I have a PageHandler class which extends DefaultHandler:
private class PageHandler extends DefaultHandler {
private StringBuffer text;
...
@Override
public void startEl...
Hi,
I am currently doing a project on person name disambiguation. The idea behind the project is that it will be able to identify the correct person when there are multiple people with the same name. I have used Wikipedia for this. I want to evaluate my project on some standard data, so I am looking for some test data. I am not familiar ...
Folks,
Does anybody know how Wikipedia, or MediaWiki in general, encodes a title into the URI? It's not normal URI encoding: spaces are replaced with underscores, single quotation marks are not encoded, and so on. Is there any reference on that?
Cheers
Parsa
...
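A rough sketch of the title-to-URI mapping described in the question above, in Python. It only covers the two behaviours the question mentions (spaces become underscores, single quotes stay unencoded) plus UTF-8 percent-encoding of everything else. The function name wiki_path and the exact set of characters left unencoded are assumptions made for illustration; they are not taken from the MediaWiki source, so check them against the real behaviour before relying on this.

```python
from urllib.parse import quote

# ASSUMPTION: characters left unencoded in addition to letters, digits and "_.-~".
# This set is illustrative only; it is not copied from MediaWiki.
ASSUMED_SAFE = "/:'(),!*;@$"


def wiki_path(title, safe=ASSUMED_SAFE):
    """Turn a page title into a URL path segment (hypothetical helper)."""
    # Spaces in titles are written as underscores in the URL.
    with_underscores = title.strip().replace(" ", "_")
    # Everything outside the safe set is percent-encoded as UTF-8.
    return quote(with_underscores, safe=safe)


if __name__ == "__main__":
    print(wiki_path("New York Yankees"))
    print(wiki_path("Schrödinger's cat"))
```

Passing a different safe string makes it easy to compare the output against URLs that the wiki itself produces.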
Hi,
I downloaded a Wikipedia dump and want to convert it from wiki format to my own object format. Is there a wiki parser available that converts the wiki markup into XML?
Thank you
...
Hello,
I am trying to build a Wikipedia link crawler on Google App Engine. I wanted to store an index in the datastore, but I run into the DeadlineExceededError for both cron jobs and the task queue.
For the cron job I have this code:
def buildTree(self):
start=time.time()
self.log.info(" Start Time: %f" % start)
nobranches...
Hi,
I want to use WikipediaTokenizer from the Lucene project - http://lucene.apache.org/java/3_0_2/api/contrib-wikipedia/org/apache/lucene/wikipedia/analysis/WikipediaTokenizer.html - but I have never used Lucene. I just want to convert a Wikipedia string into a list of tokens. But I see that there are only four methods available in this class, end...
I am trying to parse a wikitext file received through Wikipedia's API and the problem is that some of its templates (i.e. snippets enclosed in {{ and }}) are not automatically expanded into wikitext, so I have to manually look for them in the article source and replace them eventually. The question is, can I use regex in .NET to get the ...
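As a hedged illustration of the template-matching problem above: templates can nest ({{a|{{b}}}}), which a plain regular expression handles poorly, so this Python sketch swaps in a simple depth counter instead of a regex. The function name find_templates is hypothetical, and the sketch deliberately ignores triple-brace parameters ({{{x}}}), nowiki sections and comments.

```python
def find_templates(wikitext):
    """Return the top-level {{...}} spans in wikitext, nested ones included inside."""
    spans = []
    depth = 0      # how many "{{" are currently open
    start = None   # index where the current top-level template begins
    i = 0
    while i < len(wikitext) - 1:
        pair = wikitext[i:i + 2]
        if pair == "{{":
            if depth == 0:
                start = i
            depth += 1
            i += 2
        elif pair == "}}" and depth > 0:
            depth -= 1
            if depth == 0:
                # Closed the outermost template: record the whole span.
                spans.append(wikitext[start:i + 2])
            i += 2
        else:
            i += 1
    return spans


if __name__ == "__main__":
    for template in find_templates("a {{x|{{y}}}} b {{z}}"):
        print(template)
```

Each returned span could then be replaced in the article source, or scanned again with the same function to reach the nested templates.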
I want to make a Wikipedia Reader for the iPhone. What's the best approach?
I've already given this some thought. Loading the content of a Wikipedia page is quite easy using the Wikipedia API. The difficulty is how to display the content in a nice way. The content is marked up with wiki markup, not HTML. My idea is to pars...
Hi,
What would be the easiest way to get all articles about people from Wikipedia? I know I can download a dump of all the pages, but then how do I filter those and get only the ones about people? I need as many as I can get (preferably more than a million), so using any sort of API is probably not an option.
...
Is it possible to query the Wikipedia API for articles that contain a specific template? The docs at http://en.wikipedia.org/w/api.php do not describe any action that would filter search results to pages that contain a template. Specifically, I am after pages that contain Template:Persondata. After that, I am hoping to be able to retrie...
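One possible direction, sketched in Python under stated assumptions: the MediaWiki API has a list=embeddedin query that lists pages transcluding a given page, which should cover "pages that contain a template". The sketch below only builds the request URL and makes no network call. The helper name build_embeddedin_url and the default limit are assumptions for illustration; the endpoint is the one quoted in the question.

```python
from urllib.parse import urlencode

API_ENDPOINT = "http://en.wikipedia.org/w/api.php"  # endpoint from the question


def build_embeddedin_url(template, limit=500, continue_token=None):
    """Build a query URL listing pages that transclude `template` (hypothetical helper)."""
    params = {
        "action": "query",
        "list": "embeddedin",   # pages that embed (transclude) the given title
        "eititle": template,    # e.g. "Template:Persondata"
        "eilimit": str(limit),  # ASSUMPTION: 500 per request; adjust as needed
        "format": "json",
    }
    if continue_token is not None:
        # Value taken from the previous response to fetch the next batch.
        params["eicontinue"] = continue_token
    return API_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_embeddedin_url("Template:Persondata"))
```

Fetching the URL and following the continuation value from each response would walk through the full list in batches.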