ansaurus

Question

How to parse/extract data from a mediawiki marked-up article via python

Answer 1


The mwlib Markup Parser generates a semantic parse tree from MediaWiki markup. This lets developers process the vast amount of information available in arbitrary MediaWiki installations.

The documentation page has a one-liner example:

from mwlib.uparser import simpleparse
simpleparse("=h1=\n*item 1\n*item2\n==h2==\nsome [[Link|caption]] there\n")
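For comparison, if you only need a few pieces of data (headings, bullet items, links) from simple markup like the string above, a rough regex pass with the standard library may be enough. This is a minimal sketch and not part of mwlib: the helper `extract_basic` and its patterns are hypothetical, and they ignore templates, nesting, tables and the many edge cases a real parser such as mwlib handles.

```python
import re

# Hypothetical helper, NOT part of mwlib: a rough regex-based extractor for a
# tiny subset of MediaWiki markup (headings, bullet items, internal links).
HEADING_RE = re.compile(r"^(=+)\s*(.*?)\s*\1\s*$", re.MULTILINE)
ITEM_RE = re.compile(r"^\*+\s*(.*?)\s*$", re.MULTILINE)
# Matches [[Target]] or [[Target|caption]]; the caption defaults to the target.
LINK_RE = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]*))?\]\]")


def extract_basic(text):
    """Return headings as (level, title), bullet items, and links as (target, caption)."""
    headings = [(len(m.group(1)), m.group(2)) for m in HEADING_RE.finditer(text)]
    items = [m.group(1) for m in ITEM_RE.finditer(text)]
    links = [(m.group(1), m.group(2) or m.group(1)) for m in LINK_RE.finditer(text)]
    return {"headings": headings, "items": items, "links": links}


data = extract_basic("=h1=\n*item 1\n*item2\n==h2==\nsome [[Link|caption]] there\n")
print(data)
# {'headings': [(1, 'h1'), (2, 'h2')], 'items': ['item 1', 'item2'], 'links': [('Link', 'caption')]}
```

This breaks down quickly on real articles (nested templates, links inside captions, `<nowiki>` and so on), which is exactly where a parse tree from mwlib pays off.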

To see mwlib's parser in action, look at the test cases that ship with the code (mwlib/tests/test_parser.py in the git repository):

from mwlib import parser, uparser

parse = uparser.simpleparse

def test_headings():
    r = parse(u"""
= 1 =
== 2 ==
= 3 =
""")

    # Keep only top-level Section nodes; the first child of each holds its heading text.
    # "== 2 ==" is nested under "1", so it is not a direct child of r.
    sections = [x.children[0].asText().strip() for x in r.children if isinstance(x, parser.Section)]
    assert sections == [u"1", u"3"]
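The assertion depends on heading levels being nested: each heading becomes a child of the nearest preceding heading with a smaller level. That rule can be sketched with a stack. This is a hypothetical, stdlib-only illustration (the `Section` dataclass and `build_sections` function are assumptions); it is not how mwlib implements its parser.

```python
import re
from dataclasses import dataclass, field

HEADING_RE = re.compile(r"^(=+)\s*(.*?)\s*\1\s*$")


@dataclass
class Section:
    """Hypothetical tree node: a heading title plus nested subsections."""
    title: str
    level: int
    children: list = field(default_factory=list)


def build_sections(text):
    """Nest headings so each one becomes a child of the nearest shallower heading."""
    root = Section(title="", level=0)
    stack = [root]  # path from the root to the most recently opened section
    for line in text.splitlines():
        m = HEADING_RE.match(line)
        if not m:
            continue  # this sketch ignores non-heading content
        level = len(m.group(1))
        # Close any open sections at the same or a deeper level first.
        while stack[-1].level >= level:
            stack.pop()
        section = Section(title=m.group(2), level=level)
        stack[-1].children.append(section)
        stack.append(section)
    return root


tree = build_sections("\n= 1 =\n== 2 ==\n= 3 =\n")
print([s.title for s in tree.children])              # ['1', '3']
print([s.title for s in tree.children[0].children])  # ['2']
```

The root has level 0, so it is never popped and always collects the level-1 headings.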

Also see the Markup spec and Alternative parsers pages on mediawiki.org for more information.

eed3si9n 2009-12-28 05:47:56

I've looked at mwlib before, but I can't seem to find any snippets of it actually in use, which is the main problem. I'd appreciate any links to tutorials or examples.

Nazarius Kappertaal 2009-12-28 06:07:32
