[maemo-community] [maemo2midgard] Wiki migration

From: Dave Neary bolsh at gnome.org
Date: Thu May 15 21:46:50 EEST 2008
Hi all,

I've spent the day working on a strategy for migrating the wiki from
midgard/markdown to MediaWiki, and with some help from Niels, I've got a
proposal (not quite complete yet, but not far).

First off, I think we should split the steps of wiki clean-up and
migration. Ideally, the migration should be mostly or completely
automated, and the clean-up should be either pre-processing or

The wiki clean-up will continue to live in
https://maemo.org/community/wiki/wikireorg/ - I have a small team of
volunteers who are prepared to help me with the wiki clean-up, and
that's a decent place for us to start already.

So, without further ado, here's the migration plan for midgard to mediawiki:


• Need a nice way to generate a list of pages
• Need a nice way to convert midgard wiki text (Markdown) to mediawiki
• Should probably do wiki clean-up work before migration

Converting a page from Markdown to MediaWiki:

1. Install pandoc http://johnmacfarlane.net/pandoc/ (apt-get install
pandoc on Ubuntu 8.04)
2. Install HTML WikiConverter with MediaWiki dialect
• apt-get install libhtml-wikiconverter-perl
• perl -MCPAN -e 'install HTML::WikiConverter::MediaWiki'
3. Download the text of the page (for the purposes of the test):
• Page->Edit
• copy wikitext
• paste into a text file locally (HowDoIBecomeRoot.txt)
• NEW! Thanks Niels wget
http://maemo.org/community/wiki/source/HowDoIBecomeRoot/ -O
4. Convert to HTML with pandoc
• pandoc -f markdown -t html -o HowDoIBecomeRoot.html HowDoIBecomeRoot.txt

Note: The markdown2html step isn't 100% reliable. Lines starting with
"#!/bin/bash" get turned into h1s.

5. Convert from HTML to MediaWiki with WikiConverter:
• html2wiki --dialect MediaWiki --encoding iso-8859-1 \
            --base-uri http://wiki.maemo.org/ \
            --wiki-uri http://wiki.maemo.org/ \
            HowDoIBecomeRoot.html > HowDoIBecomeRoot.wiki
6. Create the page in MediaWiki
7. Upload the wiki text to the Mediawiki page
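
One possible workaround for the "#!/bin/bash" problem noted under step 4 is to escape such lines before they reach pandoc. This is only a sketch: the function name escape_shebang_lines is made up here, and it assumes that a backslash-escaped "#" is rendered as a literal character by the Markdown reader, which is standard Markdown behaviour but has not been checked against the maemo.org pages. Lines indented by four or more spaces are left alone, since Markdown already treats those as code.

```shell
# Hypothetical helper (not part of the original plan): put a backslash in
# front of "#!" when it starts a line, allowing up to three leading spaces,
# so that Markdown does not read the line as a heading.
escape_shebang_lines() {
    sed -E 's/^( {0,3})#!/\1\\#!/'
}
```

It would slot in ahead of step 4, for example:
escape_shebang_lines < HowDoIBecomeRoot.txt | pandoc -f markdown -t html -o HowDoIBecomeRoot.html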

This takes about 15 minutes *per page* because of all of the hassle of
reading the source in midgard and creating the page in mediawiki.

Mass export from midgard:
1. Get list of all wiki pages
• Use the admin interface to query all wiki objects
• Copy & paste filenames to a file
2. while read -r wikipage; do
	# fetch the Markdown source, convert to HTML, then to MediaWiki markup
	wget "http://maemo.org/community/wiki/source/${wikipage}/" -O "${wikipage}.txt"
	pandoc -f markdown -t html -o "${wikipage}.html" "${wikipage}.txt"
	html2wiki --dialect MediaWiki --encoding iso-8859-1 \
            --base-uri http://wiki.maemo.org/ \
            --wiki-uri http://wiki.maemo.org/ \
            "${wikipage}.html" > "${wikipage}.wiki"
   done < pagelist.txt   # pagelist.txt: placeholder name for the file from step 1

Mass import into MediaWiki:

See  http://meta.wikimedia.org/wiki/Help:Export and

We need to generate an XML file like this from all the pages:
http://meta.wikimedia.org/wiki/Help:Export#Example - this is the one big
remaining TODO.
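
A rough sketch of how that XML file might be generated from the .wiki files produced by the export loop. The function names xml_escape and make_import_xml are made up for illustration, and taking the page title from the file name is an assumption. The element names follow the Help:Export example, but the timestamp and contributor elements shown there are left out, and whether the importer accepts pages without them has not been tested.

```shell
# Escape the three characters that are special in XML text content.
# The ampersand must be handled first so the other entities are not re-escaped.
xml_escape() {
    sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# Hypothetical helper: write one <page> element per *.wiki file found in the
# directory given as $1, wrapped in a single <mediawiki> element, to stdout.
make_import_xml() {
    dir=$1
    echo '<mediawiki xml:lang="en">'
    for f in "$dir"/*.wiki; do
        [ -e "$f" ] || continue            # no .wiki files: skip the literal pattern
        title=$(basename "$f" .wiki)       # assumption: page title = file name
        echo '  <page>'
        printf '    <title>%s</title>\n' "$(printf '%s' "$title" | xml_escape)"
        echo '    <revision>'
        printf '      <text xml:space="preserve">'
        xml_escape < "$f"
        echo '</text>'
        echo '    </revision>'
        echo '  </page>'
    done
    echo '</mediawiki>'
}
```

Run from the directory holding the converted pages, it would be used as:
make_import_xml . > import.xml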

See http://meta.wikimedia.org/wiki/Help:Import for importing the XML
file. There's an option to turn on in MediaWiki.

Comments? Suggestions? Improvements?


Dave Neary
GNOME Foundation member
bolsh at gnome.org

