Wikipedia Offline Readers

Looks like all the Wikimedia Foundation had to do for decent offline reader software to be developed is continue to provide database dumps. 😉 There are now several implementations, some open source, that can be used to build Wikipedia DVDs – and I’m not referring to the neat offline reader hack that was just slashdotted. Look at these:

  • Moulin (open source) uses static HTML inside a XUL-based cross-platform reader application with Gecko as the rendering engine. It doesn't seem to have full-text search (only title search), but it seems to have a very active development team. The current downloadable version is still very simplistic; future versions should be interesting. Current versions do not contain images, but there's nothing technical that stands in the way of including them. I missed the Wikimania talk about this one. :-(
  • Kiwix (open source) is awesome and the slickest implementation I’ve seen so far. It was used for the Wikipedia 0.5 DVD (actually a CD, with only about 2000 articles, sadly). Has a nice full-text search, search autocompletion, and printing. Also uses static HTML as a source. Storage efficiency could be better, but this first selection does include image thumbnails, which take quite a bit of space.
  • Ksana For Wiki is still closed source. It was demonstrated at Wikimania to provide “Wikipedia on a USB stick”. Pretty nifty for looking things up without a net connection. The application actually parses the wikitext and does a fairly shoddy job of it, which makes many of the articles look rather raw. On the positive side, it does support accessing dumps in any language, has a fairly fast full-text search, and is cross-platform.
  • ZenoReader is a Windows-only closed source reader application developed for the German Wikipedia DVD. While the company which made the DVD, Directmedia, deserves credit for bringing the first WP DVD to the market, I don’t think this particular framework is likely to have much of a future. I’m not even going to bother to try to get it to run under WINE on Linux, as they suggest. From what I can gather, it’s based on the HTML of the de.wp articles which is served through a local webserver.
  • Wikipedia Offline Client seems to be a student project to create a nice graphical client. From what I can see quickly, it also appears to be based on rendering & indexing HTML pages, though they seem to have hacked the standard MediaWiki parser for the purpose. Not sure what the current status is and how likely it is to be developed further. It appears to be partially based on Knowledge, an earlier offline reader effort.
  • WikiFilter takes a similar approach to Ksana, using the wikitext as a source. Judging by the screenshot, the output is somewhat slicker, but the code hasn’t been updated in more than a year and is Windows-only. It runs as an Apache module so setup is definitely not for the meek.

UPDATE: A couple of other ones pointed out in the comments:

  • yawr is Magnus Manske’s effort to create an open source equivalent of ZenoReader.
  • WikiMiner is a Java-based search tool that can be used in conjunction with the static HTML dumps.

A few other methods to view Wikipedia without the Internet exist, such as a reader for the iPod or Erik Zachte’s TomeRaider edition. TomeRaider is a proprietary ebook reader format for PDAs. Erik explained to me how he spent countless hours trying to get every last detail to render correctly.

Perhaps the WMF should pick one of those platforms and support the developers, offer a DVD toolchain on, etc. My long term wishlist for offline reading includes:

  • “Make your own dump” style scripts that generate input files for the reader application which include exactly the articles & images I want, so it becomes easy to customize it down to a megabyte-size selection, or to access many gigabytes of text and
  • More than one-article-per-window display modes. It should be possible to scroll through an entire category, or even the entire encyclopedia, without ever opening a new window. Google Reader or Thoof style smart loading may help here.
  • Embedded Theora & Vorbis playback. If Grolier did it 15 years ago, we should be able to have a rich media DVD as well. :-)
  • Smarter parsing of the contents. Templates in particular typically mark up semantic blocks that you may want to filter out, match to an offline equivalent, render in a separate window, etc. Of course if we want to really dream, think of the possibilities of DBpedia style data extraction and queries: go beyond full-text search and offer limitless queries & dynamic lists of the data within Wikipedia.

Of course the real challenge in the long run will be off-line editing with syncing to the live side once connectivity is available.
And I’d love to see decent enough voice recognition on mobile devices so that you can simply say the name of an article and it will immediately display it. 😉

Going back to the boring present, are you aware of other wiki reader & parser projects that are worth mentioning?

Micropledge & paying for free culture

Micropledge is a new platform for pooling resources to develop software. Users can pledge money towards the development of a specific project; the money is only paid if the pledgers vote that the project has been successfully implemented. Note that you have to transfer money to Micropledge before you can pledge it towards any specific project; this largely eliminates the risk of pledge fraud, but also reduces the likelihood of spontaneous pledges.
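To make the pledge mechanics concrete, here is a minimal, hypothetical model of the flow described above: money is deposited first, pledges can only draw on deposited funds, and pledged money is paid out only if the pledgers vote that the project succeeded. This is not Micropledge's actual implementation; the class and method names are invented for illustration, and the "simple majority of pledgers" voting rule is an assumption, since the exact threshold isn't stated here.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PledgeEscrow:
    """Hypothetical escrow model for a single project (not Micropledge's real code)."""
    balances: Dict[str, int] = field(default_factory=dict)  # unpledged deposits per user (cents)
    pledges: Dict[str, int] = field(default_factory=dict)   # amount each user has pledged
    votes: Dict[str, bool] = field(default_factory=dict)    # pledger -> "successfully implemented?"

    def deposit(self, user: str, amount: int) -> None:
        # Money must be transferred in before it can be pledged.
        self.balances[user] = self.balances.get(user, 0) + amount

    def pledge(self, user: str, amount: int) -> None:
        # Pledges only draw on deposited funds, so a pledge cannot bounce later.
        if amount <= 0 or self.balances.get(user, 0) < amount:
            raise ValueError("insufficient deposited funds")
        self.balances[user] -= amount
        self.pledges[user] = self.pledges.get(user, 0) + amount

    def vote(self, user: str, success: bool) -> None:
        if user not in self.pledges:
            raise ValueError("only pledgers may vote")
        self.votes[user] = success

    def settle(self) -> int:
        """Return the amount paid to the developer; refund all pledges otherwise."""
        yes = sum(1 for v in self.votes.values() if v)
        # Assumption: more than half of all pledgers must vote "yes".
        if yes * 2 > len(self.pledges):
            payout = sum(self.pledges.values())
        else:
            payout = 0
            for user, amount in self.pledges.items():
                self.balances[user] += amount  # return funds to the pledger's balance
        self.pledges.clear()
        self.votes.clear()
        return payout
```

The up-front deposit in `pledge` is what removes the risk of pledge fraud, and it is also the extra step that makes spontaneous pledges less likely.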

I’ve started an example Micropledge for a MediaWiki extension which I would consider very useful, an RSS extension for namespaces with smart quality filtering.

Micropledge is part of a growing number of sites and services that combine Web 2.0 style social networking and slick UIs with mechanisms for fundraising and pledging towards specific goals. Pledgebank is a universal pledging service (without built-in payment processing), whereas Fundable is a platform for goal-oriented fundraising. I’ve blogged before about, which tries to connect people concerned about certain causes with non-profit organizations that relate to them. When it comes to widgets, ChipIn makes it easy to embed dynamic fundraising boxes into any website. And there are a number of Facebook applications as well.

Of course, free culture does not mean that people do not get paid; it means that the cultural works people create are not encumbered by monopoly rights. Distributed funding mechanisms are one of many ways in which people can and do get paid for authoring works which are freely available to everyone, in perpetuity. It remains to be seen which of these new services will be successful in the long run. I’d also love to see some pilot projects in the area of content development on Wikimedia Foundation projects.

Beyond usability, one key question seems to be: Why would people visit a pledging platform in the first place? It seems clear that many people would do so in order to start a pledge, but how do you get people there to join an existing effort? Wikipedia and eBay could gain popularity because they offer things people want: information or goods/services. It seems much harder to match people searching for a particular application to the relevant pledge on

Instead of trying to generate attention for hundreds of small pledges, I suspect that it may be more effective to focus attention on a broader cause, and to let an interested core community decide how the pooled resources can be used in service of that cause — especially if you have some credibility from prior endeavors. Campaigns like “Let’s create a world-class open source game” or “Let’s massively improve the state of open source drivers for graphics hardware”, if backed by a credible non-profit organization like the FSF, might motivate many people to give without requiring individual donors to think too much about every single step it takes to achieve the larger goal.

A real-world example of this model is Project Peach, an open source / free content 3D animated movie project by the good folks behind Blender. People who want to see the film done can pre-order the DVD; those who want to get involved in the details are also encouraged to do so. Having already successfully produced one open source movie, Elephants Dream, the Blender folks have the credibility to pull it off again. My only criticism of the project is that it does not seem to aim significantly higher than the previous one.

That said, the Micropledge model might still work very well for solving very specific problems that would never be addressed under the umbrella of a larger initiative, provided that the instigator of a pledge manages to network with those who have the same problem.

Interesting historical perspective: An Economy for Giving Everything Away.

Read Rice Boy

Surreal brilliance.

I dream of the day the best webcomics are turned into open source movies. 😉

Piqs: CC-BY Photo Repository

Bryan Tong Minh points out that his cool Flickr/Wikimedia Commons Upload Tool now also supports Piqs, which is a database of CC-BY licensed photographs. It’s really good to see the proliferation of free content licenses as a default for user uploads.

Wikipedia’s core problem is not expertise, it’s self-selection

Bringing Wikipedia articles up to a quality standard we can be proud of will require more than just “stable versions” (frozen revisions that community members claim to be of a given quality standard). Take the article on Mitt Romney, one of the many people hoping to become the next president of the United States. The article describes Romney’s record as governor of Massachusetts with the following words:

Romney was sworn in as the 70th governor of Massachusetts on January 2, 2003, along with Lieutenant Governor Kerry Healey. Within one year of taking office, Romney eliminated a 3 billion dollar budget deficit. During this time he did not raise taxes or debt. He also proceeded to end his term with a 1 billion dollar surplus as well as lower taxes and a lower unemployment rate.

All this information is properly referenced and sourced to … Romney for President, Inc. Of course, the article will eventually become more sane, but this is the state it’s been in for weeks, and this is what we currently serve readers looking for information about this particular candidate. And it’s quite likely that such a revision would at least have been approved as “non-vandalized” under a stable version system.

Yet, is the answer to give up on the idea of radically open editing? The source of the problem here seems to be not so much that “anyone can edit”, but that the people who do edit are self-selected. And for many topics, self-selection leads to bias. Whether it’s Mormons writing about Mormonism, Pokemon lovers writing about Pokemon characters, or teenage Mitt Romney supporters writing about Mitt Romney, the problem shows up on thousands of topics. Sometimes different self-selected factions counter each other’s bias, but that is obviously not something one can rely on, especially when one faction wins a particular war of attrition.

Putting stronger emphasis on professional expertise will not address this problem, and indeed, one will find examples of the same self-selection bias in more expert-driven communities like Citizendium (e.g. an article on chiropractic largely written by a chiropractor). All one can hope for from self-selected experts is that their bias is more intelligently disguised. Are volunteer communities doomed to self-selection bias? Well, dealing with the problem requires first recognizing it as such. And currently recognition of the problem on Wikipedia is very limited. Indeed, suggestions of self-selection bias are usually countered with replies such as “judge the article, not the authors”, often followed by reference to the “no personal attacks” policy. Outside clear commercial interests, Wikipedians are ill-prepared to deal with their own bias.

It also seems clear that a broad recusal & disclosure policy that would extend the current “conflict of interest” guidelines would go too far. Firstly, it would simply lead to much self-selection bias being hidden from view: The editor promoting Romney’s campaign on MySpace would simply remove the reference to that MySpace page from their userpage. Secondly, biased or not, self-selected editors will often be the best-informed about a particular subject. Rather than trying to remove them from the set of editors working on a particular article, it generally seems wiser to broaden the set to include more independent voices.

I believe we need to think of this as a socio-technical problem: How do we get a large number of relatively random, but highly trusted contributors to carefully look at a particular article and to scan for bias? Clearly, NPOV dispute tags aren’t sufficient: POV fighters will have an interest in removing them as soon as possible, and given the sheer number of them, they no longer serve as sufficient motivation for the average editor. Furthermore, the articles which people choose to “fix” are again highly self-selected.

As just one possible alternative, imagine that some trusted (elected?) group of users could flag articles for “bias review”. They would set a number of reviewers, anywhere from 10 to 100, to be randomly selected from the pool of active editors. Those people would get a note: “The article XY has been flagged for bias review. You have been randomly selected as a reviewer. Do you accept?” If the user does not accept, the review notice would automatically be propagated to another random user. In combination with stable quality versions, this could help to get many independent voices to look for obvious signs of bias. One might also consider encouraging the development of article forks by separate workgroups, and letting readers decide (by discussion or vote) which one is the least biased.
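The reviewer-selection step of this proposal can be sketched in a few lines. This is a hypothetical illustration only: no such MediaWiki feature exists, `select_bias_reviewers` is an invented name, and the `accepts` callback stands in for sending the on-wiki notice and waiting for a yes or no. Shuffling the pool and walking through it is equivalent to repeatedly drawing a random editor who hasn't been asked yet, which is how a declined notice "propagates to another random user".

```python
import random
from typing import Callable, Iterable, List, Optional


def select_bias_reviewers(
    active_editors: Iterable[str],
    num_reviewers: int,
    accepts: Callable[[str], bool],
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Hypothetical sketch: recruit randomly chosen reviewers for a flagged article."""
    # The 10..100 range mirrors the proposal; it is a policy choice, not a technical limit.
    if not 10 <= num_reviewers <= 100:
        raise ValueError("num_reviewers should be between 10 and 100")
    rng = rng or random.Random()
    pool = list(active_editors)
    rng.shuffle(pool)  # random order = random selection without replacement
    reviewers: List[str] = []
    for editor in pool:
        if len(reviewers) == num_reviewers:
            break
        # Each editor is asked at most once; a "no" passes the notice to the next random editor.
        if accepts(editor):
            reviewers.append(editor)
    return reviewers  # may be shorter than requested if the pool runs out
```

Because the trusted group only chooses how many reviewers are needed, not who they are, the reviewers themselves are not self-selected, which is the point of the proposal.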

Do you have other ideas? Whatever the solution, I do believe that we need to start thinking seriously about the problem if we want Wikipedia to be useful in any area of “contested knowledge”. And we need to start experimenting, rather than waiting endlessly for a consensus that will never come. Right now, thousands of contested articles are dominated by factions fighting POV wars of attrition. That cannot be the final answer.

Wikimedia Board Election 2007 – The Last Hours

As many of you know, three seats on the Wikimedia Foundation’s Board of Trustees are up for re-election. The polls will close in about 6 hours. I was elected last year to replace Angela Beesley mid-term, so my term lasted only 9 months. I’ve written a summary of my experience here; my main candidate statement is here. I would like to continue the work I have started and would appreciate your support. I have also endorsed Kat Walsh and Oscar van Dillen, whose seats are also up for re-election; I would be honored to continue to serve alongside them.

Whether you support me and the other incumbents or not, I would also like you to consider voting for the following people (you can vote for as many people as you like):

  • Kim Bruning – a biologist and software programmer with strong experience as a community mediator and analyst. Full disclosure: I am working with Kim on the OmegaWiki project. This has also allowed me to get to know him personally and understand the way he thinks; while I found his candidate statement this year somewhat weak, I would encourage you to specifically take a look at his Q&A page. If you want someone on the Board who cares deeply about the community and who is likely to bring innovation and change, please consider voting for Kim.
  • Michael Snow – a lawyer and long-time Wikipedian who started the Wikipedia Signpost and chairs the Communications Committee. He has worked directly with the Board on many occasions and would complement the Board’s skills well with his own. He will take clear positions but defend them in a calm and reasonable fashion. If you want less wiki-drama and more legal expertise at the Board level, consider voting for Michael.
  • Steve Dunlop (UninvitedCompany) – a manager and musician who also stood in last year’s election. His frustration with progress in WMF and the state of the organization shines through his presentation and Q&A; I think the general direction he recommends is the right one, but his views are colored by an unavoidable information deficit. I disagree with his belief that projects like Wikisource and Wikinews should be “spun off” into separate organizations and consider his views on non-profit governance a little too traditionalist; at the same time, I would value this additional voice at the Board table. If you want someone who will shake things up a little and push for structural and organizational changes, consider voting for Steve.
  • Yann Forget – a free software advocate who has worked in progressive non-profits for more than a decade. I have not a shred of doubt about his passion, honesty and integrity. Those who want someone with deep community roots and a strong commitment to progressive values on the Board who will speak his mind openly should consider voting for Yann.

These are the candidates I feel comfortable supporting; I will not comment on the remaining ones. If you are qualified to vote and haven’t done so, please log into your “home Wikimedia project” and visit the “Special:Boardvote” page that should be linked from the sitenotice.

On the Map with Avi Lewis

Foreign affairs analysis that doesn’t suck.

Recent shows can be found on One Big Torrent.

Terrorist idiots

Portrait of the Modern Terrorist as an Idiot
by Bruce Schneier:

So these people should be locked up … assuming they are actually guilty, that is. Despite the initial press frenzies, the actual details of the cases frequently turn out to be far less damning. Too often it’s unclear whether the defendants are actually guilty, or if the police created a crime where none existed before.

The JFK Airport plotters seem to have been egged on by an informant, a twice-convicted drug dealer. An FBI informant almost certainly pushed the Fort Dix plotters to do things they wouldn’t have ordinarily done. The Miami gang’s Sears Tower plot was suggested by an FBI undercover agent who infiltrated the group. And in 2003, it took an elaborate sting operation involving three countries to arrest an arms dealer for selling a surface-to-air missile to an ostensible Muslim extremist. Entrapment is a very real possibility in all of these cases.


Gapminder is a highly useful tool for visualizing and analyzing human development statistics.

Wikimedia Brand Survey

The Wikimedia Foundation, which runs Wikipedia, has a jungle of brands: logos, project names, and corresponding domain names. For many of the project names, there are even localized variants in different languages. But even the basic names are confusing, such as the Wikimedia/Wikipedia similarity.

With that leading introduction ;-), I’d like to invite you to complete the Wikimedia brand survey if you are interested in such matters & find the time.