Wikimania & Advisory Board thoughts

One of the greatest things about the Wikimedia community is its annual conference, Wikimania. With a new location chosen every year, it gives interested Wikimedians an opportunity to meet like-minded wiki nuts from all over the planet. I have attended all the conferences so far, including this year’s in Taipei (August 3-5).
The Foundation’s Board of Trustees used the opportunity of Wikimania 2007 for the first-ever meeting with many members of its Advisory Board. This is a belated summary of my impressions from both events.

Advisory Board meeting

Angela, who is the Chair of the Advisory Board, is still working on a detailed report, so I will keep this one short. The following Advisory Board members managed to attend: Ward Cunningham, Heather Ford, Melissa Hagemann, Teemu Leinonen, Rebecca MacKinnon, Wayne Mackintosh, Benjamin Mako Hill, Erin McKean, Achal Prabhala, and Raoul Weiler. We met in a small room at the Chien Tan Oversea Youth Activity Center, the main conference venue.

The two-day meeting was facilitated by Manon Ress; the Agenda is publicly available. I will say this much upfront: The single most important function of this meeting was for Board and Advisory Board members to get to know and trust each other, and to figure out how we can actively work together in the future. I believe the meeting reached this goal. For many Advisory Board members, visiting Taipei also must have been the equivalent of drinking from a firehose of knowledge about wikis. Of course there are exceptions. 😉

Naturally, there were also some very specific exercises which will hopefully have practical use. For instance, randomized groups tried to identify the key values of the Foundation. The group I was in started out by humorously defining all the things we don’t want to be — extremely hierarchical, exclusive, western-centric, etc. — and then compared those with positive value statements. (For some reason, “world domination” ended up in both lists.) I suggested the slogan “knowledge without borders” or “knowledge without boundaries” as a possible framework for many of the key values we found: access to knowledge (in a participatory sense) on a global scale, multilingual and multicultural diversity, content that can be freely shared and modified, etc.

I don’t know if this particular slogan will catch on, but I like the idea of trying to express key principles in a short catchphrase. The list of values itself could also be useful for messaging and policymaking. There wasn’t a lot of notetaking on the wiki, so I hope Angela has some notes from the different groups available to her. We also tried to identify some key goals, and the list my group worked on (with some predictable disagreements about the meaning of “goal” etc.) included primarily these items, read through my own personal bias:

  • Increasing quality of content in our projects, but also challenging and changing misperceptions.
  • Massively scaling volunteer participation in all areas of organizational and project-level work, to live up to our ambitious mission & vision.
  • Building greater awareness of Wikimedia’s mission & purpose. (Gregory Maxwell recently proposed adding an essay I started, 10 things you did not know about Wikipedia, to the notice displayed to unregistered users of Wikipedia. I think that’s an interesting experiment in changing people’s perceptions — we should use our own website properties more often to actually communicate with our readers.)
  • Building capacity among partners and users: the ability to participate, to create and deploy new tools, and so on.

I believe that there is a deep and complex challenge of what I call “meta-management” — the Foundation has such a diversity of projects (Wikibooks, Wikinews, Wikisource, etc.) and goals that any approach which does not scale massively will not serve our community well. So, yes, we should of course hire coordinators for grants and projects, and get better at business development, and improve our technical infrastructure, and so forth. But I think networking and empowering volunteers to do many of the things we hope to pay more people to do is a much more scalable approach.

This is a very difficult idea to promote in a group of highly intelligent people, as it’s much more exciting to focus on more specific problems, so I’m not sure if I got this particular point across well in our discussion.

Unsurprisingly, then, I also found the biggest practical value in an exercise where I moderated a group on the topic of Volunteerism (link goes to the notes), consisting of Angela, Mako, and Achal. I was especially intrigued to hear more about Ubuntu’s process for creating and coordinating volunteer teams. I am left with the conclusion that we need more semi-formal ways for Wikimedians to self-organize than the heavy process of starting a chapter. Check the notes for some other interesting ideas. We got some more suggestions later, such as an “Edit Wikipedia Day” and other online events that could be held every year to encourage different types of participation.

It’s a bunch of people in a room. It doesn’t get much more exciting than that.

Photo: halafish, CC-BY-SA.

I missed some of the more creative and physical elements that we used during the Frankfurt Board+Chapter Retreat which happened last year. For example, our facilitator there had an interesting exercise where she asked all participants to come up with a gesture to identify themselves (I used Columbo’s famous “Just one more thing”). While a bit repetitive in the end, I thought it was a fun bonding game that also helps very practically to remember people, faces and names. This makes me think that it would be good to have someone in-house for facilitation work, to build upon knowledge from previous events. Also, is there a wiki to document these kinds of processes? 😉

In practice, we will continue to use our Advisory Board mailing list to consult with A.B. members as a group — these are typically strategic questions, or “fishing” of the type “Does anyone know someone who ..”. But I believe the more frequent interactions will be with individuals, around issues in their domain. And while I initially viewed the A.B. as only being truly connected to the Board, I am increasingly coming to the opinion that we should encourage them to interact with staff and community members as well (and possibly sometimes chapters, though I hope these will also set up their own advisory bodies).

What is the ideal size for an Advisory Board? During the meeting I believe the consensus was against significant further expansion before we’re happy with the utilization of the current Advisory Board (this is not the case for the actual Board, where there is a consensus leaning towards another expansion). I suspect there’s another reason to keep it roughly at the present size: it allows us to get to know the members socially and to form a collegial trust relationship, which can lead to very different types of useful interaction than merely contacting someone you suspect knows something. It also keeps it manageable to prune members, to invite them into a single place, and so forth.

At the same time, if you think about expansion because “more is better”, then any size would be too small — you’ll want to manage knowledge and trust on a global scale. Wait a minute, knowledge and trust on a global scale? That sounds like a familiar problem! 😉 I suspect indeed that innovations of internal knowledge management will be driven by our project communities. And I don’t think that a massively decentralized approach of acquiring information from trustworthy sources and a fairly stable group of passionate advisors would be mutually exclusive. :-)

Wikimania itself

Wikimania visitors posing (being posed) for some group shots on the last day.

Photo: halafish, CC-BY-SA.

Most members of the fantastic team that made it all happen!

Photo: halafish, CC-BY-SA.

The main conference was absolutely wonderful. We cannot thank the organizers enough for putting together an event that, I think, nobody who was there will soon forget. I will have to resort to the maligned bullet point list to even begin to enumerate all the things that were done well:

  • a well-chosen venue (a youth hostel) with plenty of spaces to mingle
  • a large number of sponsorships that never appeared obtrusive in any way
  • a highly committed local team that went out of its way to assist with anything (starting with welcoming people at the airport)
  • a compelling program with talks that were truly interesting to any Wikimedian
  • many opportunities for ad hoc events (laptop content bundles, lightning talks, workshops, and so forth)
  • side events – citizen journalism, hacking days, party, etc.
  • excellent catering
  • Taipei itself
  • all of you who made it 😉
  • and a million other things.

You owe it to yourself to come to Wikimania 2008, which is currently accepting city bids. Swim if you have to. And block the first week of August in your calendar. :-)

I do think we should try to have more content that appeals to wiki newbies next time: editing workshops, project tours, exhibits, etc. Whether that’s the intent or not, many people who have barely seen an edit page will always be inclined to visit a conference like this — just because it’s about this crazy new wiki thing. That’s doubly true if the conference is in a location where the community isn’t yet as strong as in the U.S., parts of Asia, or Western Europe.

Some of the first users of the OLPC at Wikimania. The laptop had a very prominent place in the “free culture space” of the conference.

Photo: preetamrai, CC-BY-SA.

I did enjoy Taipei itself, especially a fun little tour with Shun-ling Chen, Mel from OLPC, and the Semantic MediaWiki developers. There is also some incriminating video evidence from another occasion that Kat will probably use against me sooner or later. I derived the greatest enjoyment from making new friends, having interesting conversations, and discovering new patterns (in reality in general and Wikipedia in particular). In that respect, I especially cherish the new things I learned from people like Luca de Alfaro (trust and reputation in Wikipedia), Michael Dale (Metavid), Shay David (Kaltura), and Brian Mingus (quality heuristics – let’s chat some more about this soon). I think all Board members had great conversations with Sue Gardner, our new “Special Advisor”, and Mike Godwin, our new Legal Counsel. And of course, it was great to connect again and catch up with many old friends in an unlikely location.

There was definitely a language barrier to connecting more with the local folks. English isn’t that commonly spoken in Taiwan, and I found it difficult to converse much beyond smalltalk. Not much that can be done about that other than learning Chinese, which I’m afraid is unlikely to make my to-do list anytime soon. I tried to be accessible to anyone who did want to speak with me and gave an interview to a local magazine about OmegaWiki.

I will find it very interesting to look back on this Wikimania in context, and to hear more from others about it. I for one think it was a complete success. But I felt the same way about Boston and Frankfurt, so I hope there will also be some constructive criticism and maybe even some trolling. 😉 I’m also keen to see more wiki-events small and large. I won’t be able to make it to all or even most of them, but that’s OK. One way or another, it is wonderful to see the global community for free culture thrive. As a community, as friends; constructive in conflict, united in diversity.

Delphine and I are sharing a precious moment. “I ate what?!” I’m sure you can come up with a funnier caption for this one. I really have no memory of what actually happened. :-)

Photo: Kat Walsh, CC-BY-SA.

Wikipedia Offline Readers

Looks like all the Wikimedia Foundation had to do for decent offline reader software to be developed is continue to provide database dumps. 😉 There are now several implementations, some open source, that can be used to build Wikipedia DVDs – and I’m not referring to the neat offline reader hack that was just slashdotted. Look at these:

  • Moulin (open source) uses static HTML inside a XUL-based cross-platform reader application with Gecko as the rendering engine. Doesn’t seem to have full-text search (only titles), but seems to have a very active development team. Current downloadable version still very simplistic, future versions should be interesting. Current versions do not contain images but there’s nothing technical that stands in the way of including them. I missed the Wikimania talk about this one. :-(
  • Kiwix (open source) is awesome and the slickest implementation I’ve seen so far. It was used for the Wikipedia 0.5 DVD (actually a CD, with only about 2000 articles, sadly). Has a nice full-text search, search autocompletion, and printing. Also uses static HTML as a source. Storage efficiency could be better, but this first selection does include image thumbnails, which take quite a bit of space.
  • Ksana For Wiki is still closed source. It was demonstrated at Wikimania to provide “Wikipedia on a USB stick”. Pretty nifty for looking things up without a net connection. The application actually parses the wikitext and does a fairly shoddy job at it, which makes many of the articles look rather raw. On the positive side, it does support accessing dumps in any language, has a fairly fast full-text search, and is cross-platform.
  • ZenoReader is a Windows-only closed source reader application developed for the German Wikipedia DVD. While the company which made the DVD, Directmedia, deserves credit for bringing the first WP DVD to the market, I don’t think this particular framework is likely to have much of a future. I’m not even going to bother to try to get it to run under WINE on Linux, as they suggest. From what I can gather, it’s based on the HTML of the de.wp articles which is served through a local webserver.
  • Wikipedia Offline Client seems to be a student project to create a nice graphical client. From what I can see quickly, it appears to be also based on rendering & indexing HTML pages, though they seem to have hacked the standard MediaWiki parser for the purpose. Not sure what the current status is and how likely it is to be developed further. It appears to be partially based on Knowledge, an earlier offline reader effort.
  • WikiFilter takes a similar approach to Ksana, using the wikitext as a source. Judging by the screenshot, the output is somewhat slicker, but the code hasn’t been updated in more than a year and is Windows-only. It runs as an Apache module so setup is definitely not for the meek.

UPDATE: A couple of other ones pointed out in the comments:

  • yawr is Magnus Manske’s effort to create an open source equivalent of ZenoReader.
  • WikiMiner is a Java-based search tool that can be used in conjunction with the static HTML dumps.

A few other methods to view Wikipedia without the Internet exist, such as a reader for the iPod or Erik Zachte’s TomeRaider edition. TomeRaider is a proprietary ebook reader format for PDAs. Erik explained to me how he spent countless hours trying to get every last detail to render correctly.

Perhaps the WMF should pick one of those platforms and support the developers, offer a DVD toolchain on, etc. My long term wishlist for offline reading includes:

  • “Make your own dump” style scripts that generate input files for the reader application which include exactly the articles & images I want, so it becomes easy to customize it down to a megabyte-size selection, or to access many gigabytes of text and
  • More than one-article-per-window display modes. It should be possible to scroll through an entire category, or even the entire encyclopedia, without ever opening a new window. Google Reader or Thoof style smart loading may help here.
  • Embedded Theora & Vorbis playback. If Grolier did it 15 years ago, we should be able to have a rich media DVD as well. :-)
  • Smarter parsing of the contents. Templates in particular typically mark up semantic blocks that you may want to filter out, match to an offline equivalent, render in a separate window, etc. Of course if we want to really dream, think of the possibilities of DBpedia style data extraction and queries: go beyond full-text search and offer limitless queries & dynamic lists of the data within Wikipedia.

Of course the real challenge in the long run will be off-line editing with syncing to the live side once connectivity is available.
And I’d love to see decent enough voice recognition on mobile devices so that you can simply say the name of an article and it will immediately display it. 😉

Going back to the boring present, are you aware of other wiki reader & parser projects that are worth mentioning?

Paying for free culture

Micropledge is a new platform for pooling resources to develop software. Users can pledge money towards the development of a specific project; the money is only paid if the pledgers vote that the project has been successfully implemented. Note that you have to transfer money to Micropledge before you can pledge it towards any specific project; this largely eliminates the risk of pledge fraud, but also reduces the likelihood of spontaneous pledges.

I’ve started an example Micropledge for a MediaWiki extension which I would consider very useful, an RSS extension for namespaces with smart quality filtering.

Micropledge is part of a growing number of sites and services that combine Web 2.0 style social networking and slick UIs with mechanisms for fundraising and pledging towards specific goals. Pledgebank is a universal pledging service (without built-in payment processing), whereas Fundable is a platform for goal-oriented fundraising. I’ve blogged before about, which tries to connect people concerned about certain causes with non-profit organizations that relate to them. When it comes to widgets, ChipIn makes it easy to embed dynamic fundraising boxes into any website. And there are a number of Facebook applications as well.

Of course, free culture does not mean that people do not get paid; it means that the cultural works people create are not encumbered by monopoly rights. Distributed funding mechanisms are one of many ways in which people can and do get paid for authoring works which are freely available to everyone, in perpetuity. It remains to be seen which ones of these new services will be successful in the long run. I’d also love to see some pilot projects in the area of content development on Wikimedia Foundation projects.

Beyond usability, one key question seems to be: Why would people visit a pledging platform in the first place? It seems clear that many people would do so in order to start a pledge, but how do you get people there to join an existing effort? Wikipedia and eBay could gain popularity because they offer things people want: information or goods/services. It seems much harder to match people searching for a particular application to the relevant pledge on

Instead of trying to generate attention for hundreds of small pledges, I suspect that it may be more effective to focus attention on a broader cause, and to let an interested core community decide how the pooled resources can be used in service of that cause — especially if you have some credibility from prior endeavors. Campaigns like “Let’s create a world-class open source game” or “Let’s massively improve the state of open source drivers for graphics hardware”, if backed by a credible non-profit organization like the FSF, might motivate many people to give without requiring individual donors to think too much about every single step it takes to achieve the larger goal.

A real-world example of this model is Project Peach, an open source / free content 3D animated movie project by the good folks behind Blender. People who want to see the film done can pre-order the DVD; those who want to get involved in the details are also encouraged to do so. Having already successfully produced one open source movie, Elephants Dream, the Blender folks have the credibility to pull it off again. My only criticism of the project is that it does not seem to aim significantly higher than the previous one.

That said, the Micropledge model might still work very well for solving very specific problems that would never be addressed under the umbrella of a larger initiative, provided that the instigator of a pledge manages to network with those who have the same problem.

Interesting historical perspective: An Economy for Giving Everything Away.

Piqs: CC-BY Photo Repository

Bryan Tong Minh points out that his cool Flickr/Wikimedia Commons Upload Tool now also supports Piqs, which is a database of CC-BY licensed photographs. It’s really good to see the proliferation of free content licenses as a default for user uploads.

Wikipedia’s core problem is not expertise, it’s self-selection

Bringing Wikipedia articles up to a quality standard we can be proud of will require more than just “stable versions” (frozen revisions that community members claim to be of a given quality standard). Take the article on Mitt Romney, one of the many people hoping to become the next president of the United States. The article describes Romney’s record as governor of Massachusetts with the following words:

Romney was sworn in as the 70th governor of Massachusetts on January 2, 2003, along with Lieutenant Governor Kerry Healey. Within one year of taking office, Romney eliminated a 3 billion dollar budget deficit. During this time he did not raise taxes or debt. He also proceeded to end his term with a 1 billion dollar surplus as well as lower taxes and a lower unemployment rate.

All this information is properly referenced and sourced to … Romney for President, Inc. Of course, the article will eventually become more sane, but this is the state it’s been in for weeks, and this is what we currently serve readers looking for information about this particular candidate. And it’s quite likely that such a revision would at least have been approved as “non-vandalized” under a stable version system.

Yet, is the answer to give up on the idea of radically open editing? The source of the problem here seems to be not so much that “anyone can edit”, but that the people who do edit are self-selected. And for many topics, self-selection leads to bias. Whether it’s Mormons writing about Mormonism, Pokemon lovers writing about Pokemon characters, or teenage Mitt Romney supporters writing about Mitt Romney, the problem shows up on thousands of topics. Sometimes different self-selected factions counter each other’s bias, but that is obviously not something one can rely on, especially when one faction wins a particular war of attrition.

Putting stronger emphasis on professional expertise will not address this problem, and indeed, one will find examples of the same self-selection bias in more expert-driven communities like Citizendium (e.g. an article on chiropractic largely written by a chiropractor). All one can hope for from self-selected experts is that their bias is more intelligently disguised. Are volunteer communities doomed to self-selection bias? Well, dealing with the problem requires first recognizing it as such. And currently recognition of the problem on Wikipedia is very limited. Indeed, suggestions of self-selection bias are usually countered with replies such as “judge the article, not the authors”, often followed by reference to the “no personal attacks” policy. Outside clear commercial interests, Wikipedians are ill-prepared to deal with their own bias.

It also seems clear that a broad recusal & disclosure policy that would extend the current “conflict of interest” guidelines would go too far. Firstly, it would simply lead to much self-selection bias being hidden from view: The editor promoting Romney’s campaign on MySpace would simply remove the reference to that MySpace page from their userpage. Secondly, biased or not, self-selected editors will often be the best-informed about a particular subject. Rather than trying to remove them from the set of editors working on a particular article, it generally seems wiser to broaden the set to include more independent voices.

I believe we need to think of this as a socio-technical problem: How do we get a large number of relatively random, but highly trusted contributors to carefully look at a particular article and to scan for bias? Clearly, NPOV dispute tags aren’t sufficient: POV fighters will have an interest in removing them as soon as possible, and given the sheer number of them, they no longer serve as sufficient motivation for the average editor. Furthermore, the articles which people choose to “fix” are again highly self-selected.

As just one possible alternative, imagine that some trusted (elected?) group of users could flag articles for “bias review”. They would set a number of people from 10 to 100 who would be randomly selected from the pool of active editors. Those people would get a note: “The article XY has been flagged for bias review. You have been randomly selected as a reviewer. Do you accept?” If the user does not accept, the review notice would automatically be propagated to another random user. In combination with stable quality versions, this could help to get many independent voices to look for obvious signs of bias. One might also consider encouraging the development of article forks by separate workgroups, and letting readers decide (by discussion or vote) which one is the least biased.
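To make the idea concrete, here is a rough Python sketch of the random selection step. It is only an illustration under my own assumptions: the function name run_bias_review, the accepts callback (standing in for the "Do you accept?" prompt) and the in-memory list of active editors are all made up, and nothing like this exists in MediaWiki.

```python
import random


def run_bias_review(active_editors, panel_size, accepts, rng=None):
    """Invite random editors until panel_size of them have accepted.

    active_editors: iterable of user names (hypothetical pool of active editors)
    panel_size:     number of reviewers set by the flagging group (10 to 100)
    accepts:        callback returning True if the invited editor accepts
    """
    if not 10 <= panel_size <= 100:
        raise ValueError("panel_size must be between 10 and 100")
    rng = rng or random.Random()

    # Shuffle a copy so that every editor is invited at most once, in random order.
    pool = list(active_editors)
    rng.shuffle(pool)

    panel = []
    for editor in pool:
        if len(panel) == panel_size:
            break
        # A declined invitation simply passes on to the next random editor.
        if accepts(editor):
            panel.append(editor)
    return panel
```

If the pool runs out before enough people accept, the sketch just returns a smaller panel; a real system would need to decide what to do in that case.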

Do you have other ideas? Whatever the solution, I do believe that we need to start thinking seriously about the problem if we want Wikipedia to be useful in any area of “contested knowledge”. And we need to start experimenting, rather than waiting endlessly for a consensus that will never come. Right now, thousands of contested articles are dominated by factions fighting POV wars of attrition. That cannot be the final answer.

Wikimedia Board Election 2007 – The Last Hours

As many of you know, three seats on the Wikimedia Foundation’s Board of Trustees are up for re-election. The polls will close in about 6 hours. I was elected last year to replace Angela Beesley mid-term, so my term lasted only 9 months. I’ve written a summary of my experience here; my main candidate statement is here. I would like to continue the work I have started and would appreciate your support. I have also endorsed Kat Walsh and Oscar van Dillen, whose seats are also up for re-election; I would be honored to continue to serve alongside them.

Whether you support me and the other incumbents or not, I would also like you to consider voting for the following people (you can vote for as many people as you like):

  • Kim Bruning – a biologist and software programmer with strong experience as a community mediator and analyst. Full disclosure: I am working with Kim on the OmegaWiki project. This has also allowed me to get to know him personally and understand the way he thinks; while I found his candidate statement this year somewhat weak, I would encourage you to specifically take a look at his Q&A page. If you want someone on the Board who cares deeply about the community and who is likely to bring innovation and change, please consider voting for Kim.
  • Michael Snow – a lawyer and long-time Wikipedian who started the Wikipedia Signpost and chairs the Communications Committee. He has worked directly with the Board on many occasions and would complement the Board’s skills well with his own. He will take clear positions but defend them in a calm and reasonable fashion. If you want less wiki-drama and more legal expertise on the Board-level, consider voting for Michael.
  • Steve Dunlop (UninvitedCompany) – a manager and musician who also stood in last year’s election. His frustration with progress in WMF and the state of the organization shines through his presentation and Q&A; I think the general direction he recommends is the right one, but his views are colored by an unavoidable information deficit. I disagree with his belief that projects like Wikisource and Wikinews should be “spun off” into separate organizations and consider his views on non-profit governance a little too traditionalist; at the same time, I would value this additional voice at the Board table. If you want someone who will shake things up a little and push for structural and organizational changes, consider voting for Steve.
  • Yann Forget – a free software advocate who has worked in progressive non-profits for more than a decade. I have not a shred of doubt about his passion, honesty and integrity. Those who want someone with deep community roots and a strong commitment to progressive values on the Board who will speak his mind openly should consider voting for Yann.

These are the candidates I feel comfortable supporting; I will not comment on the remaining ones. If you are qualified to vote and haven’t done so, please log into your “home Wikimedia project” and visit the “Special:Boardvote” page that should be linked from the sitenotice.

Wikimedia Brand Survey

The Wikimedia Foundation, which runs Wikipedia, has a jungle of brands: logos, project names, and corresponding domain names. For many of the project names, there are even localized variants in different languages. But even the basic names are confusing, such as the Wikimedia/Wikipedia similarity.

With that leading introduction ;-), I’d like to invite you to complete the Wikimedia brand survey if you are interested in such matters & find the time.

Referencing source code

Following up: The Nethack Wikia is actually using Wikipedia’s referencing extension for directly referencing lines of code in the NetHack source. It’s a pretty great wiki, too.


The discovery of Gliese 581 c is a watershed moment in the search for extrasolar planets and alien life. What folly to view religion as revelation, when it is science that is unwrapping the universe like a giant birthday present, making visible entire worlds one by one, in the unimaginably vast candy store of billions of observable galaxies. One of the most promising missions among the planet hunters is COROT, a space telescope operated by the French and European Space Agencies. And, of course, when I wanted to see what the state of that mission is, I intuitively looked it up on Wikipedia.

Purely by coincidence, COROT found its first planet yesterday. Not only was this noted in the Wikipedia article about COROT, the planet itself already has an entry of its own. Thus, I did not learn about the discovery through the numerous RSS feeds and news websites I follow (including Wikinews), but through Wikipedia. We call Wikipedia an encyclopedia, but it is clearly much more than any encyclopedia history has ever seen.

I am hardly the first person to notice this, and indeed, the New York Times recently devoted an article to exploring Wikipedia’s coverage of the Virginia Tech massacre. How can one make more intelligent use of the news-like characteristics of Wikipedia and combine them in meaningful ways with our news-dedicated project, Wikinews?

I’ve personally subscribed to the history RSS feeds of a number of articles of interest (access them from the “history” page of the article in the bottom left corner). These give you diffs to the latest changes to the article, which can be useful in order to, say, notice that one of your favorite bands has released a new album. But of course you will get a lot of crud, including vandalism and boring maintenance edits. There are simple ways to make feeds smarter — only pushing changes into the feed when an article has stabilized, filtering minor edits, etc.
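As a rough illustration of what "smarter" could mean, here is a small Python sketch that groups edits into bursts and only emits a feed entry once a burst has been quiet for a while, skipping bursts that consist purely of minor edits. The Revision class, the stable_feed_entries function and the one-hour quiet period are my own assumptions for the sake of the example, not an existing MediaWiki feature.

```python
from dataclasses import dataclass


@dataclass
class Revision:
    rev_id: int
    timestamp: float  # seconds since epoch
    minor: bool = False


def _flush(burst, entries):
    # Only publish a burst that contains at least one non-minor edit,
    # and link to its last revision (the state the article settled on).
    if any(not rev.minor for rev in burst):
        entries.append(burst[-1].rev_id)


def stable_feed_entries(revisions, now, quiet_seconds=3600):
    """Return the revision ids that a 'stabilized changes only' feed would carry."""
    entries = []
    burst = []
    for rev in sorted(revisions, key=lambda r: r.timestamp):
        # A gap of at least quiet_seconds closes the previous burst of edits.
        if burst and rev.timestamp - burst[-1].timestamp >= quiet_seconds:
            _flush(burst, entries)
            burst = []
        burst.append(rev)
    # The most recent burst counts only if the article has been quiet long enough.
    if burst and now - burst[-1].timestamp >= quiet_seconds:
        _flush(burst, entries)
    return entries
```

A vandalism edit followed by a quick revert would still show up as one entry here; filtering those out would need an extra check, such as comparing the text before and after the burst.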

Structured data will also allow for some interesting feed possibilities: if an album is an object associated with a band, then it is possible to be notified if there are specific changes to the existing objects or additions of new ones. This general principle can be applied wonderfully broadly, turning any wiki into a universal event notification mechanism. (Alert me when person X dies / a conference of type Y happens / an astronomical object with the characteristics A, B, and C is discovered.) Wikipedia (and its structured data repository) will be the single most useful one, but specialized wikis will of course thrive and benefit from the same technology.

In the department of less remote possibilities, back in February I described an RSS extension I’d like to see. It would allow the transformation of portals into mini-news sites linking directly to relevant Wikipedia articles. In general, the more ways we have to publish RSS automatically or semi-automatically, the better: the community will innovate around the technology.

Our separate Wikinews project remains justifiable as a differently styled, more detailed and granular view of events of the day largely irrespective of their historical significance. But I believe we should try to make the two projects interact more neatly when it comes to major events. Cross-wiki content transclusion in combination with the ever-elusive single user login might spur some interesting collaborations, particularly about components that are useful to both projects (timelines, infoboxes, and so on). Perhaps even the idea of shared fair use content is not entirely blasphemous in this context.

The increasing use of Wikipedia as a news source in its own right will only strengthen its cultural significance in ways that we have yet to understand.

Is Wikipedia complete?

Sage Ross reports in the latest Wikipedia Signpost about an interesting experiment at George Mason University where history students were asked to write articles about a subject not already covered in the English Wikipedia. It is interesting to read the course blog for the students’ impressions of Wikipedia. (The talk page of the Signpost article lists some of the articles they created.)

There are many observations one can make about this experiment, but I want to focus on just one. Many of the students had great trouble finding a topic to write about that is not already covered by Wikipedia. Those who did sometimes did not realize that an article about their topic existed under a different title (or chose to ignore it, wanting to provide instead “their own perspective”). This was fascinating to me, given that I believe this should have been the easiest part of their assignment. Granted, it was complicated by the fact that the students had to create a new article. But let’s think a little about the common notion that the English Wikipedia is “basically complete”.

Wikipedia provides anyone with plenty of guidance on what to write about. There is, of course, the gigantic directory of requested articles, which is growing faster than old requests are being fulfilled. Moreover, even when browsing any Wikipedia article about history, you will notice the occasional red link. Their frequency increases as you go past the history of North America and Europe. Beyond history, there are countless specialized pages waiting to be written: articles about species, geographical entities, astronomical objects, and so forth. But here, we are still only talking about horizontal growth. The perfect Wikipedia article allows near unlimited exploration and is supported by rich media, source text, news, references, structured data, and so on, and every single article that currently exists can be improved in this regard. Only a very tiny fraction of articles has reached our current “featured article” standard. This standard and its interpretation have changed significantly over time.

In fact, perhaps the “perfect” article cannot exist, as our conception of knowledge is constantly changing. Here are just some expectations that I think we will have of future articles, in rough order of appearance:

  1. Structured data. If we deploy technology like Semantic MediaWiki or OmegaWiki, we will have to rethink the ways in which we deal with structured data such as the information in most infoboxes. Much of the data currently in human- or bot-maintained lists will be automatically obtained from the structured data embedded into or associated with articles. As existing scientific databases are wikified, these too will become connected with our own content, and it will become possible to navigate directly to the latest scientific results as they are being collected. Of course, even simple structured data functionality poses very serious scalability issues, and we will likely see these efforts evolve separately from the main Wikipedia content for a while. But as the technology matures, the need for integration will increase — and Wikipedians will be expected to hunt for as many sources of data as possible to enrich any given article.
  2. More free content. Vast archives of materials are waiting to be liberated from copyright restrictions, and any single source can add great value. Aside from any massive philanthropic content liberation campaigns and the advances of the open access movement, I hope and believe that reform of the incredibly unbalanced international system of copyright law is possible. Even shaving 30 years off current copyright terms would unlock decades of cultural wealth. Lastly, Wikipedia’s own influence continues to grow, and the importance of having content in Wikipedia may often outweigh any arguments against free content licensing.
  3. Deep sourcing. I have already explored this notion here: Whether we are writing about games, software, or videos, I expect that our models of referencing will require radical innovation to reference deep segments of the content. The best reference is one which allows me to go directly to the relevant piece of code, text, sound or video — but that will of course only be possible for transparent, open access resources.
  4. Levels of knowledge. We have different levels of detail within each Wikipedia, but the current Wikipedias are essentially written for intelligent, educated readers. We should have materials for different reading levels, and summaries of complex subjects written for readers with little pre-existing knowledge. Simple English and Wikijunior are first attempts to make this happen, but we should have a more abstract perspective on how to best represent these different levels of knowledge throughout projects and languages.
  5. Less language-centric views. Right now, references tend to be to works in the language of the respective Wikipedia. However, even following the interwiki links, one can often discover sources in other languages on the same topic, which may very well be much richer and more useful. As our cross-language communication tools improve, our expectation will be to present the views of more than one language space on a given topic. Breakthroughs in freely available machine translation tools could have a massive transformational impact, but even a less ambitious project like Wikicat and the associated ideas could revolutionize the way we look at sources.
  6. More data types. We are very image-rich, but still have few other media. Virtually every article can be served by video content, be it clips from a documentary or an actual recording of the subject. Even original documentary material made through wiki collaboration is a possibility. As for sounds, every musical instrument, every animal that makes sounds, every politician or activist, should have sound files associated with their article.

    In terms of images and tables, their prevalence and quality will increase further as we deploy new extensions such as WikiTeX, which are essentially integrated authoring tools for specialized content such as chessboard patterns, relational diagrams, or music scores. We can and do support all this content already, but the easier it becomes to create it, the more widely it will be used. (And, of course, syntax-driven authoring is hardly the peak of usability.) One particular killer application could result from more intelligent generation of SVG images using text parameters. This is not trivial (the text needs to be rendered within a given “hot spot area” of the image), but not impossible either.

  7. “Sociality”. Presently we only encourage community building for the explicit purpose of creating reference works. Wikiversity is a notable exception with the desire to form learning communities. But why should it not be possible for me to connect easily with students doing their thesis on a particular Wikipedia topic, or researchers who specialize in it? The existing WikiProjects, portals and IRC channels are also seeds for interest communities around particular topics. I believe it is inevitable that these seeds will grow into broader discussion and research areas, partially as part of project convergence. We should stop being afraid of such communities of interest: a community of interest that is strongly connected to Wikipedia may very well be preferable to one which is not, even if it is about Pokémon.
  8. Project convergence. Our current sister project templates are dumb, dead links. Imagine being able to navigate the annotated text of a book from Wikisource directly from within the related Wikipedia article, seeing a sidebar with the latest Wikinews stories on a given topic, scrollable galleries from Commons, or quiz questions from Wikiversity. One should frown upon buzzwords like “web 2.0” or “mash-ups”, but some of the underlying ideas are worth exploring. One of my favorite UI paradigms that is enabled by AJAX is the infinite loader. The loveliest example of this is Google Reader, which allows you to scroll through the archives of any news source, until it runs out of data, without ever reloading a page. We need similar boundless knowledge exploration tools. As we build them, and integrate our projects in other ways, the distinction between the different “Wiki-somethings” will blur, and the expectation for quality content from our sister projects will increase.
  9. Simple interactive content. There’s not really much that is stopping us from integrating the countless open source Java- and Flash-based learning applets that are out there into Wikipedia, except for free-as-in-freedom and security issues. At least Java should be “open enough” soon, and Flash might get a decent open source implementation. As for security review, I believe that open source, combined with a simple trust model and a healthy dose of “assume good faith” will be sufficient.
  10. Machinima. A type of video, machinimations are relatively easy-to-create 3D films. They are typically made using the movie-recording capabilities of computer games. Their quality is driven by the multimedia capabilities of PCs and game consoles, and the games implemented for them. Games are a multi-billion-dollar industry that may eventually eclipse even moviemaking, so continued innovation is inevitable. Machinima can be used to re-enact any sequence of events using cutting-edge 3D graphics. A military simulation with good machinima capabilities may very well lead to the first massive use of this technology to enrich Wikipedia articles about historical battles with amateur re-creations thereof.
  11. Interactive 3D content. Second Life is trying to become the “3D web” by making much of its technology available under open source conditions. Perhaps it will succeed, perhaps not. I expect that real mass adoption of 3D technology in an everyday context will only occur together with stereoscopic displays. “Virtual Reality” has become one of those technologies that, like video conferencing, has been predicted so frequently and imagined in so much detail without significant mass use beyond gaming that many people have stopped believing in it — but eventually, 3D navigation may become the standard method by which most of us access content of any type. As is so often the case, this change is gradual, and the new 3D capabilities of both the Linux and the Windows desktop are first humble steps in this direction.

    Most imagined 3D user interfaces have focused on simple metaphors such as “avatars”, buildings, “flying”, and so on. I expect that 3D interfaces will draw from these metaphors, but they will be governed by user needs for efficient ways to locate content, places, and things. (At least within the open source culture, technology tends to be driven by user needs, not by a top-down hype machine.) Sometimes those tools will be visual, sometimes verbal, sometimes social. So I’m not convinced that we will access all Wikipedia content through intelligent avatars who answer questions using speech recognition and artificial intelligence. :-)

    In the end, the narratives of these 3D worlds may end up being more dream-like than reality-like in their chaotic structure and convergence of sensory stimuli. But I do believe that users will want to participate in interactive, social learning environments (bringing the experience of a well-designed museum exhibit to the Internet as a whole), and that these will blend with purely textual explanations.

  12. Intelligent learning systems. We know that people learn with different efficacy under different conditions, but unfortunately, things aren’t very simple beyond that: no single model of learning styles has found strong empirical support. I believe that computer-facilitated learning can theoretically adapt as well to the complexity of a human neural network as human-mediated learning, if not more so. An ILS would likely rely on a vast database of information for any single learner, a database that would have to keep track of many of their activities online. (This is not necessarily a privacy issue if the database is stored locally and encrypted.) Moreover, it would have to tap into participatory activities and teacher assessments. Therefore, I expect that advanced systems of this nature are still quite a remote possibility. But if they can be built, I think they will radically alter the way we learn, and impose new requirements on the content of any learning resource.

These are just some developments that are (somewhat) predictable with our current technological horizon. We have no idea how knowledge might be transformed by new communication tools, nanotech, artificial intelligence, neural interfaces, or anything else we may dream up. But even within the limits of today’s tech, the notion that Wikipedia is “finished” in any meaningful way is very alien to me.