Author: Erik

Referencing source code

Following up: The NetHack Wikia is actually using Wikipedia’s referencing extension for directly referencing lines of code in the NetHack source. It’s a pretty great wiki, too.

Wikipedianews

The discovery of Gliese 581 c is a watershed moment in the search for extrasolar planets and alien life. What folly to view religion as revelation, when it is science that is unwrapping the universe like a giant birthday present, making visible entire worlds one by one, in the unimaginably vast candy store of billions of observable galaxies. One of the most promising missions among the planet hunters is COROT, a space telescope operated by the French and European Space Agencies. And, of course, when I wanted to see what the state of that mission is, I intuitively looked it up on Wikipedia.

Purely by coincidence, COROT found its first planet yesterday. Not only was this noted in the Wikipedia article about COROT, but the planet itself already has an entry of its own. Thus, I did not learn about the discovery through the numerous RSS feeds and news websites I follow (including Wikinews), but through Wikipedia. We call Wikipedia an encyclopedia — but it is clearly much more than any encyclopedia history has ever seen.

I am hardly the first person to notice this, and indeed, the New York Times recently devoted an article to exploring Wikipedia’s coverage of the Virginia Tech massacre. How can one make more intelligent use of the news-like characteristics of Wikipedia and combine them in meaningful ways with our news-dedicated project, Wikinews?

I’ve personally subscribed to the history RSS feeds of a number of articles of interest (you can access them from the bottom left corner of an article’s “history” page). These give you diffs of the latest changes to the article, which can be useful if you want to, say, notice that one of your favorite bands has released a new album. But of course you will get a lot of crud, including vandalism and boring maintenance edits. There are simple ways to make feeds smarter: only pushing changes into the feed once an article has stabilized, filtering out minor edits, and so on.
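
As a rough illustration of the minor-edit filtering idea, here is a minimal Python sketch that pulls an article’s recent revisions through the MediaWiki API and keeps only the non-minor ones. The article title, revision limit, and User-Agent string are illustrative placeholders; a real feed filter would also want the stabilization and vandalism heuristics mentioned above.

    # Sketch: list the recent non-minor edits to a Wikipedia article via the MediaWiki API.
    # Assumes network access; title, limit, and User-Agent are placeholders.
    import json
    import urllib.parse
    import urllib.request

    API = "https://en.wikipedia.org/w/api.php"

    def recent_major_edits(title, limit=20):
        params = urllib.parse.urlencode({
            "action": "query",
            "prop": "revisions",
            "titles": title,
            "rvlimit": limit,
            "rvprop": "timestamp|user|comment|flags",
            "format": "json",
        })
        req = urllib.request.Request(API + "?" + params,
                                     headers={"User-Agent": "feed-filter-sketch/0.1"})
        with urllib.request.urlopen(req) as resp:
            data = json.load(resp)
        page = next(iter(data["query"]["pages"].values()))
        # A revision dict contains the key "minor" only when the edit was marked as minor.
        return [rev for rev in page.get("revisions", []) if "minor" not in rev]

    if __name__ == "__main__":
        for rev in recent_major_edits("COROT"):
            print(rev["timestamp"], rev["user"], "-", rev.get("comment", ""))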

Structured data will also allow for some interesting feed possibilities: if an album is an object associated with a band, then it becomes possible to be notified of specific changes to existing objects, or of the addition of new ones. This general principle can be applied wonderfully broadly, turning any wiki into a universal event notification mechanism. (Alert me when person X dies / a conference of type Y happens / an astronomical object with the characteristics A, B, and C is discovered.) Wikipedia (and its structured data repository) will be the single most useful one, but specialized wikis will of course thrive and benefit from the same technology.
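
To make the notification idea a little more concrete, here is a deliberately simplified Python sketch of the matching step. The event and subscription shapes are entirely hypothetical; a real system would sit on top of whatever structured data layer the wiki ends up using.

    # Sketch: match structured-data change events against standing subscriptions.
    # The event and subscription shapes here are hypothetical and purely illustrative.

    def matches(subscription, event):
        """An event matches when every field predicate in the subscription accepts it."""
        return all(field in event and predicate(event[field])
                   for field, predicate in subscription.items())

    subscriptions = {
        "new exoplanet below 10 Earth masses": {
            "type": lambda t: t == "exoplanet",
            "change": lambda c: c == "created",
            "mass_earths": lambda m: m < 10,
        },
        "new album by a watched band": {
            "type": lambda t: t == "album",
            "artist": lambda a: a in {"Some Band I Like"},
        },
    }

    # A made-up change event, as the structured-data layer might emit it.
    event = {"type": "exoplanet", "change": "created",
             "name": "Example-1b", "mass_earths": 5.0}

    for name, subscription in subscriptions.items():
        if matches(subscription, event):
            print("notify:", name, "->", event["name"])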

In the department of less remote possibilities: back in February, I described an RSS extension I’d like to see. It would allow the transformation of portals into mini-news sites linking directly to relevant Wikipedia articles. In general, the more ways we have to publish RSS automatically or semi-automatically, the better; the community will innovate around the technology.

Our separate Wikinews project remains justifiable as a differently styled, more detailed and granular view of events of the day largely irrespective of their historical significance. But I believe we should try to make the two projects interact more neatly when it comes to major events. Cross-wiki content transclusion in combination with the ever-elusive single user login might spur some interesting collaborations, particularly about components that are useful to both projects (timelines, infoboxes, and so on). Perhaps even the idea of shared fair use content is not entirely blasphemous in this context.

The increasing use of Wikipedia as a news source in its own right will only strengthen its cultural significance in ways that we have yet to understand.

Is Wikipedia complete?

Sage Ross reports in the latest Wikipedia Signpost about an interesting experiment at George Mason University where history students were asked to write articles about a subject not already covered in the English Wikipedia. It is interesting to read the course blog for the students’ impressions of Wikipedia. (The talk page of the Signpost article lists some of the articles they created.)

There are many observations one can make about this experiment, but I want to focus on just one. Many of the students had great trouble finding a topic to write about that was not already covered by Wikipedia. Those who did find one sometimes did not realize that an article about their topic already existed under a different title (or chose to ignore it, wanting to provide “their own perspective” instead). This fascinated me, given that I would have expected it to be the easiest part of their assignment. Granted, the task was complicated by the requirement that the students create a brand-new article. But let’s think a little about the common notion that the English Wikipedia is “basically complete”.

Wikipedia provides anyone with plenty of guidance on what to write about. There is, of course, the gigantic directory of requested articles, which is growing faster than old requests are being fulfilled. Moreover, even when browsing any Wikipedia article about history, you will notice the occasional red link. Their frequency increases as you go past the history of North America and Europe. Beyond history, there are countless specialized pages waiting to be written — articles about species, geographical entities, astronomical objects, and so forth. But here, we are still only talking about horizontal growth. The perfect Wikipedia article allows near unlimited exploration and is supported by rich media, source text, news, references, structured data ... and every single article that currently exists can be improved in this regard. Only a very tiny fraction of articles has reached our current “featured article” standard. This standard and its interpretation have changed significantly over time.

In fact, perhaps the “perfect” article cannot exist, as our conception of knowledge is constantly changing. Here are just some expectations that I think we will have of future articles, in rough order of appearance:

  1. Structured data. If we deploy technology like Semantic MediaWiki or OmegaWiki, we will have to rethink the ways in which we deal with structured data such as the information in most infoboxes. Much of the data currently in human- or bot-maintained lists will be automatically obtained from the structured data embedded into or associated with articles. As existing scientific databases are wikified, these too will become connected with our own content, and it will become possible to navigate directly to the latest scientific results as they are being collected. Of course, even simple structured data functionality poses very serious scalability issues, and we will likely see these efforts evolve separately from the main Wikipedia content for a while. But as the technology matures, the need for integration will increase — and Wikipedians will be expected to hunt for as many sources of data as possible to enrich any given article.
  2. More free content. Vast archives of materials are waiting to be liberated from copyright restrictions, and any single source can add great value. Aside from any massive philanthropic content liberation campaigns and the advances of the open access movement, I hope and believe that reform of the incredibly unbalanced international system of copyright law is possible. Even shaving as much as 30 years off current copyright terms would unlock decades of cultural wealth. Lastly, Wikipedia’s own influence continues to grow, and the importance of having content in Wikipedia may often outweigh any arguments against free content licensing.
  3. Deep sourcing. I have already explored this notion here: Whether we are writing about games, software, or videos, I expect that our models of referencing will require radical innovation to reference deep segments of the content. The best reference is one which allows me to go directly to the relevant piece of code, text, sound or video — but that will of course only be possible for transparent, open access resources.
  4. Levels of knowledge. We have different levels of detail within each Wikipedia, but the current Wikipedias are essentially written for intelligent, educated readers. We should have materials for different reading levels, and summaries of complex subjects written for readers with little pre-existing knowledge. Simple English and Wikijunior are first attempts to make this happen, but we should have a more abstract perspective on how to best represent these different levels of knowledge throughout projects and languages.
  5. Less language-centric views. Right now, references tend to be to works in the language of the respective Wikipedia. However, even following the interwiki links, one can often discover sources in other languages on the same topic, which may very well be much richer and more useful. As our cross-language communication tools improve, our expectation will be to present the views of more than one language space on a given topic. Breakthroughs in freely available machine translation tools could have a massive transformational impact, but even a less ambitious project like Wikicat and the associated ideas could revolutionize the way we look at sources.
  6. More data types. We are very image-rich, but still have few other media. Virtually every article could be enriched by video content, be it clips from a documentary or an actual recording of the subject. Even original documentary material made through wiki collaboration is a possibility. As for sounds: every musical instrument, every animal that makes sounds, and every politician or activist should have sound files associated with their article.

    In terms of images and tables, their prevalence and quality will increase further as we deploy new extensions such as WikiTeX, which are essentially integrated authoring tools for specialized content such as chessboard patterns, relational diagrams, or music scores. We can and do support all this content already, but the easier it becomes to create it, the more widely it will be used. (And, of course, syntax-driven authoring is hardly the peak of usability.) One particular killer application could result from more intelligent generation of SVG images using text parameters. This is not trivial (the text needs to be rendered within a given “hot spot area” of the image), but not impossible either. (A small sketch of this idea appears after this list.)

  7. “Sociality”. Presently we only encourage community building for the explicit purpose of creating reference works. Wikiversity is a notable exception with the desire to form learning communities. But why should it not be possible for me to connect easily with students doing their thesis on a particular Wikipedia topic, or researchers who specialize in it? The existing WikiProjects, portals and IRC channels are also seeds for interest communities around particular topics. I believe it is inevitable that these seeds will grow into broader discussion and research areas, partially as part of project convergence. We should stop being afraid of such communities of interest; a community of interest that is strongly connected to Wikipedia may very well be preferable to one which is not, even if it is about Pokémon.
  8. Project convergence. Our current sister project templates are dumb, dead links. Imagine being able to navigate the annotated text of a book from Wikisource directly from within the related Wikipedia article, seeing a sidebar with the latest Wikinews stories on a given topic, scrollable galleries from Commons, or quiz questions from Wikiversity. One should frown upon buzzwords like “web 2.0” or “mash-ups”, but some of the underlying ideas are worth exploring. One of my favorite UI paradigms that is enabled by AJAX is the infinite loader. The loveliest example of this is Google Reader, which allows you to scroll through the archives of any news source, until it runs out of data, without ever reloading a page. We need similar boundless knowledge exploration tools. As we build them, and integrate our projects in other ways, the distinction between the different “Wiki-somethings” will blur, and the expectation for quality content from our sister projects will increase.
  9. Simple interactive content. There’s not really much that is stopping us from integrating the countless open source Java- and Flash-based learning applets that are out there into Wikipedia, except for free-as-in-freedom and security issues. At least Java should be “open enough” soon, and Flash might get a decent open source implementation. As for security review, I believe that open source, combined with a simple trust model and a healthy dose of “assume good faith” will be sufficient.
  10. Machinima. A type of video, machinima are 3D films that are relatively easy to create. They are typically made using the movie-recording capabilities of computer games. Their quality is driven by the multimedia capabilities of PCs and game consoles, and by the games developed for them. Games are a multi-billion-dollar industry that may eventually eclipse even moviemaking, so continued innovation is inevitable. Machinima can be used to re-enact any sequence of events using cutting-edge 3D graphics. A military simulation with good machinima capabilities may very well lead to the first massive use of this technology to enrich Wikipedia articles about historical battles with amateur re-creations thereof.
  11. Interactive 3D content. Second Life is trying to become the “3D web” by making much of its technology available under open source conditions. Perhaps it will succeed, perhaps not. I expect that real mass adoption of 3D technology in an everyday context will only occur together with stereoscopic displays. “Virtual Reality” has become one of those technologies that, like video conferencing, has been predicted so frequently and imagined in so much detail without significant mass use beyond gaming that many people have stopped believing in it — but eventually, 3D navigation may become the standard method by which most of us access content of any type. As is so often the case, this change is gradual, and the new 3D capabilities of both the Linux and the Windows desktop are first humble steps in this direction.

    Most imagined 3D user interfaces have focused on simple metaphors such as “avatars”, buildings, “flying”, and so on. I expect that 3D interfaces will draw from these metaphors, but they will be governed by user needs for efficient ways to locate content, places, and things. (At least within the open source culture, technology tends to be driven by user needs, not by a top down hype machine.) Sometimes those tools will be visual, sometimes verbal, sometimes social. So I’m not convinced that we will access all Wikipedia content through intelligent avatars who answer questions using speech recognition and artificial intelligence. :-)

    In the end, the narratives of these 3D worlds may end up being more dream-like than reality-like in their chaotic structure and convergence of sensory stimuli. But I do believe that users will want to participate in interactive, social learning environments (bringing the experience of a well-designed museum exhibit to the Internet as a whole), and that these will blend with purely textual explanations.

  12. Intelligent learning systems. We know that people learn with different efficacy under different conditions, but unfortunately, things aren’t very simple beyond that — no single model of learning styles has found strong empirical support. I believe that computer-facilitated learning can, in theory, adapt to the complexity of a human neural network at least as well as human-mediated learning can. An intelligent learning system (ILS) would likely rely on a vast database of information for any single learner, a database that would have to keep track of much of their online activity. (This is not necessarily a privacy issue if the database is stored locally and encrypted.) Moreover, it would have to tap into participatory activities and teacher assessments. Therefore, I expect that advanced systems of this nature are still quite a remote possibility. But if they can be built, I think they will radically alter the way we learn, and impose new requirements on the content of any learning resource.
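
Returning to the SVG idea from item 6: here is a minimal, standard-library-only Python sketch of what generating an image from text parameters might look like, with the label forced to fit a fixed “hot spot”. The template, hot-spot geometry, and width heuristic are all invented for illustration; a real implementation would measure the rendered string with an actual font library.

    # Sketch: render a caller-supplied label into a fixed "hot spot" of an SVG template,
    # shrinking the font size until a crude width estimate says the text fits.
    # The template, geometry, and width heuristic are made up for illustration.

    SVG_TEMPLATE = """<svg xmlns="http://www.w3.org/2000/svg" width="300" height="120">
      <rect x="10" y="10" width="280" height="100" fill="#eef" stroke="#336"/>
      {label}
    </svg>"""

    HOT_SPOT = {"x": 20, "y": 70, "width": 260}   # the region the text must stay inside

    def render_label(text, max_font=36, min_font=8):
        for size in range(max_font, min_font - 1, -1):
            # Rough estimate: an average glyph is ~0.6 em wide. A real implementation
            # would measure the rendered string with a font library, and escape the text.
            if len(text) * size * 0.6 <= HOT_SPOT["width"]:
                return ('<text x="{x}" y="{y}" font-size="{s}" '
                        'font-family="sans-serif">{t}</text>'
                        .format(x=HOT_SPOT["x"], y=HOT_SPOT["y"], s=size, t=text))
        raise ValueError("label does not fit even at the minimum font size")

    def make_svg(label):
        return SVG_TEMPLATE.format(label=render_label(label))

    if __name__ == "__main__":
        print(make_svg("An example label"))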

These are just some developments that are (somewhat) predictable within our current technological horizon. We have no idea how knowledge might be transformed by new communication tools, nanotech, artificial intelligence, neural interfaces, or anything else we may dream up. But even within the limits of today’s tech, the notion that Wikipedia is “finished” in any meaningful way is very alien to me.

Semi-automated deletion nomination at Commons

I just saw that the JavaScript wizards at Wikimedia Commons came up with an impressive new tool – if you look at an image description page, you will now see a “Nominate for deletion” link in the bottom right corner. If you follow that link and give a deletion reason, everything – the tagging of the image, the listing on the Deletion requests page, and the notification of the uploader – is done automatically using JavaScript.
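
Outside the browser, the same three steps could be scripted against the MediaWiki API. The following Python sketch is only an approximation of what the gadget does: it assumes an already authenticated requests.Session and a valid CSRF token, and the deletion template and page names are placeholders rather than the exact ones used on Commons.

    # Sketch of the three edits the Commons gadget performs, expressed as MediaWiki
    # API calls. Assumes `session` is an authenticated requests.Session and
    # `csrf_token` a valid CSRF token; template and page names are placeholders.
    import requests

    API = "https://commons.wikimedia.org/w/api.php"

    def append_to_page(session, csrf_token, title, text, summary):
        resp = session.post(API, data={
            "action": "edit",
            "title": title,
            "appendtext": "\n" + text,
            "summary": summary,
            "token": csrf_token,
            "format": "json",
        })
        resp.raise_for_status()
        return resp.json()

    def nominate_for_deletion(session, csrf_token, file_title, uploader, reason):
        # 1. Tag the image description page (a real gadget places the tag at the top).
        append_to_page(session, csrf_token, file_title,
                       "{{delete|" + reason + "}}", "Nominating for deletion")
        # 2. List the file on a deletion requests page (placeholder page name).
        append_to_page(session, csrf_token,
                       "Commons:Deletion requests/" + file_title,
                       "* [[:" + file_title + "]] - " + reason,
                       "Listing deletion request")
        # 3. Notify the uploader on their talk page, as a new section.
        resp = session.post(API, data={
            "action": "edit",
            "title": "User talk:" + uploader,
            "section": "new",
            "sectiontitle": "Deletion request for " + file_title,
            "text": "Your upload [[:" + file_title + "]] has been nominated for deletion: " + reason,
            "token": csrf_token,
            "format": "json",
        })
        resp.raise_for_status()
        return resp.json()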

This is quite impressive. I’d love to see more of this kind of automation enabled by default, at least for users in the “autoconfirmed” group (registered accounts older than X days). Think about it:

  • Deletion, peer review, featured article status nominations
  • Speedy deletion with auto-notification to the affected users
  • Updating news pages and portals with important announcements

I’m sure there are countless scenarios where this might come in handy. I can see the dangers, but I think the benefits justify some more experiments. Any takers, or other examples of similar semi-automated tools?

Citizendium is not Free Content

Nearly a month after its public launch, Citizendium (a new wiki-like encyclopedia that positions itself against Wikipedia) still has not figured out its licensing policy. While the project has no choice but to follow Wikipedia’s GNU FDL when it imports articles from there, its own pages remain under undefined terms. Contributors are asked to wait while (someone) figures out the licensing terms: “All new articles will be available under an open content license yet to be determined.” It does not say who will make that determination, or on what legal basis they even have the right to do so. Unless CZ decides to ask every contributor for permission to relicense, authors would have a very good claim to challenge a licensing decision they do not agree with.

I hope that Citizendium will eventually become free content, instead of adopting odious restrictions like “no commercial use”, which would make subsets of it incompatible with Wikipedia and other free knowledge resources, not to mention making an awful mess of the editing process. Meanwhile, the content created by CZ contributors is completely proprietary and not usable by anyone beyond its publication on the CZ website. The few high-quality articles they have developed, which could potentially be merged back into Wikipedia, are non-free. I would caution anyone who contributes there to at least explicitly license their content (for example, by putting a licensing template on their user page).

An Adventurer is You!

I love it when I discover utterly bizarre and wonderfully unique worlds on the Internet previously unknown to me. And of course, as is so often the case, I did so when browsing Wikipedia. Specifically, in the article about NetHack, which I check occasionally for updates on this deeply fascinating computer game classic, I found a reference to another game, Kingdom of Loathing, that I had never heard of. The article is pretty informative, though I think the intro written by the game’s designers gives you a better feel for the game. 😉

Essentially, KoL is a browser-based role-playing game and online community, but instead of fighting giant rats or hordes of the undead, your enemies are sabre-toothed limes, ninja snowmen, and fluffy rabbits. Items (“filthy corduroys”) and character classes (“disco bandits”) are equally bizarre. But what is most surprising (and perhaps concerning) is the number of active players. The KoL community is huge, with more than 2 million messages posted to the game’s forums, and thousands of players logged in at a time. The community is further sustained by large fan websites, “clans”, and frequent real-life meetings. The game is financially supported by donations and merchandise.

What makes it, to me, more fascinating than other similar browser-based game communities is the incredible level of surrealism and satire. It is in some ways a complete abstraction of core RPG principles such as quests, skills, levels, and magic, with all of these elements replaced by jokes and nonsense. The visuals are literally doodles and stick figures, and interactivity is limited by the minimal browser interface. Still, in spite of the lack of an environment that could possibly be immersive without additional drug use, all the core RPG mechanisms seem to be as addictive as ever to its user base (though I would imagine that the humor also helps).

Within Wikipedia, factions often dispute the usefulness of articles about “non-notable” web phenomena like this one, because they tend not to receive significant coverage outside the web’s micromedia. I’m glad Wikipedia has an article about KoL, especially because no other place would provide me with a neutral, comprehensive summary of such a bizarre subculture. Indeed, I hope that the wikisphere will encourage and drive original research into these topics — not in Wikipedia itself, but in other spaces like Wikiversity and Wikinews. Even within Wikipedia, I hope the bias against using primary sources in documenting projects like KoL will decrease. And as I mentioned previously, I think wikis have the potential to take referencing to new levels.

And why, you might ask, is it even important to understand such an obscure, silly phenomenon? Why is it important to understand gaming culture, furries, or TV fandom? Should we not dismiss such embarrassing cultural idiocy, and lead humanity towards a golden age of a new enlightenment? I believe in the latter, but not in the former. If we want to advance as a species, we must understand what makes us tick. We must develop models that help us to explain why people form online communities around the idea of hunting menacing citrus fruits. If we can accurately predict these motivations and their underlying patterns, we can make use of this knowledge to build sustainable communities dedicated to human progress. Should, for example, a project like Wikipedia make use of RPG-like mechanisms to build motivation for routine tasks? Probably not, but right now we are stumbling in the dark when it comes to predicting the effects particular mechanisms might have, because we have no empirically sound framework to place them in. We can use trial and error, but the more errors we make, the harder it gets to justify more trials.

Information gathered about a project like KoL should eventually be part of a massive database with an overlaid ontology which allows us to compare it to similar communities (online and offline), analyze growth patterns, see relevant case studies of conflicts and procedures, and so on. Psychologists, economists, sociologists, historians, neurobiologists, and researchers from many other disciplines ought to work together in developing the unified models we need to engineer the rules and structures of networked communities systematically towards certain ends. That will require science itself to mesh into a networked community, independent of institutions and disciplines. We see the early beginnings of this in the wikisphere, but also in the open access movement, with PLoS leading the way in web-based innovation of the scientific process. But there are still great challenges to overcome, ranging from proprietary licensing and closed data, through institutional vanity and academic arrogance, to short-sightedness in the policies of communities like Wikipedia.

And that’s why you should care about ninja snowmen. :-)

Now also in German

For occasional posts in German, there is now a dedicated category and an accompanying feed.

Unfortunately, my favorite source about the German Wikipedia, the Wikipedia-Kurier, is not available as a blog.

Zotero & Wikipedia

… perfect together.

Zotero rocks. It should be part of the toolset of any serious Wikipedian.

Video game sequences as sources

People often make fun of Wikipedia’s obsession with pop culture. While it’s mildly amusing to look at the wiki bureaucracy surrounding, say, Pokémon-related articles, this obsession has made Wikipedia a most unusual publication that takes a scholarly approach to topics that are not typically treated that way.

Currently, the article about the video game Devil May Cry is a candidate for featured article status. Looking at the article, one thing I had never noticed before is that dialogue from the game is used in footnotes to back up particular statements about the plot.

How does one reference a segment within an interactive resource, especially given the typically increasing difficulty of video games and their dependency on prior interactions? This demonstrates the degree to which Wikipedians explore scholarly approaches even in areas of culture that are shunned by traditional academia (but which will without doubt be of great interest to the historians, sociologists and psychologists of the future). I suspect that the ideal solution for this would involve savegames or even memory dumps. Certainly, it is an easier problem to solve in the context of open source games, where the reference can go directly into the source code. I haven’t seen examples of that yet, though I would expect the NetHack article to eventually be full of such source code references, perhaps pointing directly to line numbers in a particular version of the code.
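
As a small illustration of what a line-level source reference could look like, here is a Python sketch that builds a permalink to one line of one file at a fixed release tag, using the URL pattern of a code hosting service such as GitHub. The repository coordinates below are placeholders, not an actual NetHack mirror.

    # Sketch: build a stable, line-level reference to one line of one file at a fixed
    # version tag, using the GitHub permalink pattern. The repository coordinates
    # below are placeholders.

    def source_line_permalink(owner, repo, ref, path, line):
        """Return a URL pointing at a single line of a file at a fixed tag or commit."""
        return "https://github.com/{0}/{1}/blob/{2}/{3}#L{4}".format(owner, repo, ref, path, line)

    if __name__ == "__main__":
        print(source_line_permalink("example-org", "nethack-mirror",
                                    "NetHack-3.4.3", "src/zap.c", 120))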

By permitting anonymous contributors and adopting an egalitarian editing model, Wikipedia has forever condemned itself to researching and documenting even the tiniest facts in its most obscure articles. Credentials count for nothing, and unreferenced statements can be removed. The implicit commitment to collaborative research by volunteers on an unprecedented scale will surely bring interesting new challenges and could redefine what it means to cite a source.

Wikimedia’s Open Source Toolset

There is a rarely explored relationship between the world of open source tools and free knowledge collaboration. This relationship developed naturally and quietly. Some open source tools have become quite essential to projects like Wikipedia, and I’ve started a page on our Meta-Wiki called Open Source Toolset to document the use of open source/free software tools in the Wikimedia Foundation projects. Others have quickly made useful additions (if you see any glaring omissions, please do not hesitate to edit).

Inkscape is an example of a mainstream open source tool that has become essential, even though it has not reached 1.0 yet. It has been used to create thousands of vector drawings in Wikimedia projects. But there are much more specialized tools, such as Hugin (used for stitching panorama pictures) or PP3 (used for celestial charts). The availability of these tools is incredibly empowering. Anyone with the necessary skills and interest can use them to immediately contribute their knowledge; there is no charge, and the quality of the software almost always increases over time.

The importance of this open source ecosystem of tools can hardly be overestimated. Every new tool, every new feature, directly feeds back into the quality of content that is being generated. Therefore, I strongly believe that we must find ways to support them. Google has an annual Summer of Code, through which it spends a lot of money on student projects. This is very worthwhile indeed. We do not have a lot of money, but we do have global website exposure. Perhaps the Wikimedia Foundation should support its own “Autumn of Collaboration”, providing learning resources and guiding volunteers to work on the projects that make the greatest difference in the collection and development of human knowledge.