Is Wikipedia complete?

Sage Ross reports in the latest Wikipedia Signpost about an interesting experiment at George Mason University where history students were asked to write articles about a subject not already covered in the English Wikipedia. It is interesting to read the course blog for the students’ impressions of Wikipedia. (The talk page of the Signpost article lists some of the articles they created.)

There are many observations one can make about this experiment, but I want to focus on just one. Many of the students had great trouble finding a topic to write about that is not already covered by Wikipedia. Those who did sometimes did not realize that an article about their topic existed under a different title (or chose to ignore it, wanting to provide instead “their own perspective”). This was fascinating to me, given that I believe this should have been the easiest part of their assignment. Granted, it was complicated by the fact that the students had to create a new article. But let’s think a little about the common notion that the English Wikipedia is “basically complete”.

Wikipedia provides anyone with plenty of guidance on what to write about. There is, of course, the gigantic directory of requested articles, which is growing faster than old requests are being fulfilled. Moreover, even when browsing any Wikipedia article about history, you will notice the occasional red link. Their frequency increases as you go past the history of North America and Europe. Beyond history, there are countless specialized pages waiting to be written: articles about species, geographical entities, astronomical objects, and so forth. But here, we are still only talking about horizontal growth. The perfect Wikipedia article allows near-unlimited exploration and is supported by rich media, source text, news, references, structured data ... and every single article that currently exists can be improved in this regard. Only a tiny fraction of articles has reached our current “featured article” standard. This standard and its interpretation have changed significantly over time.

In fact, perhaps the “perfect” article cannot exist, as our conception of knowledge is constantly changing. Here are just some expectations that I think we will have of future articles, in rough order of appearance:

  1. Structured data. If we deploy technology like Semantic MediaWiki or OmegaWiki, we will have to rethink the ways in which we deal with structured data such as the information in most infoboxes. Much of the data currently in human- or bot-maintained lists will be automatically obtained from the structured data embedded into or associated with articles. As existing scientific databases are wikified, these too will become connected with our own content, and it will become possible to navigate directly to the latest scientific results as they are being collected. Of course, even simple structured data functionality poses very serious scalability issues, and we will likely see these efforts evolve separately from the main Wikipedia content for a while. But as the technology matures, the need for integration will increase — and Wikipedians will be expected to hunt for as many sources of data as possible to enrich any given article.
  2. More free content. Vast archives of materials are waiting to be liberated from copyright restrictions, and any single source can add great value. Aside from any massive philanthropic content liberation campaigns and the advances of the open access movement, I hope and believe that reform of the incredibly unbalanced international system of copyright law is possible. Even shaving 30 years off current copyright terms would unlock decades of cultural wealth. Lastly, Wikipedia’s own influence continues to grow, and the importance of having content in Wikipedia may often outweigh any arguments against free content licensing.
  3. Deep sourcing. I have already explored this notion elsewhere: whether we are writing about games, software, or videos, I expect that our models of referencing will require radical innovation to reference deep segments of the content. The best reference is one which allows me to go directly to the relevant piece of code, text, sound or video, but that will of course only be possible for transparent, open access resources.
  4. Levels of knowledge. We have different levels of detail within each Wikipedia, but the current Wikipedias are essentially written for intelligent, educated readers. We should have materials for different reading levels, and summaries of complex subjects written for readers with little pre-existing knowledge. Simple English and Wikijunior are first attempts to make this happen, but we should have a more abstract perspective on how to best represent these different levels of knowledge throughout projects and languages.
  5. Less language-centric views. Right now, references tend to be to works in the language of the respective Wikipedia. However, even following the interwiki links, one can often discover sources in other languages on the same topic, which may very well be much richer and more useful. As our cross-language communication tools improve, our expectation will be to present the views of more than one language space on a given topic. Breakthroughs in freely available machine translation tools could have a massive transformational impact, but even a less ambitious project like Wikicat and the associated ideas could revolutionize the way we look at sources.
  6. More data types. We are very image-rich, but still have few other media. Virtually every article can be served by video content, be it clips from a documentary or an actual recording of the subject. Even original documentary material made through wiki collaboration is a possibility. As for sounds, every musical instrument, every animal that makes sounds, every politician or activist, should have sound files associated with their article.

    In terms of images and tables, their prevalence and quality will increase further as we deploy new extensions such as WikiTeX, which are essentially integrated authoring tools for specialized content such as chessboard patterns, relational diagrams, or music scores. We can and do support all this content already, but the easier it becomes to create it, the more widely it will be used. (And, of course, syntax-driven authoring is hardly the peak of usability.) One particular killer application could result from more intelligent generation of SVG images using text parameters. This is not trivial (the text needs to be rendered within a given “hot spot area” of the image), but not impossible either.

  7. “Sociality”. Presently we only encourage community building for the explicit purpose of creating reference works. Wikiversity is a notable exception with the desire to form learning communities. But why should it not be possible for me to connect easily with students doing their thesis on a particular Wikipedia topic, or researchers who specialize in it? The existing WikiProjects, portals and IRC channels are also seeds for interest communities around particular topics. I believe it is inevitable that these seeds will grow into broader discussion and research areas, partly as a result of project convergence. We should stop being afraid of such communities of interest: a community of interest that is strongly connected to Wikipedia may very well be preferable to one which is not, even if it is about Pokémon.
  8. Project convergence. Our current sister project templates are dumb, dead links. Imagine being able to navigate the annotated text of a book from Wikisource directly from within the related Wikipedia article, seeing a sidebar with the latest Wikinews stories on a given topic, scrollable galleries from Commons, or quiz questions from Wikiversity. One should frown upon buzzwords like “web 2.0” or “mash-ups”, but some of the underlying ideas are worth exploring. One of my favorite UI paradigms that is enabled by AJAX is the infinite loader. The loveliest example of this is Google Reader, which allows you to scroll through the archives of any news source, until it runs out of data, without ever reloading a page. We need similar boundless knowledge exploration tools. As we build them, and integrate our projects in other ways, the distinction between the different “Wiki-somethings” will blur, and the expectation for quality content from our sister projects will increase.
  9. Simple interactive content. There’s not really much that is stopping us from integrating the countless open source Java- and Flash-based learning applets that are out there into Wikipedia, except for free-as-in-freedom and security issues. At least Java should be “open enough” soon, and Flash might get a decent open source implementation. As for security review, I believe that open source, combined with a simple trust model and a healthy dose of “assume good faith” will be sufficient.
  10. Machinima. A type of video, machinima are 3D films that are relatively easy to create. They are typically made using the movie-recording capabilities of computer games. Their quality is driven by the multimedia capabilities of PCs and game consoles, and the games implemented for them. Games are a multi-billion-dollar industry that may eventually eclipse even moviemaking, so continued innovation is inevitable. Machinima can be used to re-enact any sequence of events using cutting-edge 3D graphics. A military simulation with good machinima capabilities may very well lead to the first massive use of this technology to enrich Wikipedia articles about historical battles with amateur re-creations thereof.
  11. Interactive 3D content. Second Life is trying to become the “3D web” by making much of its technology available under open source conditions. Perhaps it will succeed, perhaps not. I expect that real mass adoption of 3D technology in an everyday context will only occur together with stereoscopic displays. “Virtual Reality” has become one of those technologies that, like video conferencing, have been predicted so frequently and imagined in so much detail, without significant mass use beyond gaming, that many people have stopped believing in them. Eventually, though, 3D navigation may become the standard method by which most of us access content of any type. As is so often the case, this change is gradual, and the new 3D capabilities of both the Linux and the Windows desktop are humble first steps in this direction.

    Most imagined 3D user interfaces have focused on simple metaphors such as “avatars”, buildings, “flying”, and so on. I expect that 3D interfaces will draw from these metaphors, but they will be governed by user needs for efficient ways to locate content, places, and things. (At least within the open source culture, technology tends to be driven by user needs, not by a top-down hype machine.) Sometimes those tools will be visual, sometimes verbal, sometimes social. So I’m not convinced that we will access all Wikipedia content through intelligent avatars who answer questions using speech recognition and artificial intelligence. :-)

    In the end, the narratives of these 3D worlds may end up being more dream-like than reality-like in their chaotic structure and convergence of sensory stimuli. But I do believe that users will want to participate in interactive, social learning environments (bringing the experience of a well-designed museum exhibit to the Internet as a whole), and that these will blend with purely textual explanations.

  12. Intelligent learning systems. We know that people learn with different efficacy under different conditions, but unfortunately, things aren’t very simple beyond that: no single model of learning styles has found strong empirical support. I believe that computer-facilitated learning can theoretically adapt as well to the complexity of a human neural network as human-mediated learning, if not better. An intelligent learning system (ILS) would likely rely on a vast database of information for any single learner, a database that would have to keep track of much of their activity online. (This is not necessarily a privacy issue if the database is stored locally and encrypted.) Moreover, it would have to tap into participatory activities and teacher assessments. Therefore, I expect that advanced systems of this nature are still quite a remote possibility. But if they can be built, I think they will radically alter the way we learn, and impose new requirements on the content of any learning resource.

These are just some developments that are (somewhat) predictable with our current technological horizon. We have no idea how knowledge might be transformed by new communication tools, nanotech, artificial intelligence, neural interfaces, or anything else we may dream up. But even within the limits of today’s tech, the notion that Wikipedia is “finished” in any meaningful way is very alien to me.

1 Comment

  1. Just a minor note: Requested articles really isn’t all that useful. For this sort of university project, the Missing Encyclopedic Articles project is much more useful.
