The Value of Open Source Software in Wikimedia

Florence Devouard (Chair of the Wikimedia Foundation) and I disagree a bit about the value of open source software in the Wikimedia Foundation projects. Lately Florence has been taking a more “best tool for the job”, “don’t reinvent the wheel” approach, especially when it comes to tools we use internally, or as web services (a recent discussion was about survey tools). I don’t consider myself an ideological person: I discard beliefs as quickly as I adopt them if they aren’t useful. Maximizing open-source use internally and elsewhere simply strikes me as a best practice for a non-profit like the Wikimedia Foundation.

Let’s take the example of survey tools. For a user survey, you could use any of a number of web services, or you could build an open source extension to MediaWiki that is used for collecting information. If you use the former, you might get a deal with a company that lets you use their service for free, in return for the exposure that being “advertised” through its use on Wikipedia will give them. But consider that you might want to run a similar or different survey again the next year, to check whether certain trends (like gender participation) have been affected by your actions.

If you go with the proprietary software vendor, there’s a good chance that they will downgrade you to a regular customer status once they believe they’ve saturated the audience they can reach through you: no more free beer. If the company gets bought or goes bankrupt, you might not be able to work with them anymore at all. If you have specific usability complaints (say, because their survey uses JavaScript that only runs in Internet Explorer but not in Firefox), you’ll have to go through the usual support processes with third parties, and your request might not get processed at all. Depending on the nature of the deal, you also have to rely on their backup and privacy practices being sane.

As a vast online community dealing with all imaginable topics, Wikipedia has a huge number of detractors, including some deeply malicious or even mentally disturbed trolls. This means your software likely has to be more secure, because malicious hackers are more likely to try to pollute your survey with nonsense. With a proprietary survey vendor, there’s no way to let the community inspect the code for very common security vulnerabilities like SQL injection attacks. Given that they’d be running on an external server, it would also be harder to generate reliable (anonymized) user identifiers that can’t be easily hacked using a Perl script, to protect your survey against systematic data pollution. It’s not inconceivable that such an attack would even come from within the Wikipedia community itself, as a reaction to the use of proprietary software (believing in open source doesn’t mean that you’re not a dick).
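To make the two security points above concrete, here is a minimal sketch, not anything MediaWiki or a particular survey tool actually ships. It assumes the survey runs on a server that already knows the logged-in user's internal id. The names `SECRET_KEY`, `respondent_token`, `record_answer`, `open_survey_db` and the `answers` table are all hypothetical. The identifier is a keyed hash (HMAC) of the user id, so it cannot be guessed or forged by a script without the server-side secret, and the answers are written with parameterized queries, which is the standard defense against SQL injection.

```python
import hashlib
import hmac
import sqlite3

# Hypothetical server-side secret; in practice it would be loaded from
# configuration that is never published alongside the survey results.
SECRET_KEY = b"replace-with-a-long-random-secret"


def respondent_token(user_id: int) -> str:
    """Derive a stable, anonymized identifier from an internal user id."""
    digest = hmac.new(SECRET_KEY, str(user_id).encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def open_survey_db() -> sqlite3.Connection:
    """Create an in-memory database with one row per (respondent, question)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE answers ("
        " token TEXT NOT NULL,"
        " question TEXT NOT NULL,"
        " answer TEXT NOT NULL,"
        " PRIMARY KEY (token, question))"
    )
    return db


def record_answer(db: sqlite3.Connection, user_id: int, question: str, answer: str) -> bool:
    """Store an answer; return False if this respondent already answered."""
    token = respondent_token(user_id)
    try:
        with db:  # commits on success, rolls back on error
            # Placeholders keep user-supplied text out of the SQL statement.
            db.execute(
                "INSERT INTO answers (token, question, answer) VALUES (?, ?, ?)",
                (token, question, answer),
            )
    except sqlite3.IntegrityError:
        return False
    return True
```

The primary key on `(token, question)` is what stops one account from stuffing the same question repeatedly; it does nothing against someone registering many accounts, which would need separate measures.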

Open source software is open for security auditing. Software which is committed to our own Subversion repository can also be fairly openly modified by a large number of committers, thanks to a liberal policy of granting access to the repository. In effect, the code is almost like its own little wiki world, with reverts and edit wars, but also a constant collaborative drive towards more quality. People from all parts of the MediaWiki ecosystem contribute to it (I’ve often said that MediaWiki is almost like a Linux kernel of the free culture movement), and are likely to share improvements if they need them, if only out of the self-interest to see them maintained in the official codebase.

If you need to retool your survey for, say, doing a usability inquiry into video use, an existing open source toolset makes it fairly easy to build upon what you have. And if you want to do a survey/poll that isn’t anonymized, hooking into MediaWiki will again make your life easier.

You might say: “Gee, Erik, you’re making this sound a lot more complicated than it is. A survey is just a bunch of questions and answers – what do you need complex software for? Can’t you just drop in a different piece of proprietary software whenever needed?” If you believe that, I recommend having a conversation with Erik Zachte, the creator of WikiStats. Erik knows a thing or two about analyzing data. He explained to me that one of the things you want to ensure is that the results you collect follow a standardized format. For example, if a user is asked to select a country they are from, you’ll want a list of countries to choose from, rather than asking them to type a string.
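A rough sketch of what “standardized format” means in practice, under my own assumptions: the `COUNTRIES` table below is a tiny illustrative subset of ISO 3166-1 alpha-2 codes, and `normalize_country` and `tally` are hypothetical helper names. The point is only that answers are stored as codes from a fixed list, so they can be counted directly instead of being cleaned up by hand.

```python
# Illustrative subset of ISO 3166-1 alpha-2 codes; a real survey would
# load the complete list.
COUNTRIES = {
    "DE": "Germany",
    "FR": "France",
    "JP": "Japan",
    "US": "United States",
}


def normalize_country(choice: str) -> str:
    """Return the canonical code for a selection, or reject it."""
    code = choice.strip().upper()
    if code not in COUNTRIES:
        raise ValueError(f"unknown country code: {choice!r}")
    return code


def tally(responses):
    """Count responses per country code."""
    counts = {}
    for choice in responses:
        code = normalize_country(choice)
        counts[code] = counts.get(code, 0) + 1
    return counts
```

With free-text input, “Germany”, “germany”, “Deutschland” and “DE” would all land in separate buckets. With a fixed list, the form only ever submits a code, and anything else is rejected before it reaches the data set.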

Moreover, you want this data to be translated into as many languages as possible. This is already being done in MediaWiki for the user interface, through the innovative “MediaWiki:” namespace, where users can edit user interface messages through the wiki itself. This is how we’ve managed to build a truly multilingual site even in minority languages: by making the users part of the translation process.
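The idea behind user-editable interface messages can be sketched roughly as a lookup keyed by message name and language, with a fallback when a translation is missing. This is an illustration only, not how MediaWiki implements it: the `MESSAGES` table, the message keys and the `message` function are hypothetical, and the fallback to English is an assumption.

```python
# Hypothetical message store: one dictionary of key -> text per language.
# In a wiki-based setup, each entry would be a page that users can edit.
MESSAGES = {
    "en": {
        "survey-country": "Which country do you live in?",
        "survey-submit": "Submit",
    },
    "de": {
        "survey-country": "In welchem Land leben Sie?",
    },
}


def message(key: str, lang: str, fallback: str = "en") -> str:
    """Look up a message in the requested language, then in the fallback."""
    for code in (lang, fallback):
        text = MESSAGES.get(code, {}).get(key)
        if text is not None:
            return text
    # Make missing messages visible instead of failing silently.
    return f"<{key}>"
```

Because survey questions and answer options are just more messages, a partially translated survey still works: untranslated items fall back to the default language until a volunteer fills them in.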

So, if you work with your proprietary survey vendor, you have to convince them to manage a truckload of translations for you, and you have to make damn sure that all the translated data is well-structured and re-usable should you ever decide to switch the survey tool. Otherwise you’ll be spending weeks just porting the data from one toolset to another. You can try to have them work on the data with you, but you’ll be spending a lot of your time trying to push your proprietary vendor to behave in a semi-open manner, when you could have simply decided to follow best practices from the start. Companies that aren’t committed to open standards will always be pushed by their internal forces (investors, boards, lawyers, managers) towards a greater need to “control and protect our IP”.

Sure, you might have a higher upfront investment if there’s no existing toolset you can build on. But I find it quite funny that the same companies who go on and on about protecting their “intellectual property” are often so very quick to give up theirs: Open source software effectively belongs to you (and everyone else), with everything that entails. And it’s an ecosystem that gets richer every day. Instead of literally or metaphorically “buying into” someone else’s ideas, open source maximizes progress through cooperation. I cannot think of a better fit for our wiki world.

The reason to default to open source best practices is not ideological. It’s deeply pragmatic, but with a view to the long-term prospects of your organization. So while I agree with Florence that we should keep open (no pun intended) the option of using proprietary software in some areas of Wikimedia (particularly internal use), I would posit that any cost-benefit analysis has to take the many long-term benefits of the open source approach into account.

[UPDATE] LimeSurvey looks like a decent open source survey tool that we could use if we don’t care that much about deep integration.

Intelligent Designs - Erik Moeller's Blog

Building the future - one idea at a time
