12.23.05

The Wayback Wayback Machine

Posted in Ideas at 7:26 am by Erik

The Internet Archive is a great invention, providing a view into the history of a webpage. It is also a great tool for investigative journalism and academic research. Whether the subject is the history of a dubious company or a page whose content has mysteriously changed, the Internet Archive adds wiki-like versioning to webpages that would otherwise not have it. To avoid massive copyright problems, the Archive has made two crucial compromises: it does not show pages less than six months old, and it retroactively deletes material when the site owners want it to (see the Archive's FAQ). In fact, owners don't even need to ask: they just have to put a special robots.txt file on their webservers, and the next time the crawlers visit the site, it is removed from the Archive.

An archive where material can disappear from one day to the next without notice is a rather bizarre thing. Links to archived web pages become invalid. When a site is removed from the Archive, it appears as if it had never been there. Browsing the history of popular webpages can be entertaining, but if anything controversial is removed at the owner's whim without so much as a verification process, then the tool loses much of its value for serious research. It's the controversial stuff that most needs to be archived.

If the Internet Archive doesn't fix this flaw, it needs to be replaced with a solution that doesn't have it, such as a decentralized storage network. As a temporary hack, it would be useful if someone set up a "Wayback Wayback Machine": a site which, on request, crawls all revisions of a website in the Internet Archive, stores them, and makes them available to researchers who can provide credentials. This would only protect the record of websites that seem likely to be deleted in the future (scams, phishing sites, etc.), and only if someone thinks to request a secure copy (in which case they could also download and save the revisions manually). A better long-term solution is needed, and it would be best if it were provided by the Internet Archive itself.

Until then, whenever you see something unusual in the Wayback Machine, remember to make a copy. It might not be there tomorrow.
