Archiving the Future

January 29, 2004 | View Comments (15) | Category: Our Thoughts

Summary: How do we archive the information on the web so that it stands the test of time?

What happens to all the information and sites that we link to in our own sites? I was thinking about the plethora of information on the web today, the beauty of links, and how wonderful everything seems. But then I wondered what the web is going to be like 20 years from now: will all the resources we use now still be around? Linkrot now is one thing, but what about 5 years down the road? Websites come and websites go, but how many will stand the test of time?

If I continue with this site for a couple of years and end up with thousands of sites linking back to me, is it my obligation to keep the site up for as long as possible? When you really start to think about it, the pressure starts to mount. Maybe there is some responsibility for you to keep the site up, but should you? I can read the words of a book printed in 1604, but will people be able to read the words of this entry in 2404? I have a lot of questions, yet no answers.

Is there any solution?

Comments

#1

hmm. maybe we should all donate our expired domain names to web.archive.org

I wonder if that'd work... have expired domains point to their caches on there.

JC (http://thelionsweb.com/weblog)

#2

Last week I found that a site I'd linked to had changed _every single_ one of its page names, and as a consequence none of my links to it worked any more.

I wrote to them and explained why that was a bad thing. I explained it was in their interests to have unchanging URLs. I pointed them towards all the URL rewriting articles we know and love.

Along with a bunch of crap, I got this sentence back:

"It's your problem to keep your links fresh and accurate, not mine."

Nice attitude, eh?

If that sort of thing happens a lot (and I'm guessing it does) then in the future most of our links will be broken :o(

Dunstan (http://www.1976design.com/blog/)
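
One way a site can rename its pages without breaking inbound links, which is the point of the URL rewriting articles Dunstan mentions, is to keep a table of old paths and answer them with a permanent redirect. The sketch below is only an illustration in Python, not the setup of any site discussed here: the paths in `REDIRECTS` and the function name `redirect_for` are hypothetical.

```python
from typing import Optional

# Hypothetical mapping from retired page names to their new locations.
REDIRECTS = {
    "/about.html": "/about/",
    "/archive/2004-01.html": "/archive/2004/01/",
}


def redirect_for(path: str) -> Optional[str]:
    """Return the new location for a retired path, or None if the path
    was never renamed and should be served as usual."""
    return REDIRECTS.get(path)


# A server would answer a retired path with "301 Moved Permanently"
# and a Location header holding the returned value.
old_link_target = redirect_for("/about.html")
untouched = redirect_for("/contact/")
```

With a table like this the old links keep working, so nobody on the other end has to "keep their links fresh" by hand.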

#3

"It's your problem to keep your links fresh and accurate, not mine."

Hmmm, that's a great way to get people to stop linking to you. I like the idea of donating the domain name to web.archive.org later.

Scrivs (http://www.9rules.com/whitespace/)

#4

It is probably easier to read a book printed in 1604, because it was published. If the publisher is still around, you might be able to get a copy, or if someone's ancestors passed a copy down.

With links, it is not as easy, because just about anyone can have a web presence. However, not all gets published by large channels akin to a publisher. And with linking, visitors do not feel the need to save pages for later (akin to the ancestor path).

Using blockquotes can somewhat alleviate the problem, but this seems quite situational (and the original reference is still lost).

Zelnox

#5

Technically link rot will only exist for a few more years... if all goes well, pages (or information) will be linked like "objects"... moving objects from one location to the next should not affect the inbound links. So, if you pulled the site down, technically some place like Archive.org could (with your permission) have a copy of your site labeled as the same object. Thinking about this today gives me a headache... probably not unlike the headache everyone got thinking about pure CSS layouts when CSS1 came out in 1996.

Nick (http://www.digital-web.com)

#6

Objects? How about mobile objects! Yes, UWA HA HA AWA HA HA. Not only can they move, but they will be smart as well. They will find a home by themselves and spread like dandelions. This is evil.

But seriously, maybe someone will implement intelligent web sites that are autonomous and mobile. The next big thing to make webmasters obsolete. Hehe.

Zelnox

#7

For the issue of permanence of an URI/URL there are solutions:
http://purl.org
http://handle.net
but for the resources, I only found this paper from 1998 -
Towards an Archival Intermemory
http://www.pnylab.com/pny/papers/intermemory/main.html
Nice dream.

Laur (http://purl.org/NET/LAUR)

#8

Another issue: would you be able to read what you preserved?

Quote from http://dspace.org/faqs/index.html

"DSpace identifies two levels of digital preservation: bit preservation, and functional preservation.

Bit preservation ensures that a file remains exactly the same over time - not a single bit is changed - while the physical media evolve around it.

Functional preservation goes further: the file does change over time so that the material continues to be immediately usable in the same way it was originally while the digital formats (and physical media) evolve over time."

Laur (http://purl.org/NET/LAUR)
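
The "bit preservation" level in the quote above can be checked mechanically: record a checksum when a file is archived, then recompute it later and compare. What follows is a minimal sketch in Python under stated assumptions, not a description of how DSpace itself works. The helper names `fingerprint` and `bits_preserved` are made up for illustration, and SHA-256 is just one reasonable choice of hash.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 hex digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def bits_preserved(recorded_digest: str, current: bytes) -> bool:
    """True if the current bytes still match the digest recorded
    when the content was first archived."""
    return fingerprint(current) == recorded_digest


# Record a digest at archive time, then verify after a media migration.
archived = b"I can read the words of a book printed in 1604."
recorded = fingerprint(archived)

intact = bits_preserved(recorded, archived)
damaged = bits_preserved(recorded, archived.replace(b"1604", b"1605"))
```

A check like this says nothing about functional preservation: a file can pass it and still be unreadable if no software understands its format any more.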

#9

I was about to post something, but Nick got to it first.

Or, perhaps, people could link directly to the Internet Archive instead of the page itself. However, this poses its own share of problems. First, it takes a while before the Archive finds a new page. Second, it doesn't always index images and other embedded objects. Finally, it would be an extra step for somebody who sees the site and decides they want to read it. Then again, if the first problem were solved, they could read the Archive's version of the site instead.

Suddenly, the idea of a carbon-copy of my site somewhere else that people start reading regularly is mind-boggling. :P

Chris Vincent (http://dris.dyndns.org:8080/)

#10

There are lots of issues involved, really.
Language, for example. There are languages we cannot read. Nubian, for example. There hasn't been a 'rosetta stone' phenomenon for that language, so only a few symbols are understood. Probably won't happen with English, but who knows.
The media is the most obvious issue, though. Stone tablets stick around a long time. Paper can be read if preserved. But look at the various non-print media of the last few hundred years... you have wax tubes and later, metal tubes, that record sound... can't pick up a player for those very easily. Even with most of the audio and video formats prior to VHS and cassette tape you'd be hard pressed to find something capable of playing them (the turntable being an obvious exception, that'll probably be around forever, unless our descendants include audiophiles in the "B" Ark).

I wouldn't be surprised if the power grid was different to the point that you wouldn't even be able to plug a modern appliance in. So unless the message evolves through different media, it won't be readable.

Of course, relevance is also an issue. I'm sure there were plenty of books and letters and the like from 400 years ago that are no longer around... the content has to be important enough (at least, to someone) to be carefully preserved or continually copied. That's easier to do, now, of course... but doesn't make the words somehow more worthy of saving.

400 years from now, no one will care about your posts, except maybe a few historians with very niche interests.

Unless of course the whole world descends into the depths of slashdot or usenet and our descendants uphold whitespace as a holy book and battle the evils of those-who-(use-tables)-shall-not-be-named.

JC (http://thelionsweb.com/weblog)

#11

"Unless of course the whole world descends into the depths of slashdot or usenet and our descendants uphold whitespace as a holy book and battle the evils of those-who-(use-tables)-shall-not-be-named."

I have found my goal in life...

But when you put it like that, it kind of makes me think about the stuff I am writing and makes me wish I could provide something that the world would want to keep for another 400 years. Of course, how would you know if your content stood the test of time anyways?

Scrivs (http://www.9rules.com/whitespace/)

#12

If you want your content to stand the test of time, then write about timeless things. Most of my favourite books are over 2000 years old; they've stood the test of time because they're still relevant. So it's not so hard to know what will stand the test of time.

Joel (http://biroco.com/journal.htm)

#13

There's also the issue of 'will my site be readable in 2404?'

Who knows if browsers and the internet will still exist in the way we know them? Who knows if the very physical media that your content is held on is still supported? What if we have one of those apocalyptic scenarios where they destroy all the computers? :D

Jack (http://boxofjack.com/)

#14

Two thoughts:

1) The internet will survive because it is constantly evolving. Someone will figure out how to save it.

See: Clay Shirky on this

2) I'm going to print out everything I do and tell my friends about it. They don't suffer from linkrot, and paper lasts a long long time.

See my blatherings on this

Bob (http://www.ryskamp.org/)

#15

hey Bob, that's a really brilliant essay you wrote there.

(paper may last a long time, if acid-free, but laser-prints may not)

Joel (http://www.biroco.com/journal.htm)
