Hello everyone,

Has anyone considered building or launching a platform similar to the Internet Archive, but based on ActivityPub?

This could serve as a decentralized network to document, preserve, and protect online content from loss, censorship, and other threats, ensuring its availability for future generations.

For those unfamiliar, the Internet Archive is a non-profit that has been preserving digital media and promoting universal access to knowledge since 1996.

It’s famous for services like the Wayback Machine and Archive-It.

Given the importance of preserving digital heritage, especially in the context of censorship and data loss, a Fediverse-based equivalent could fill a crucial role.

The decentralized nature of ActivityPub could provide a robust alternative to centralized solutions.

I’d love to see this kind of project come to life, but, unfortunately, I lack the motivation, time, and energy to take it on alone.

Has anyone else ever considered something similar?

Are there any existing projects that might be interested in this direction?



ActivityPub seems like the wrong tool for this job.

What you're looking for is more of a decentralised distributed file system or object store as the base for this.

And it's going to take a lot of participants in the network to reach the storage capacity and redundancy it needs to work well.
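A rough way to see why participant count matters: if every object must be stored `replicas` times and the load is spread evenly, each node's share is total size × replicas ÷ participants. The sketch below assumes perfectly even distribution, and the inputs (1 PB, 3 copies, 1,000 volunteers) are hypothetical, chosen only to keep the arithmetic readable; `per_participant_storage` is an illustrative helper, not part of any tool.

```python
def per_participant_storage(total_bytes: int, replicas: int, participants: int) -> float:
    """Average bytes each participant must hold when every object is stored
    `replicas` times and the load is spread evenly over `participants` nodes."""
    if replicas < 1 or participants < 1:
        raise ValueError("replicas and participants must both be at least 1")
    return total_bytes * replicas / participants


TB = 10**12
PB = 10**15

# Hypothetical inputs: a 1 PB collection, 3 copies of every object, 1,000 volunteers.
share = per_participant_storage(1 * PB, replicas=3, participants=1000)
print(f"{share / TB:.1f} TB per participant")  # 3.0 TB per participant
```

Doubling the replica count doubles everyone's share, so redundancy and participant count have to grow together.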

asudox

IPFS? I assure you, no individual here can afford to host even a single copy of the whole Internet Archive.

haverholm

As I understand it, ActivityPub is fine for various forms of decentralised communication, but what you're suggesting sounds to me more like a generalised peer-to-peer network or distributed file storage (see DAT or IPFS).
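What separates these systems from a messaging protocol is content addressing: an object is named by a hash of its bytes, so any peer can serve it and the receiver can verify it. The toy store below is a minimal sketch of that idea using plain SHA-256 hex digests; real IPFS identifiers (CIDs) wrap a multihash plus codec information, and `ContentStore` is a hypothetical class, not an IPFS or DAT API.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: every object is keyed by the SHA-256 of its bytes."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        self._blobs[address] = data  # identical content always maps to the same key
        return address

    def get(self, address: str) -> bytes:
        data = self._blobs[address]
        # Whoever served the bytes, the requester can check them against the address.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("content does not match its address")
        return data


store = ContentStore()
addr = store.put(b"<html>snapshot of a page</html>")
print(addr[:16], store.get(addr))
```

Because the key is derived from the content, two participants archiving the same page end up with the same address, which gives deduplication for free.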

The issue I see is ensuring that a distributed archive is comprehensive. How do you know what’s missing and needs to be added unless there’s a central coordinating process aware of what everyone already has?
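The coordination problem can be made concrete: finding under-replicated items means comparing the wanted set against what *every* node holds. The sketch below assumes some process can see all nodes' holdings, which is exactly the central view being questioned; `find_gaps`, the node names, and the URLs are hypothetical.

```python
def find_gaps(wanted: set[str], holdings: dict[str, set[str]], min_copies: int = 2) -> dict[str, int]:
    """Return every wanted item held by fewer than `min_copies` nodes, mapped to
    its current copy count. Requires the holdings of *all* nodes as input."""
    counts = {item: 0 for item in wanted}
    for items in holdings.values():
        for item in items & wanted:  # ignore items nobody asked to preserve
            counts[item] += 1
    return {item: n for item, n in counts.items() if n < min_copies}


# Hypothetical example: three URLs we want kept, two nodes reporting what they hold.
wanted = {"https://example.org/a", "https://example.org/b", "https://example.org/c"}
holdings = {
    "node-1": {"https://example.org/a", "https://example.org/b"},
    "node-2": {"https://example.org/a"},
}
print(sorted(find_gaps(wanted, holdings).items()))
# [('https://example.org/b', 1), ('https://example.org/c', 0)]
```

A node that only knows its own holdings cannot compute this result, which is why a fully decentralised archive needs some way to share or aggregate that inventory.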

@floofloof@lemmy.ca

There are distributed filesystems with redundancy, but the last time I tried something like that, it was extremely slow for both reading and writing. For an offline archive it might be feasible, but you’d have to do a lot of redundancy and error correction to be sure you didn’t lose chunks. Plus, the Internet Archive is so big that even with the data distributed, each participant might have to store a prohibitively large amount.
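To illustrate the "error correction to be sure you didn't lose chunks" part: the simplest scheme splits data into k chunks and adds one XOR parity chunk, so any single lost chunk can be rebuilt from the survivors (the RAID 5 idea). This is a minimal sketch; production systems typically use Reed-Solomon codes that tolerate several losses. The function names are hypothetical, and the original length must be stored separately to strip the padding.

```python
from functools import reduce
from typing import Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode_with_parity(data: bytes, k: int) -> list[bytes]:
    """Split `data` into k equal-size chunks (zero-padded) plus one XOR parity chunk."""
    if k < 1:
        raise ValueError("k must be at least 1")
    size = max(1, -(-len(data) // k))  # ceiling division, at least 1 byte per chunk
    padded = data.ljust(size * k, b"\0")
    chunks = [padded[i * size:(i + 1) * size] for i in range(k)]
    return chunks + [reduce(xor_bytes, chunks)]


def recover(chunks: list[Optional[bytes]]) -> list[bytes]:
    """Rebuild at most one missing chunk (None) by XOR-ing all surviving chunks."""
    missing = [i for i, c in enumerate(chunks) if c is None]
    if len(missing) > 1:
        raise ValueError("a single parity chunk can repair only one loss")
    repaired = list(chunks)
    if missing:
        repaired[missing[0]] = reduce(xor_bytes, [c for c in chunks if c is not None])
    return repaired


original = b"hello archive"
parts: list[Optional[bytes]] = list(encode_with_parity(original, k=4))  # 4 data + 1 parity
parts[2] = None  # one participant goes offline
restored = recover(parts)
assert b"".join(restored[:4])[: len(original)] == original
```

The trade-off is the one described above: one parity chunk costs only 1/k extra storage, but losing two chunks of the same object is unrecoverable, so more aggressive codes or more replicas are needed at scale.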

Internet Archive itself has apparently been involved with something called Filecoin, which I assume would solve that kind of issue with ‘blockchain,’ somehow.

https://blog.archive.org/2023/10/20/celebrating-1-petabyte-on-the-filecoin-network/

Is that like the usual blockchains where every computer has to store a complete copy? That would get huge with the Internet Archive.

asudox

No, just some metadata:

Filecoin is an open protocol and uses a blockchain to record participation in the network.
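To show why "metadata only" keeps the chain small, here is a toy hash-chained log where each entry records just a content address and a provider, never the archived bytes. It is an illustration, not Filecoin's real data model: it leaves out consensus and the storage proofs Filecoin uses, and `Ledger` and the identifiers are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64


def _digest(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


@dataclass
class Ledger:
    """Toy hash-chained log of storage claims (which provider holds which content address)."""
    blocks: list = field(default_factory=list)

    def record(self, content_address: str, provider: str) -> dict:
        prev = self.blocks[-1]["hash"] if self.blocks else GENESIS
        body = {"content": content_address, "provider": provider, "prev": prev}
        block = {**body, "hash": _digest(body)}  # each entry is a few hundred bytes at most
        self.blocks.append(block)
        return block

    def verify(self) -> bool:
        """Recompute every hash; editing any earlier block breaks the chain."""
        prev = GENESIS
        for block in self.blocks:
            body = {k: block[k] for k in ("content", "provider", "prev")}
            if block["prev"] != prev or block["hash"] != _digest(body):
                return False
            prev = block["hash"]
        return True


ledger = Ledger()
ledger.record("sha256-of-snapshot-1", "provider-a")  # hypothetical identifiers
ledger.record("sha256-of-snapshot-2", "provider-b")
print(ledger.verify())  # True
ledger.blocks[0]["provider"] = "provider-x"  # tamper with history
print(ledger.verify())  # False
```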

Mike Wooskey

This is well beyond my skillset (or knowledge level), but something like ArchiveBox combined with ActivityPub might make it possible to distribute internet archiving, with each instance sharing with the fediverse what it has archived.
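As a sketch of what "sharing what it has archived" could look like on the wire, an instance might publish an ActivityStreams 2.0 `Create` activity wrapping a `Page` that points at its snapshot. This assumes hypothetical glue code between an archiver and an ActivityPub server; it is not an ArchiveBox feature, the URLs are placeholders, and signing and delivering the activity to followers' inboxes is left out.

```python
import json


def snapshot_announcement(actor: str, original_url: str, snapshot_url: str, archived_at: str) -> dict:
    """Build an ActivityStreams 2.0 Create activity announcing one archived page."""
    return {
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": "Create",
        "actor": actor,
        "published": archived_at,
        "to": ["https://www.w3.org/ns/activitystreams#Public"],
        "object": {
            "type": "Page",
            "attributedTo": actor,
            "name": f"Archived copy of {original_url}",
            "url": snapshot_url,  # where other instances can fetch the snapshot
            "published": archived_at,
        },
    }


# Hypothetical instance and snapshot URLs (.example is a reserved test domain).
activity = snapshot_announcement(
    actor="https://archive.example/users/archiver",
    original_url="https://example.org/",
    snapshot_url="https://archive.example/snapshots/20240101000000/https://example.org/",
    archived_at="2024-01-01T00:00:00Z",
)
print(json.dumps(activity, indent=2))
```

Other instances following that actor would then learn which pages have already been captured, though, as noted earlier in the thread, the snapshots themselves would still need to live in some storage layer.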
