ActivityPub update from FOSDEM 2025

I mostly attended ActivityPub-related events at this FOSDEM, and there wasn’t much talk about storage or content-addressing; most of the focus was on interoperability, product/UX thinking, and testing. There were, however, a few points (mostly in the hallway track rather than in the public Q&As of talks) where people discussed multi-instance/cross-instance infrastructure that would help lower per-MAU hosting costs and improve UX for the fediverse as a network. Here are a few such systems/services where I thought IPFS/DASL/iroh or some other form of content-addressed storage might help:

  • reducing network traffic by using “SRI”-like content-addressing of media attached to public posts, e.g. so that relays or CDN-like intermediaries (rather than originating hosts) could be queried (once) for images being replied to, annotated, and retooted. There’s not much specified about as:Public-only [Mastodon-API-defined] “Relays”, but hopefully in the coming months this will become a more public part of the conversation, particularly as Mastodon gGmbH productizes/operationalizes some of the previously ad hoc shared services powering their platform.
  • CDN-like distribution networks would also be needed for Fediverse software to achieve UX/latency comparable to media-intensive commercial social software (e.g. PeerTube and Pixelfed catching up to YouTube and Instagram in load times and UX, to say nothing of hosting costs)
  • queryable, nonpublic moderation records in a unified format (of the sort required for scalable DSA compliance) would also benefit from being syncable and easy to aggregate, particularly as they could span multiple moderation authorities (i.e. instances). See User Story #2 in this old Fediverse Enhancement Proposal I wrote as part of a grant from the Sovereign Tech Agency.
  • Language models useful for “feed generation” or algorithmically weighting individualized feeds could be hosted as instance-local services. The “local LLM” model kind of presumes some kind of “package manager” for updating models periodically (the economics and ergonomics of such “subscriptions” to LLM updates/refinements are a hazy prediction I hear from both commercial and open-source ML researchers).
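
To make the first bullet a bit more concrete, here is a minimal sketch of the "queried (once)" idea: a relay-side cache keyed by the content hash of a media file rather than by its URL. Everything here is hypothetical (the `MediaCache` class, the `fetch_origin` callback, the use of hex-encoded SHA-256 as the key); no existing relay implementation is being described.

```python
import hashlib
from typing import Callable, Dict


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 of the media bytes, used here as the cache key."""
    return hashlib.sha256(data).hexdigest()


class MediaCache:
    """Hypothetical relay-side cache keyed by content digest, not by URL."""

    def __init__(self, fetch_origin: Callable[[str], bytes]) -> None:
        # fetch_origin is an assumed callback that downloads a URL's bytes.
        self._fetch_origin = fetch_origin
        self._blobs: Dict[str, bytes] = {}
        self.origin_hits = 0  # how many times an originating host was contacted

    def get(self, url: str, digest: str) -> bytes:
        # Same digest means same bytes, whichever instance's URL was used.
        if digest in self._blobs:
            return self._blobs[digest]
        data = self._fetch_origin(url)
        self.origin_hits += 1
        if sha256_hex(data) != digest:
            raise ValueError("digest mismatch for " + url)
        self._blobs[digest] = data
        return data
```

With a cache like this, a reply, an annotation, and a boost that all reference the same image (even under different URLs) cost the originating host one request in total, provided each reference carries the digest.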

Thanks for sharing — sounds like a solid hallway track with lots to pursue. Deduplicating queries to relays via CIDs sounds like an extremely IPFS-shaped problem! As we discussed in a call, let’s try to identify a relay operator/committer to collaborate with.

Also, what does SRI refer to here?

Subresource Integrity: basically, part of why the modern web is blazingly fast is that scripts and stylesheets can be linked from the HTML with a hash attached, meaning the requesting client only needs to fetch the HTML from the actual server. Everything else can be cached by CDNs and other intermediate servers, so the client gets it from “anywhere” and verifies it against the hash (a page can fall back to the actual server if the check fails). ActivityPub objects could likewise attach a hash to the URLs of their attachments and thus reduce the “DDoS effect” created when a toot goes viral. Anecdotally, media attachments eat up a lot more cloud network traffic charges than the toots themselves…
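
A rough sketch of what that could look like on the consuming side, assuming an attachment carried an SRI-style string (`sha384-<base64 digest>`). The `integrity` property on the attachment is hypothetical, not something ActivityPub defines today, and the `mirrors`/`origin` callables stand in for whatever HTTP client an implementation would use.

```python
import base64
import hashlib
import hmac
from typing import Callable, Iterable

SRI_ALGORITHMS = ("sha256", "sha384", "sha512")  # the hash functions SRI allows


def integrity(data: bytes, alg: str = "sha384") -> str:
    """Build an SRI-style string: '<alg>-<base64 of the raw digest>'."""
    digest = hashlib.new(alg, data).digest()
    return f"{alg}-{base64.b64encode(digest).decode('ascii')}"


def matches(data: bytes, expected: str) -> bool:
    """Check fetched bytes against a declared integrity string."""
    alg, _, _ = expected.partition("-")
    if alg not in SRI_ALGORITHMS:
        return False
    return hmac.compare_digest(integrity(data, alg), expected)


def fetch_verified(
    attachment: dict,
    mirrors: Iterable[Callable[[str], bytes]],
    origin: Callable[[str], bytes],
) -> bytes:
    """Try intermediaries first; contact the originating host only as a fallback."""
    url, expected = attachment["url"], attachment["integrity"]
    for fetch in mirrors:
        try:
            data = fetch(url)
        except Exception:
            continue  # an unreachable mirror is not fatal
        if matches(data, expected):
            return data
    data = origin(url)
    if not matches(data, expected):
        raise ValueError("origin copy does not match declared hash")
    return data
```

The point of the design is that a mirror never has to be trusted: a wrong or tampered copy simply fails the check and the client moves on.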

(note that in SRI, direct hashes are used; DASL might make more sense here than UnixFS CIDs or iroh’s BLAKE3 CIDs)
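
To illustrate why a DASL-style CID stays close to a "direct hash": as far as I understand the DASL CID profile, a CID for raw bytes is just a four-byte header (CIDv1, `raw` codec `0x55`, SHA-256 code `0x12`, digest length) in front of the plain SHA-256 digest, encoded as lowercase base32 with a `b` prefix. The sketch below assumes exactly that layout; the function names are made up for illustration.

```python
import base64
import hashlib

# Assumed header: CID version 1, raw codec, sha2-256, 32-byte digest.
# All four values are below 0x80, so each varint is a single byte.
HEADER = bytes([0x01, 0x55, 0x12, 0x20])


def raw_sha256_cid(data: bytes) -> str:
    """Encode the SHA-256 of `data` as a CIDv1 (raw codec, base32 lowercase)."""
    digest = hashlib.sha256(data).digest()
    b32 = base64.b32encode(HEADER + digest).decode("ascii").lower().rstrip("=")
    return "b" + b32  # 'b' is the multibase prefix for lowercase base32


def digest_from_cid(cid: str) -> bytes:
    """Recover the plain SHA-256 digest from a CID produced above."""
    if not cid.startswith("b"):
        raise ValueError("expected base32 lowercase multibase prefix")
    body = cid[1:].upper()
    body += "=" * (-len(body) % 8)  # restore the padding base32 decoding needs
    raw = base64.b32decode(body)
    if raw[:4] != HEADER:
        raise ValueError("not a CIDv1 / raw / sha2-256 CID")
    return raw[4:]
```

Because the digest is recoverable byte-for-byte, the same value could be re-expressed as an SRI `sha256-…` string without re-hashing the file, which is not the case for CIDs that hash a chunked DAG rather than the file bytes themselves.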
