I used IPFS-Cluster to host LineageOS for a few months. I’ve learned many things about IPFS and IPFS-Cluster, and I’ve seen many patches land that resolved problems I was having at the time. However, it’s time for me to say goodbye to lineageos-on-ipfs.com. This is the full story of my project, including what I wanted from IPFS-Cluster at the beginning.
LineageOS has limited storage space for its builds, so old builds are deleted every 4 weeks. The problem is that some devices stop receiving new builds for lack of a maintainer, so they become unsupported and their builds eventually vanish. However, running LineageOS, even after support for a device has been dropped, is still advantageous and more secure than running the latest stock firmware on most devices that their manufacturer no longer supports.
With that in mind, I set out to store the latest build of every device on IPFS, so that old but useful builds would stop vanishing.
Using IPFS to store LineageOS’ builds seemed like a good idea, and it still seems like a good idea from my perspective. However, the amount of effort it requires warrants a programmer working full-time on this alone. Hence, I quickly ran into problems.
Here’s the hardware that I connected:
- Droplet on Digital Ocean, named Droplet
- Laptop running Ubuntu Server, named HP
- Desktop running Ubuntu, named Helion
- Laptop running Ubuntu, named Asus
Droplet was online 100% of the time, but had very limited storage space: 50 GiB, to be precise. Thus, whenever the cluster died, the node would fill itself up and choke to death. It was running the web interface, lineageos-on-ipfs.com, which displayed the builds to the public, and LOSGoI, which fetched the builds from LineageOS’ website and put them on IPFS.
HP was running most of the time. However, my router would reboot every night at 4 AM. This brought down my home network and decimated the cluster every single time.
Helion was mostly running, but I was dual-booting Windows and I was actively using that device.
Asus was offline almost all the time and only occasionally came online.
At the time, raft was the only consensus available in IPFS-Cluster. When using raft, 50%+1 of the nodes had to be online at all times for the cluster to stay alive. However, achieving 100% uptime is nearly impossible with consumer hardware. After many days of trial, I realized that the project was doomed to fail, but I was still determined to make it work.
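To make the 50%+1 rule concrete, here is a minimal sketch of the majority arithmetic applied to a four-node cluster like mine. The function names are hypothetical and only illustrate the rule; this is not IPFS-Cluster code.

```python
def raft_quorum(total_peers: int) -> int:
    """Smallest number of peers that forms a strict majority (50%+1)."""
    return total_peers // 2 + 1


def cluster_is_alive(total_peers: int, online_peers: int) -> bool:
    """A raft cluster can only make progress while a majority is online."""
    return online_peers >= raft_quorum(total_peers)


# Four machines: Droplet, HP, Helion and Asus.
TOTAL = 4
for online in range(TOTAL + 1):
    state = "alive" if cluster_is_alive(TOTAL, online) else "stalled"
    print(f"{online} of {TOTAL} online: {state}")
```

With four nodes, three must be online. When the router rebooted, only Droplet stayed reachable, so the cluster stalled every night.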
At first, I had leave_on_shutdown turned off. However, this left the cluster unrecoverable pretty much every night. Nodes would go down, and bootstrapping them would fail because Droplet still thought the nodes were part of the cluster, but they had been removed when they came back online with the --bootstrap option. Moreover, because fewer than 50%+1 of the nodes were online, no other node could join the cluster.
Then I tried enabling leave_on_shutdown. This option was pretty much useless, since the nodes would leave when the network went down, without warning Droplet, which resulted in the same errors as before.
raft consensus was too strict to be usable at all. Nodes couldn’t reconnect after the first time they left the cluster, so I had to clean the state pretty much every day, which is quite a bad idea when you want to store critical data.
Then crdt consensus came along. It was a better consensus because my nodes could connect to the cluster even after leaving, which was a huge improvement. However, when the router rebooted at 4 AM, every single member of the cluster would become its own cluster and would not reconnect to any other node once network connectivity was restored. To rejoin the cluster, I would need to write a daemon that checks whether the node is still part of the cluster and, if it isn’t, restarts IPFS-Cluster so that it joins again.
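The daemon I had in mind could look roughly like the sketch below. It is deliberately generic: count_peers and restart are hypothetical callables that you would have to supply yourself (for example, something that counts the entries reported by ipfs-cluster-ctl peers ls, and something that restarts the service on your system). Treating "one peer or fewer" as "isolated" is also an assumption on my part.

```python
import time
from typing import Callable, Optional


def watchdog_step(count_peers: Callable[[], int],
                  restart: Callable[[], None]) -> bool:
    """Restart the cluster service if this node sees no peer but itself.

    Returns True when a restart was triggered.
    """
    try:
        peers = count_peers()
    except Exception:
        # Assumption: an unreachable daemon is handled like an isolated one.
        peers = 0
    if peers <= 1:
        restart()
        return True
    return False


def run_watchdog(count_peers: Callable[[], int],
                 restart: Callable[[], None],
                 interval_seconds: float = 60.0,
                 max_checks: Optional[int] = None,
                 sleep: Callable[[float], None] = time.sleep) -> int:
    """Run watchdog_step periodically and return the number of restarts.

    max_checks=None loops forever; a finite value is handy for testing.
    """
    checks = 0
    restarts = 0
    while max_checks is None or checks < max_checks:
        if watchdog_step(count_peers, restart):
            restarts += 1
        checks += 1
        sleep(interval_seconds)
    return restarts
```

Passing the two actions in as callables keeps the check itself independent of how a given machine queries or restarts IPFS-Cluster.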
Essentially, I wanted IPFS-Cluster to create a cluster of small IPFS nodes working together to solve larger problems. In reality, IPFS-Cluster will only shine with larger nodes in a production environment with 100% uptime.
What I want from IPFS-Cluster is the ability to create and join multiple logical clusters. Each logical cluster would be made of write nodes and read nodes. Write nodes would be nodes that have the ability to write to the consensus. By contrast, read nodes would only have the ability to read the consensus.
To obtain read or write permission, an IPFS-Cluster node would join a logical cluster using a read key or a write key previously generated by the logical cluster. Read keys are keys that could be given out publicly so that random users could join a specific logical cluster and help it by donating storage. Write keys are keys to be kept secret and used by trusted nodes to write to the consensus.
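As a rough illustration of the read key / write key idea, here is a toy in-memory model. Nothing in it exists in IPFS-Cluster: LogicalCluster, Role, join and pin are all hypothetical names, and real keys would need proper cryptography rather than a shared random string.

```python
import secrets
from enum import Enum


class Role(Enum):
    READ = "read"
    WRITE = "write"


class LogicalCluster:
    """Toy model of a logical cluster guarded by a read key and a write key."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.read_key = secrets.token_hex(16)   # could be published
        self.write_key = secrets.token_hex(16)  # must stay secret
        self.members = {}  # peer id -> Role
        self.pins = []     # the shared state (the "consensus")

    def join(self, peer_id: str, key: str) -> Role:
        """Admit a peer with the role that matches the key it presents."""
        if secrets.compare_digest(key.encode(), self.write_key.encode()):
            role = Role.WRITE
        elif secrets.compare_digest(key.encode(), self.read_key.encode()):
            role = Role.READ
        else:
            raise PermissionError("unknown key")
        self.members[peer_id] = role
        return role

    def pin(self, peer_id: str, cid: str) -> None:
        """Only write nodes may change the shared state."""
        if self.members.get(peer_id) is not Role.WRITE:
            raise PermissionError(f"{peer_id} may not write to {self.name}")
        self.pins.append(cid)


cluster = LogicalCluster("lineageos-builds")
cluster.join("trusted-node", cluster.write_key)
cluster.join("volunteer", cluster.read_key)
cluster.pin("trusted-node", "example-cid")  # allowed
```

A volunteer holding only the read key can follow the pin list and donate storage, but any attempt to call pin raises PermissionError.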
Whenever a random user goes offline or is disconnected from the network, the cluster should redistribute the data it lost. Ideally, the nodes with the largest amount of free space should take over, up to the pin’s replication_factor_max. Nodes that are still alive but disconnected should attempt to reconnect to other nodes every minute. Moreover, nodes that are almost full should also attempt to offload some data to other, larger nodes.
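The redistribution rule I’m describing can be sketched as a small selection function. This is only my wish expressed as code, not how IPFS-Cluster allocates pins; Peer and allocate are hypothetical names, and the free-space figures in the example are made up.

```python
from dataclasses import dataclass
from typing import List

GIB = 1024 ** 3


@dataclass
class Peer:
    name: str
    free_bytes: int
    online: bool = True


def allocate(peers: List[Peer], pin_size: int, current_holders: List[str],
             replication_factor_max: int) -> List[str]:
    """Pick new holders for a pin, largest free space first.

    Holders that are offline no longer count, so their copies get replaced
    until the pin is back at replication_factor_max.
    """
    online = {p.name for p in peers if p.online}
    alive_holders = [name for name in current_holders if name in online]
    missing = replication_factor_max - len(alive_holders)
    if missing <= 0:
        return []
    candidates = [p for p in peers
                  if p.online
                  and p.name not in alive_holders
                  and p.free_bytes >= pin_size]
    candidates.sort(key=lambda p: p.free_bytes, reverse=True)
    return [p.name for p in candidates[:missing]]


peers = [
    Peer("Droplet", 5 * GIB),
    Peer("HP", 200 * GIB),
    Peer("Helion", 500 * GIB),
    Peer("Asus", 100 * GIB, online=False),
]
# A 1 GiB build was held by Droplet and Asus, and Asus just went offline.
print(allocate(peers, 1 * GIB, ["Droplet", "Asus"], replication_factor_max=3))
# prints ['Helion', 'HP']
```

Sorting by free space keeps small nodes like Droplet from filling up first, which is exactly what killed it in my setup.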
This would allow the setup that I had to work regardless of network conditions or router reboots.
I dream of an IPFS-Cluster that could be run on servers, desktops, laptops, mobile devices and Internet of Things devices, and that wouldn’t care about network conditions. Nodes could come and go, and the cluster would repair itself using the available peers and the rest of the IPFS network.