For this demo, we have a DSL connection that can be disconnected to illustrate the offline capability of the system. See the cable hanging out of the wall to the left of the left big screen. Part of the swarm is in the cloud (EC2) and is used to publish new app versions or work instructions.
At real customer installations, there is typically an SDSL connection to the main office and to the internet. These can be quite slow, since manufacturing is usually in the countryside and not in a city center where you get decent speeds. IPFS provides a big benefit here, since we can get large assets to hundreds of devices without having to download them from the cloud hundreds of times.
That I can do. It will be quite a long list though. Don’t get me wrong, all in all we are quite happy…
High level:
We would love to have a dependable roadmap for when the current parts of IPFS (specifically IPNS and pubsub) will become production ready. I took a big risk in recommending IPFS and a fully distributed approach instead of a more traditional centralized approach. I think going with a more distributed architecture will pay off, but it would be great to know that Protocol Labs is dedicating significant resources to making the current features of IPFS rock-solid and stable. There is some concern that so many things still need to be worked out about Filecoin that work on IPFS will be neglected.
We have a set of backup plans, e.g. using centralized MQTT instead of IPFS pubsub if pubsub is not production ready when we need it. But we would very much like to avoid additional systems, to reduce complexity and single points of failure.
Technical:
Currently, we are using IPFS just for distributing the application and large static assets. The data distribution functionality works reasonably well, with the caveat that the DHT seems to require a lot of bandwidth. The mDNS discovery also works well with Android devices.
However, we have had a lot of problems with IPNS. First of all, publishing to IPNS and resolving IPNS names is frequently very slow. We have also had some so far unexplained issues where old or wrong hashes for IPNS names appeared. We will file a bug report once we have figured out what exactly is happening. It would be good to have a way to access low-level details such as the sequence number to troubleshoot this.
What we really want is the following:
- notification of IPNS updates (currently we are polling ipfs name every few minutes)
- recursive pinning of the hash pointed to by an ipns name, with unpinning of the old hash on update (currently we use ipfs pin update when we get a new hash for a name)
- a way to get notified whenever a new ipns entry is fully pinned, so that we switch to e.g. a new version of an app only once it is fully available locally (in case the device goes offline in the next second)
- some way to only resolve an ipns name to the latest fully pinned hash
- ability to publish to a key/name when a device is offline (=has no peers)
- an app should be able to store its state on a device by publishing to a name, even while offline
The GitHub issue "ipfs name follow" (Issue #4435 in ipfs/kubo) contains some good ideas for solving some of these points.
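To make the wish list above more concrete, here is a minimal sketch of the polling workaround: resolve the name, move the pin, and only report a new version once pinning has returned. The class `NameFollower`, its callbacks, and the helper names are hypothetical and exist only for illustration. The two wrappers shell out to `ipfs name resolve`, `ipfs pin add` and `ipfs pin update`; the sketch assumes that the pin commands return only after the whole DAG has been fetched.

```python
import subprocess
from typing import Callable, Optional


def ipfs_resolve(name: str) -> str:
    """Resolve an IPNS name via the ipfs CLI; returns a path like /ipfs/<hash>."""
    out = subprocess.run(["ipfs", "name", "resolve", name],
                         check=True, capture_output=True, text=True)
    return out.stdout.strip()


def ipfs_pin(old: Optional[str], new: str) -> None:
    """Pin `new` recursively; if `old` is given, move the pin with `ipfs pin update`."""
    if old is None:
        cmd = ["ipfs", "pin", "add", new]
    else:
        cmd = ["ipfs", "pin", "update", old, new]
    subprocess.run(cmd, check=True, capture_output=True, text=True)


class NameFollower:
    """Hypothetical helper: tracks the latest fully pinned target of one IPNS name."""

    def __init__(self, name: str,
                 resolve: Callable[[str], str] = ipfs_resolve,
                 pin: Callable[[Optional[str], str], None] = ipfs_pin,
                 on_ready: Optional[Callable[[str], None]] = None):
        self.name = name
        self.resolve = resolve
        self.pin = pin
        self.on_ready = on_ready or (lambda path: None)
        self.pinned: Optional[str] = None  # last target known to be fully local

    def poll_once(self) -> bool:
        """One polling step. Returns True if a new version became fully pinned."""
        try:
            target = self.resolve(self.name)
        except Exception:
            return False  # offline or resolution failed: keep using self.pinned
        if target == self.pinned:
            return False  # nothing new
        try:
            self.pin(self.pinned, target)  # assumed to block until fully fetched
        except Exception:
            return False  # fetch failed part-way: do not switch versions
        self.pinned = target
        self.on_ready(target)  # e.g. switch the app to the new version
        return True
```

Calling `poll_once()` from a timer every few minutes corresponds to what we do today; `follower.pinned` then plays the role of "the latest fully pinned hash". Native notifications would make the timer unnecessary.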
For the communication between the devices, we would rely heavily on pubsub, and would urgently need a system that puts more effort into delivering pubsub messages. If I understand the current situation correctly, if you have a topology a <-> b <-> c, and just a and c are interested in topic X, they won’t be able to communicate. This will have to be changed, or else we will have to use MQTT. Obviously something more efficient than floodsub would be highly desirable, but with our relatively small swarms we might be able to live with floodsub initially.
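The topology problem can be illustrated with a small simulation. This is only a model of the behaviour as I understand it (messages are forwarded only between peers subscribed to the topic), not a statement about the actual implementation; the function name and the `relay_through_all` switch are made up for the example.

```python
from collections import deque


def reachable_subscribers(links, subscribers, source, relay_through_all=False):
    """Return the subscribers that receive a message published by `source`.

    links: iterable of (peer, peer) connections.
    relay_through_all=False: only peers subscribed to the topic forward messages.
    relay_through_all=True: hypothetical mode where every connected peer relays.
    """
    neighbours = {}
    for x, y in links:
        neighbours.setdefault(x, set()).add(y)
        neighbours.setdefault(y, set()).add(x)

    seen = {source}
    queue = deque([source])
    while queue:  # breadth-first flood from the publisher
        peer = queue.popleft()
        for nxt in neighbours.get(peer, ()):
            if nxt in seen:
                continue
            if relay_through_all or nxt in subscribers:
                seen.add(nxt)
                queue.append(nxt)
    return (seen & set(subscribers)) - {source}


links = [("a", "b"), ("b", "c")]
print(reachable_subscribers(links, {"a", "c"}, "a"))                          # set()
print(reachable_subscribers(links, {"a", "c"}, "a", relay_through_all=True))  # {'c'}
```

In the first call b is not subscribed to topic X, so the message from a never reaches c. In the second call b relays even though it is not interested in the topic, which is the behaviour we would need.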
What we don’t need or want are any delivery guarantees (order, at least once, at most once, exactly once) beyond best effort. We would much prefer a quick, best-effort, UDP-style system to some sort of TCP-style approach with significant overhead. Our system will be layered on top of pubsub and will ensure “eventual exactly once delivery” by publishing hashes of an event log, somewhat similar to what OrbitDB does. We also don’t have a need for auth or encryption. If we need encryption, we will layer it on top of pubsub.
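A rough sketch of why best effort is enough for us: if every log entry links to its predecessor by hash and only the head hash is announced, a receiver can recover from lost, duplicated or reordered announcements. This is a simplified single-writer illustration, not our actual system; `Store` is an in-memory stand-in for content-addressed storage, and all class and field names are invented for the example.

```python
import hashlib
import json


def entry_hash(entry):
    """Content hash of a log entry (deterministic JSON encoding)."""
    return hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest()


class Store:
    """In-memory stand-in for content-addressed storage."""

    def __init__(self):
        self.blocks = {}

    def put(self, entry):
        h = entry_hash(entry)
        self.blocks[h] = entry
        return h

    def get(self, h):
        return self.blocks[h]


class Writer:
    def __init__(self, store):
        self.store = store
        self.head = None

    def append(self, payload):
        # Each entry links to the previous head, forming a hash chain.
        self.head = self.store.put({"prev": self.head, "payload": payload})
        return self.head  # this hash is what would be announced over pubsub


class Reader:
    def __init__(self, store):
        self.store = store
        self.applied = set()  # hashes of entries already applied
        self.events = []      # payloads, each applied once, in log order

    def on_announce(self, head):
        """Handle one announced head hash; gaps and duplicates are both fine."""
        missing = []
        h = head
        while h is not None and h not in self.applied:
            entry = self.store.get(h)
            missing.append((h, entry))
            h = entry["prev"]  # walk back until we reach something known
        for h, entry in reversed(missing):
            self.applied.add(h)
            self.events.append(entry["payload"])


store = Store()
writer, reader = Writer(store), Reader(store)
h1 = writer.append("e1")  # announcement lost
h2 = writer.append("e2")  # announcement lost
h3 = writer.append("e3")
reader.on_announce(h3)    # delivered
reader.on_announce(h3)    # delivered again (duplicate)
reader.on_announce(h2)    # late, out-of-order delivery
print(reader.events)      # ['e1', 'e2', 'e3']
```

Any single announcement that gets through is enough to catch up, so the pubsub layer itself needs no ordering or delivery guarantees.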
One more thing: it would be great if there was a guarantee that some sort of private swarm feature would eventually become non-experimental.
Not really. Support for Android P2P Wi-Fi would be very cool. But getting the things that are currently there to production quality and stability is far more important in the medium term.
For a typical customer we are looking at 20-500 devices. The devices are industrial Android tablets for GUI terminals, and industrial PCs for machine interfaces and interfaces to other on-premise systems (e.g. ERP). Larger customers usually have multiple sites, so the swarm would span NATs. For small customers there is just the usual NAT between the private network and the cloud.