Good day,
To anyone who can help me understand these core concepts better: thank you very much, I really appreciate it.
Say I have 2 IPFS Clusters, each with 2 IPFS nodes distributed geographically to cover more users. I open all required ports on the nodes so that information can be fetched from each of them.
Set of Questions 1, about IPFS Clusters:
1. Since both IPFS Clusters are connected via the same SECRETHASH, will they replicate the same information across all 4 nodes whenever a new file is added to Cluster 1 or Cluster 2?
2. Can I isolate the information for Cluster 1 and Cluster 2 but still access it from any node?
3. If I create a new IPFS Cluster, how do I connect it to my existing cluster network?
4. Is there a way to obtain data directly from the cluster instead of through a gateway node, similar to the Add endpoint that automatically pins the new file on all nodes? For example, the cluster would automatically determine which node is closest to the requester and serve the information from there (for latency purposes).
Set of Questions 2, about IPFS Nodes:
1. If I want to add more nodes to IPFS Cluster 1 automatically (via Kubernetes + Terraform), how do I connect the new node to the cluster?
2. How do I know that all the existing data in Cluster 1 has been replicated to this new node?
3. Is there a way to know when this new IPFS node is ready to use (like a notification)?
Thanks in advance for any help! I’ve been digging into IPFS and IPFS Cluster for 2 weeks now, and it’s really nice! My company is hoping to use it in a large-scale production project.
IPFS Cluster peers are just programs that form a private libp2p-based network and keep a common list of CIDs and metadata (the global pinset).
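As an illustration of joining that private network (a sketch with placeholder addresses and peer IDs; the secret must match the one used by the existing cluster), a new cluster peer can bootstrap against a known peer:

```shell
# Placeholder secret for illustration: must be the same 64-hex-char
# secret the existing cluster peers were initialized with.
export CLUSTER_SECRET=<64-hex-char-secret>

# Initialize this peer's configuration (ipfs-cluster-service picks
# up CLUSTER_SECRET from the environment).
ipfs-cluster-service init

# Start the peer and bootstrap it to an existing cluster peer
# (hypothetical address; 9096 is the default cluster swarm port).
ipfs-cluster-service daemon --bootstrap /ip4/192.0.2.10/tcp/9096/p2p/<existing-peer-id>
```

This is also one answer to how a new cluster peer created by automation would connect: give it the same secret and a bootstrap address of any live peer.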
Each peer can interact with an IPFS daemon and tell it to pin/unpin. Data storage and transfer fall within the IPFS daemon’s tasks.
Among the metadata stored by the cluster are things like the replication factor and the list of cluster peers that should tell their IPFS daemons to pin the CID (the pin “allocations”). You can choose to pin everywhere, or only in specific places. The cluster by itself does not allocate based on latency or geographical position, only based on available free space.
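For example (a sketch; `<cid>` is a placeholder, and the exact defaults depend on the cluster’s configured replication factor), both behaviors can be requested with `ipfs-cluster-ctl`:

```shell
# Pin a CID using the cluster's default replication factor
# (commonly -1, meaning "pin on every peer").
ipfs-cluster-ctl pin add <cid>

# Pin on a subset of peers instead: at least 1 and at most 2
# allocations, chosen by the cluster based on free space.
ipfs-cluster-ctl pin add --replication-min 1 --replication-max 2 <cid>
```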
Finally, cluster peers can be added or removed freely. There is no notification when an IPFS daemon has finished syncing something from the cluster, but you can track progress with something like `ipfs-cluster-ctl status --local --filter=queued,pinning`.
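Putting that together, a readiness check for a freshly added peer might look like the following (a sketch; run on the new peer itself):

```shell
# Confirm the new peer shows up in the cluster's peer list.
ipfs-cluster-ctl peers ls

# List pins on this peer that are still queued or being pinned;
# when this output is empty, everything allocated to this peer
# has reached the PINNED state.
ipfs-cluster-ctl status --local --filter=queued,pinning
```

An automation pipeline could poll the second command until it returns nothing, as a stand-in for the “ready to use” notification asked about above.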
Thank you Hector! I will take a look at the suggested links. I also started running several test cases to understand how data is replicated, and I ran into an issue. I created 2 IPFS Clusters with different SECRETs, each with 2 gateway nodes. I added a file to Cluster 1 and verified through the pins endpoint that it got pinned; Cluster 2’s pins endpoint showed nothing, which is expected since the clusters do not share a SECRET. The issue is that I can still get the submitted file from all 4 nodes. I thought the new file would only be stored on the IPFS nodes belonging to Cluster 1, but the nodes belonging to Cluster 2 can serve the file even though Cluster 2 has nothing pinned. Can you help me understand this?
Thank you, I see I need to add a swarm key to the IPFS nodes to make them part of a private network. Is there a way to do this automatically with docker-compose, at setup time?
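One way to do this (a sketch assuming the official ipfs/kubo Docker image and a hypothetical service name): generate a `swarm.key` once on the host, then mount it into each node’s IPFS repo. Setting `LIBP2P_FORCE_PNET=1` makes the daemon refuse to start if the key is missing:

```shell
# Generate a swarm.key for a private IPFS network. The format is
# fixed: two header lines followed by 32 random bytes in hex.
printf "/key/swarm/psk/1.0.0/\n/base16/\n%s\n" "$(openssl rand -hex 32)" > swarm.key

# Write a docker-compose.yml fragment (hypothetical service name)
# that mounts the key into the container's IPFS repo at /data/ipfs.
cat > docker-compose.yml <<'EOF'
services:
  ipfs0:
    image: ipfs/kubo:latest
    environment:
      - LIBP2P_FORCE_PNET=1   # refuse to start without a swarm key
    volumes:
      - ./swarm.key:/data/ipfs/swarm.key
    ports:
      - "4001:4001"
EOF
```

Every node that should be in the same private network gets the same `swarm.key`; nodes without it (or with a different key) cannot exchange blocks with them, which should also explain the earlier test result: without a swarm key all your nodes were on the public IPFS network, so any of them could fetch the file via Bitswap regardless of which cluster pinned it.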