S3 Datastore too many requests that causes increasing AWS costs

khoanguyen · March 2, 2022, 9:08am

Hello,

I currently use go-ds-s3 (ipfs/go-ds-s3, an S3 datastore implementation) as the datastore for my IPFS node. However, it sends so many requests (HeadObject requests) to the S3 bucket that our costs have increased significantly.

Looking at the go-ds-s3 code, these requests seem to come from the GetSize function (s3.go on the master branch of ipfs/go-ds-s3).

Any advice on this? Can we configure the frequency of this routine?

In the worst case, we may have to stop using S3 as the datastore, so I am also looking for a tutorial on how to migrate the data from S3 to a local machine.

Many thanks!

filebase · March 2, 2022, 12:56pm

Take a look at Filebase - S3-Compatible, Edge-Caching and at a fraction of the cost of the other pinning services out there.

5GB always free - one month 5TB trial with code “IPFS”

khoanguyen · March 3, 2022, 1:00am

Sorry for mentioning you, @hector, but do you have any ideas on this? Specifically, how to decrease the number of requests to S3, or how to migrate the data from S3 to local machines (flatfs datastore).

Here is my current datastore_spec:

{"mounts":[{"bucket":"bucket-name","mountpoint":"/blocks","region":"us-east-1","rootDirectory":"bucketdirectory"},{"mountpoint":"/","path":"datastore","type":"levelds"}],"type":"mount"}

I want to change it to:

{"mounts":[{"mountpoint":"/blocks","path":"blocks","shardFunc":"/repo/flatfs/shard/v1/next-to-last/2","type":"flatfs"},{"mountpoint":"/","path":"datastore","type":"levelds"}],"type":"mount"}

Note: we also use IPFS Cluster along with IPFS, so please advise if anything else needs to be done.

hector · March 3, 2022, 1:03pm

I think datastore.Has() is implemented via GetSize().

If I’m not mistaken, however, the responses to such requests are usually cached. Increasing the cache sizes might be one way to reduce them:

github.com: ipfs/go-ipfs/blob/6c6830c8228bc76c0c56aafa49624e286c6c01f2/core/node/groups.go#L178-L183

func Storage(bcfg *BuildCfg, cfg *config.Config) fx.Option {
	cacheOpts := blockstore.DefaultCacheOpts()
	cacheOpts.HasBloomFilterSize = cfg.Datastore.BloomFilterSize
	if !bcfg.Permanent {
		cacheOpts.HasBloomFilterSize = 0
	}

(Unfortunately, only the bloom filter size is configurable.)

Otherwise, if the requests are for essentially random keys, there is not much to do other than not using S3. If nodes are meant to provide content publicly, they need to check whether they have it whenever it is requested.

woss · June 13, 2023, 8:05am

I’ve tested IPFS with S3 as the datastore, loosely following “A (loosely written) Guide to Hosting an IPFS Node on AWS” (Developers, Fission Talk). I added 2 GB of data to the node, which produced a lot of requests. The node had a single bootstrap node connected to the internet.

The point of the screenshot is not the price but the number of requests that happened in just a few hours of usage. This is a solid basis for calculating the potential cost of a real-world deployment. Maybe @filebase can share a real-world screenshot of their usage.
