We’re running into an issue here at Pinata where our gateways are being hit pretty hard for content that we’re not actually storing on our host nodes.
Over the past couple of days this has gotten bad enough that over 1 TB of unique data is being cached per day (our gateway’s cache is capped at 1 TB, otherwise I’m sure it would have been more) and roughly 10 TB of bandwidth is being consumed per day.
The simplest solution I can think of is to prevent our gateway from serving content that our users aren’t actually storing on our platform. To do this we would need some type of custom “filtering functionality”. I’d love to hear others’ opinions on the best / most efficient way to handle something like this.
The three best ideas I have are:
1. Have something native to IPFS where gateways can simply ask an API endpoint (which could be running locally) “Can I serve this content?” before serving it. This would be pretty flexible, so gateway providers could implement whatever content-blocking logic fits their needs without IPFS having to get too opinionated.
2. Implement some kind of special NGINX filtering that does the same check as in option 1 before the request even gets to IPFS. I was talking with @adin and he recommended tagging @olizilla and @lidel as two that might have thoughts on how this could be solved as well.
3. Implement a NodeJS proxy that sits behind NGINX and performs the check described above. If the request passes, it gets forwarded on to the IPFS gateway and the response is streamed back to the user.
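All three ideas come down to the same decision: given a request path, is the root CID one that the platform actually stores? A minimal sketch of that check follows, written in Python purely for illustration. The names `extract_root_cid`, `may_serve`, and the in-memory `pinned_cids` set are hypothetical; a real deployment would more likely query a database or the pinning nodes, and would need to normalize CIDv0 and CIDv1 forms of the same content, which this sketch does not do.

```python
import re
from typing import Optional, Set

# Matches path-style gateway requests such as /ipfs/<cid>/optional/sub/path.
# Assumption: the query string has already been stripped from the path.
_IPFS_PATH = re.compile(r"^/ipfs/([A-Za-z0-9]+)(?:/.*)?$")


def extract_root_cid(path: str) -> Optional[str]:
    """Return the root CID of a /ipfs/ request path, or None if there is none."""
    match = _IPFS_PATH.match(path)
    return match.group(1) if match else None


def may_serve(path: str, pinned_cids: Set[str]) -> bool:
    """Answer "Can I serve this content?" for a single request path.

    Only /ipfs/ paths whose root CID is in the allow-list pass; anything
    else (including /ipns/ paths) is rejected in this sketch.
    """
    cid = extract_root_cid(path)
    return cid is not None and cid in pinned_cids
```

A proxy or a local API endpoint could call `may_serve` for each incoming request and return a 403 or 404 instead of forwarding the request to the gateway when it returns `False`.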
@mburns feel free to chime in with any magic DevOps knowledge you may have as well.
I really appreciate any thoughts as this is starting to get fairly expensive for us to handle.
Well, if you want your gateway to serve only content you host yourself, then you wouldn’t fit the definition of an “open” gateway and might want to remove Pinata from the “public” gateway checker’s list. I know of one application that uses this list to identify gateways from which IPFS content can be retrieved. https://ipfs.github.io/public-gateway-checker/
I’ve not tried to lock down a gateway node, but I would explore:
- limiting the IP addresses that your gateways will dial via Swarm.AddrFilters, though I think you’d have to filter out “the Internet” and still allow your internal network, so that may need a PR to allow for filters like “block everyone except this range”.
- having the gateway nodes use delegated content routing: tell them to ask the pinning nodes to search for content on their behalf, and then enable NoFetch on the pinning nodes. However, I suspect NoFetch only applies to direct gateway requests, and you would still want pinning nodes to be able to fetch blocks from other pinning nodes, so this may be an enticing blind alley.
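On the Swarm.AddrFilters idea: short of a new “block everyone except this range” mode, one possible workaround is to list the complement of the allowed range as ordinary block filters. The sketch below computes that complement for IPv4 with Python’s standard `ipaddress` module and prints entries in the `/ip4/<address>/ipcidr/<prefix>` form that, as far as I know, Swarm.AddrFilters accepts. The helper name `addr_filters_except` and the `10.0.0.0/8` internal range are assumptions for illustration, and whether a node filtered this way behaves as hoped has not been tested here.

```python
import ipaddress
from typing import List


def addr_filters_except(allowed_cidr: str) -> List[str]:
    """Return block filters covering all of IPv4 except allowed_cidr.

    IPv6 is not handled, and the result also blocks loopback unless the
    allowed range happens to include it.
    """
    allowed = ipaddress.ip_network(allowed_cidr)
    everything = ipaddress.ip_network("0.0.0.0/0")
    # address_exclude yields the networks left after removing `allowed`.
    blocked = sorted(everything.address_exclude(allowed))
    return [f"/ip4/{net.network_address}/ipcidr/{net.prefixlen}" for net in blocked]


if __name__ == "__main__":
    # Hypothetical internal network; substitute the real range.
    for entry in addr_filters_except("10.0.0.0/8"):
        print(entry)
```

For a single /8 allowed range this produces eight filter entries, which is small enough to paste into a config file by hand.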
@olizilla the first option seems promising. Do you know how that would behave if our gateway asked our host nodes to find content and the content was found on a different network node? I’m assuming we would need to disable some relay functionality on IPFS to make things fully seamless.