I’m new to IPFS and am just trying to get a website up and running. While doing so, I noticed there doesn’t seem to be anything stopping me from using visitors’ nodes to cache more than just the page they are viewing. My understanding is that your local node caches any blocks you access; if that’s the case, it could be abused to force the caching of other material. When I browse sites on IPFS there’s a lot going on in the background: fetching images, JS, CSS, XHR requests, etc. It seems to me that websites have an incentive to download as much as they can in the background while the user is on a page, to maximise the availability of their content.
Potentially an even bigger issue would be the caching of illegal material without the user’s knowledge when they visit what seem to be perfectly legitimate websites.
Am I misunderstanding something, or is this a known issue? Are there any ways to mitigate such abuse?
One mitigation would be to garbage collect often. That’ll wipe out anything you haven’t intentionally pinned.
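For reference, Kubo exposes this as `ipfs repo gc`. The idea behind it is a mark-and-sweep over the block graph: everything reachable from a pin survives and everything else is deleted. Here is a minimal sketch that assumes a toy in-memory blockstore. The `ToyBlockstore` class and its methods are hypothetical, and real pinning also distinguishes direct from recursive pins, which this ignores:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyBlockstore:
    """Hypothetical in-memory blockstore: CID -> list of child CIDs."""
    blocks: dict[str, list[str]] = field(default_factory=dict)
    pins: set[str] = field(default_factory=set)  # treated as recursive pins

    def put(self, cid: str, children: list[str] | None = None) -> None:
        # Any block fetched while browsing lands here, pinned or not.
        self.blocks[cid] = list(children or [])

    def pin(self, cid: str) -> None:
        self.pins.add(cid)

    def gc(self) -> set[str]:
        """Mark everything reachable from a pin, then sweep the rest."""
        marked: set[str] = set()
        stack = list(self.pins)
        while stack:
            cid = stack.pop()
            if cid in marked or cid not in self.blocks:
                continue
            marked.add(cid)
            stack.extend(self.blocks[cid])
        removed = set(self.blocks) - marked
        for cid in removed:
            del self.blocks[cid]
        return removed
```

Suppose a page fetched a block in the background, such as `unrelated-blob`. `gc()` drops it unless something pinned links to it. GC only cleans up after the fact, though: between runs the node still holds whatever it fetched, and may serve it.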
FYI, websites can already do this (although you usually don’t end up serving this data).
But yes, this is a very real (and complex) problem without any clear solutions. Things we can do:
- Provide blocklists for known “bad bits” (using double hashing, i.e. listing a hash of each CID rather than the CID itself, so the list doesn’t become an index of the bad bits).
- Allow users to determine which websites/apps they want to make available. To make this work, we’d have to associate every block with each “origin” through which we downloaded the block. (HARD)
- Allow users to explicitly decide what data they want to make available. That is, we could allow websites to add a “pin this!” button (kind of like a “like this!”) that would, at the user’s request, tell the IPFS daemon to make the requested resource available to the network.
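The three ideas above could be combined into a single “should this node re-provide this block?” check. This is a minimal sketch under several assumptions. `ProvidePolicy`, `double_hash`, and every method here are hypothetical, and CIDs are placeholder strings like `QmA`. The denylist is assumed to hold SHA-256 hashes of CID strings; the exact hashing scheme of a real bad-bits list may differ:

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field


def double_hash(cid: str) -> str:
    # Assumption: the denylist stores sha256(CID string) instead of the CID,
    # so publishing the list does not publish the CIDs themselves.
    return hashlib.sha256(cid.encode("utf-8")).hexdigest()


@dataclass
class ProvidePolicy:
    """Hypothetical policy deciding which cached blocks a node re-serves."""
    denylist: set[str] = field(default_factory=set)            # double hashes
    allowed_origins: set[str] = field(default_factory=set)     # user opt-in
    user_pinned: set[str] = field(default_factory=set)         # "pin this!"
    origins: dict[str, set[str]] = field(default_factory=dict)  # CID -> origins

    def record_fetch(self, cid: str, origin: str) -> None:
        # Associate the block with every origin it was downloaded through.
        self.origins.setdefault(cid, set()).add(origin)

    def pin_this(self, cid: str) -> None:
        # Explicit, user-initiated request to make this data available.
        self.user_pinned.add(cid)

    def should_provide(self, cid: str) -> bool:
        if double_hash(cid) in self.denylist:  # never re-serve listed bad bits
            return False
        if cid in self.user_pinned:
            return True
        # Otherwise only provide blocks fetched via an origin the user allowed.
        return bool(self.origins.get(cid, set()) & self.allowed_origins)
```

The denylist is checked first, so a listed block is never re-served, even if the user explicitly pinned it. Everything else is not provided by default unless the user opted in, either through an allowed origin or a “pin this!” click.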
Websites can already download stuff without the user’s knowledge, but, like you say, the user doesn’t end up serving that content, so website authors have no reason to put unnecessary load on their own servers.
I was giving it some thought and came up with one potential solution that may at least remove the incentive to cache as much content as possible on the user’s node. It should be possible for the HTTP server to deny access to anything outside the root hash by checking the referrer in the request. Then, if the HTTP server waits for the entirety (or at least a good chunk) of the root hash to download before serving the entry point, websites are encouraged to minimise the content available to them on a single page.
Such a solution does limit some use cases for IPFS (e.g. streaming video), so it might need a way for users to grant exceptions.
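The referrer check and the “wait before serving the entry point” rule could look roughly like the sketch below. It assumes a path-style gateway where every URL looks like `/ipfs/<root-cid>/...`. `ipfs_root`, `allow_request`, `entry_point_ready`, and the `fraction` threshold are all hypothetical. So is the `exempt` set, which stands in for user-granted exceptions such as a video site:

```python
from __future__ import annotations

from urllib.parse import urlsplit


def ipfs_root(path: str) -> str | None:
    """Return the root CID of an /ipfs/<cid>/... path, or None."""
    parts = path.split("/")  # "/ipfs/<cid>/x" -> ["", "ipfs", "<cid>", "x"]
    if len(parts) >= 3 and parts[0] == "" and parts[1] == "ipfs" and parts[2]:
        return parts[2]
    return None


def allow_request(path: str, referer: str | None,
                  exempt: frozenset[str] = frozenset()) -> bool:
    """Deny requests that leave the referring page's root hash."""
    if referer is None:
        return True  # treated as a top-level navigation by the user
    source = ipfs_root(urlsplit(referer).path)
    if source is None or source in exempt:
        return True  # not an IPFS page, or the user granted an exception
    # Anything outside the referring root (including non-/ipfs paths) is denied.
    return ipfs_root(path) == source


def entry_point_ready(root: str, have: set[str], dag: set[str],
                      fraction: float = 1.0,
                      exempt: frozenset[str] = frozenset()) -> bool:
    """Only serve the entry point once enough of the root's DAG is local."""
    if root in exempt or not dag:
        return True  # e.g. a streaming site the user allowed to skip the wait
    return len(have & dag) / len(dag) >= fraction
```

One caveat: a page can stop the browser from sending a Referer header, for example with `Referrer-Policy: no-referrer`. This sketch lets referrer-less requests through so that top-level navigation still works. A real implementation would therefore need a stricter way to tell page loads apart from subresource fetches.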