From @githubber314159 on Sun Apr 16 2017 18:45:27 GMT+0000 (UTC)
Hey everyone! I have an idea regarding the data blocks. Blocks are definitely a good idea, but I have concerns about the way the data is actually stored.
It would be cool if .ipfs/ contained an index which links the blocks to files (file path + offset), since files are still advantageous compared to blocks. For example, you can open them on a system where IPFS is not running.
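To make the idea more concrete, here is a rough sketch of what such an index could look like. It is only an illustration under my own assumptions: fixed-size chunks, a plain SHA-256 over the raw chunk bytes (these are not real IPFS block hashes), GNU `stat`, and a made-up function name `index_file`. The 256 KiB default is just an example chunk size.

```shell
#!/bin/bash
# Sketch only: print one index line per chunk of a file, tab-separated:
#   <sha256 of chunk> <path> <byte offset> <chunk length>
# Assumptions: fixed-size chunks, plain SHA-256 (NOT real IPFS block hashes),
# GNU stat. index_file is a hypothetical helper, not an IPFS command.

index_file() {
    local path=$1
    local chunk_size=${2:-262144}   # example chunk size: 256 KiB
    local size offset=0 length hash

    size=$(stat -c %s -- "$path") || return 1

    while [ "$offset" -lt "$size" ]; do
        # The last chunk may be shorter than chunk_size.
        length=$(( size - offset < chunk_size ? size - offset : chunk_size ))
        # offset is always a multiple of chunk_size, so dd can skip whole blocks.
        hash=$(dd if="$path" bs="$chunk_size" skip=$(( offset / chunk_size )) count=1 2>/dev/null \
               | sha256sum | cut -d' ' -f1)
        printf '%s\t%s\t%s\t%s\n' "$hash" "$path" "$offset" "$length"
        offset=$(( offset + chunk_size ))
    done
}
```

Calling `index_file /path/to/file` prints the lines for one file. A peer asked for a given hash could then look up the path and offset and read the bytes straight from the original file.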
When the original files are still available locally, they remain usable for web publishing on my own server, tracking changes with git, sharing over NFS, mirroring with rsync, deduplication in a modern file system, etc. After all, I can open them with normal applications using the quite powerful hierarchical index of filesystems. Dealing with hashes is complicated compared to dealing with files, folders and sub-folders. I don’t know Sia/dat very well atm, but it seems that I could use the same set of files for another network. Thinking in the long term, keeping files is far more sustainable, and it consumes less disk space because the data is not stored a second time as blocks.
Of course, the mentioned server use cases are possible with blocks too, but I would be happy if I’m not dependent on a certain program (IPFS in this case) and a certain storage format. I would like to manage the way I store the data on my own. If the files are kept in original state, the IPFS storage format could change dramatically and one could reinject the very same data again, resulting in an entirely newly organized index.
When the files are stored as blocks in .ipfs/, they are somewhat protected from being accidentally deleted, altered or renamed by the user. This may not be the case when the files are only referenced and still sit somewhere in the file system. I offer five possible solutions to deal with the problem:
- Rename the files in a way no other application does. For example, have you seen Filename_ID_.extension anywhere? ".ext" is ugly and not used. If my file_hash.ext is marked as such, I know it is referenced. The hash in the filename is only for fast scanning with a script, see later.
- Change the ownership to group ipfs. I add myself to this group too, and the group ownership shows me that the file is referenced by the index.
- Enable this approach as an opt-in option. The users who enable this know what they are doing. We can expect that they store the files in a directory intended for publication and therefore know that they have to pay attention.
- Track/check files like IA.BAK does (git-annex).
- (Read-)Lock the files being referenced by the index (flock -s file.ext).
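For the last option, a minimal sketch of holding a shared lock could look like this. It assumes `flock` from util-linux, and the function names are made up. Note that such locks are advisory: they only affect programs that also use flock, and they do not stop anyone from deleting or renaming the file, so this is more of a signal than real protection.

```shell
#!/bin/bash
# Sketch only: hold a shared (read) lock on a referenced file.
# Assumption: flock(1) from util-linux. The lock is advisory; it does not
# prevent rm or mv. hold_shared_lock/release_lock are hypothetical names.

hold_shared_lock() {
    local path=$1
    # Open the file read-only on descriptor 9 and take a shared lock on it.
    exec 9<"$path" || return 1
    flock -s 9
}

release_lock() {
    # Closing the descriptor drops the lock.
    exec 9<&-
}
```

While the lock is held, another cooperating process can still take a shared lock, but a non-blocking exclusive request such as `flock -n -x file.ext -c true` fails until the lock is released.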
Now what if human error still occurs? Well, one could argue that a missing file means that it cannot be hosted by the peer and the index should be updated accordingly. It is simple if the user then re-adds the renamed file: the index just points to the new path. One could imagine a periodic “scan.sh /path/to/rootDir false” which produces scan.txt. scan.txt then informs the index of renamed files.
Otherwise, we may keep track of the files by their contents, e.g. by using the output from the script attached.
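A rough sketch of how tracking by contents could work: compare the hash listing of an older scan with a newer one and report files whose hash is unchanged but whose path differs. This assumes both inputs contain only sha512sum-style lines (`<hash>`, two separator characters, `<path>`), that paths contain no newlines, and that no two files share the same contents. A copied file would also be reported. `find_renames` is a name I made up.

```shell
#!/bin/bash
# Sketch only: compare two sha512sum-style listings and print
# "old_path -> new_path" for entries with the same hash but a different path.
# Assumptions: one "<hash>  <path>" entry per line, no newlines in paths,
# each hash appears at most once per listing. find_renames is hypothetical.

find_renames() {
    local old_list=$1 new_list=$2
    awk '
        {
            hash = $1
            # The path starts after the hash and the two separator characters.
            path = substr($0, length(hash) + 3)
        }
        NR == FNR { old[hash] = path; next }    # first listing: remember old paths
        (hash in old) && old[hash] != path {    # second listing: same content, new path
            print old[hash] " -> " path
        }
    ' "$old_list" "$new_list"
}
```

The output could then be used to point the index entries at the new paths instead of dropping them.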
The advantage of this approach may vanish if the FUSE module is integrated better, e.g. showing the files that are in the local repo without having to call ipfs pin ls --type=recursive.
So what do you think? Let me know!
Edit: I’ve seen ipfs filestore, but I don’t know where I can look up more / who I have to contact.
Attachment:
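#!/bin/bash
# License: GPLv3
# Usage: scan.sh /absolute/path/to/rootDir [full=true]
# Any second argument (e.g. "false") switches to a listing-only scan.
# Requires GNU find, xargs and ls.

# Root directory to scan; defaults to the current directory.
if [ -z "$1" ]; then
    root="$(pwd)"
else
    root="$1"
fi

# Full scan (with checksums) unless a second argument is given.
if [ -z "$2" ]; then
    full=1
else
    full=0
fi

# List all regular files except .txt files (which include this script's own output).
find "$root" -type f ! -name '*.txt' -print > scan_list.txt
sort scan_list.txt > scan_list_sorted.txt

# Record size, owner and modification time (as epoch seconds) for every file.
xargs -r -d '\n' ls -alU --time-style +%s < scan_list_sorted.txt > scan_ls.txt

if [ "$full" -eq 1 ]; then
    echo "full scan"
    # Add a SHA-512 checksum per file so files can be tracked by content.
    xargs -r -d '\n' sha512sum < scan_list_sorted.txt > scan_sha512sum.txt
    cat scan_ls.txt scan_sha512sum.txt > scan.txt
    rm scan_list.txt scan_list_sorted.txt scan_ls.txt scan_sha512sum.txt
else
    echo "file listing"
    mv scan_ls.txt scan.txt
    rm scan_list.txt scan_list_sorted.txt
fi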
Copied from original issue: https://github.com/ipfs/faq/issues/253