Is there currently a way to act on parts of a text file as they are downloaded?

From @whereswaldon on Thu Jun 30 2016 14:24:07 GMT+0000 (UTC)

I’m interested in streaming large text files and performing analysis on the text chunks as they come in. How would I go about doing this via IPFS?


Copied from original issue: https://github.com/ipfs/faq/issues/141

From @Kubuxu on Tue Jul 05 2016 16:41:17 GMT+0000 (UTC)

You can also use the files API.

ipfs files cp /ipfs/QM...AAA /myfile.txt
ipfs files read --offset=N --count=M /myfile.txt

Then you will read M bytes at offset N.
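
To process a large file piece by piece with this approach, the read can be repeated at increasing offsets. Here is a minimal Python sketch. It assumes the `ipfs` binary is on the PATH and that a read shorter than requested (or an empty read) means the end of the file was reached. The helper names `ipfs_files_read` and `iter_chunks` are made up for illustration and are not part of IPFS.

```python
import subprocess
from typing import Callable, Iterator


def ipfs_files_read(path: str, offset: int, count: int) -> bytes:
    """Read up to `count` bytes at `offset` by shelling out to `ipfs files read`."""
    result = subprocess.run(
        ["ipfs", "files", "read", f"--offset={offset}", f"--count={count}", path],
        check=True,
        capture_output=True,
    )
    return result.stdout


def iter_chunks(read_at: Callable[[int, int], bytes], chunk_size: int) -> Iterator[bytes]:
    """Yield consecutive chunks by calling read_at(offset, count) repeatedly."""
    offset = 0
    while True:
        data = read_at(offset, chunk_size)
        if not data:
            break  # nothing left to read
        yield data
        if len(data) < chunk_size:
            break  # assumption: a short read means end of file
        offset += len(data)


# Possible usage (requires a running IPFS node; process() is a placeholder):
# for chunk in iter_chunks(lambda o, n: ipfs_files_read("/myfile.txt", o, n), 1 << 20):
#     process(chunk)
```

Passing the reader in as a callable keeps the loop independent of the CLI, so it can be tried against an in-memory byte string first.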

From @whereswaldon on Tue Jul 05 2016 16:48:04 GMT+0000 (UTC)

@Ghoughpteighbteau Thanks, especially for the tip on HLS. I didn’t realize that video streaming wasn’t inherently baked into IPFS.

@Kubuxu This only works after you have downloaded the file though, yes? Is there a way to do the same thing while the download is going?

From @Ghoughpteighbteau on Tue Jul 05 2016 17:06:21 GMT+0000 (UTC)

The problem, I think, is that IPFS downloads sort of like BitTorrent. It chunks files up and constructs a directed acyclic graph of them. The data could theoretically be requested in order by manually traversing the graph, but that’s on you: IPFS, like BitTorrent, is just going to grab any chunk whenever it’s available. I think you have to write your own code on top of IPFS’s plumbing if you want it in sequence.

Now that I’ve written all that, I guess it is possible. :shipit:
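
The manual traversal mentioned above could look roughly like the following Python sketch. It is only an illustration of a depth-first, left-to-right walk over a chunk graph. The `Node` type and the `fetch` callable are hypothetical stand-ins, not the real IPFS object format or API, and the sketch assumes that only leaf nodes carry file data.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List


@dataclass
class Node:
    """Hypothetical simplified node: leaves hold data, inner nodes hold ordered links."""
    data: bytes = b""
    links: List[str] = field(default_factory=list)


def walk_in_order(root: str, fetch: Callable[[str], Node]) -> Iterator[bytes]:
    """Yield leaf data in file order using an explicit depth-first stack."""
    stack = [root]
    while stack:
        node = fetch(stack.pop())
        if node.links:
            # Push children reversed so the leftmost child is popped first.
            stack.extend(reversed(node.links))
        else:
            yield node.data
```

In a real setup, `fetch` would be whatever call resolves a hash to its node, and each yielded chunk could be handed to the analysis step as soon as it is available.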

From @Kubuxu on Tue Jul 05 2016 17:16:14 GMT+0000 (UTC)

@whereswaldon It will not download the whole file, only the chunks you access with ipfs files read. ipfs files cp is a zero-cost operation; it doesn’t perform any download.

From @whereswaldon on Tue Jul 05 2016 18:11:44 GMT+0000 (UTC)

@Ghoughpteighbteau I actually don’t care about the order of the chunks for my particular use case. I just want to act on them as they arrive. I’m potentially working with gigabytes of text, and I’d like to start processing as soon as the chunks arrive.

@Kubuxu Oh, okay. I’m not sure that this fits my particular use case, since doing that would require many sequential requests, but it’s an excellent example of how to stream data where order matters. Since I don’t care about order very much, I think the other approach is somewhat more promising. Thanks for bringing this up though. I wonder now whether you could build a streaming service just on top of that functionality.

From @whyrusleeping on Tue Jul 05 2016 19:36:36 GMT+0000 (UTC)

We are going to have a rudimentary pub-sub mechanism soon that will allow ‘live’ streaming of data. Most of the code to do this is there, but we’re not solid on the API yet, so it hasn’t been merged.