git ssb


keks / untitled

question: two-tier approach

Open · cryptix opened this issue on 2/22/2017, 12:48:00 PM


But we can't just say "I'll save your snapshot and you save mine", because snapshots will greatly vary in size

I don't get that argument. The quota-management problem doesn't go away just by spreading chunked snapshots everywhere.

For me, this has always been a two-tiered approach:

  1. have full-size/hot mirror locations that hold all the data
  2. have cold/redundancy locations that each store some portion of the chunked snapshots*

*: if the erasure-coding process is reproducible, the full size locations can re-create the data that is stored by the cold peers.
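The reproducibility point can be made concrete with a minimal sketch: split the data into k equal chunks and add one XOR parity chunk, which tolerates the loss of any single chunk. This is a toy stand-in for what a real system would do with Reed-Solomon codes; all names here are illustrative, not from this thread.

```python
def encode(data: bytes, k: int) -> list[bytes]:
    """Split data into k equal chunks plus one XOR parity chunk."""
    size = -(-len(data) // k)                 # ceiling division
    padded = data.ljust(k * size, b"\0")      # pad so all chunks match
    chunks = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = bytes(size)
    for c in chunks:
        parity = bytes(x ^ y for x, y in zip(parity, c))
    return chunks + [parity]

def recover(chunks):
    """Rebuild one missing chunk (marked None) by XOR-ing the rest."""
    missing = chunks.index(None)
    size = len(next(c for c in chunks if c is not None))
    rebuilt = bytes(size)
    for i, c in enumerate(chunks):
        if i != missing:
            rebuilt = bytes(x ^ y for x, y in zip(rebuilt, c))
    out = list(chunks)
    out[missing] = rebuilt
    return out
```

Because `encode` is deterministic, a full-size hot location can re-run it at any time and reproduce exactly the chunks the cold peers hold, as the footnote suggests.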

%bffaBRbF4nOsz+tw5W28Vb+TdZJKHPm7+Qh9oOSga+0=.sha256 keks · 2/22/2017, 5:10:02 PM

That sentence is more about the fact that we need a more flexible approach than just counting snapshots, and chunking fixes that.

You are right that it doesn't fix the problem of overusing your peers' storage. My first idea was basically "bartering bytes", in the sense that you can store some stuff on my disk and later I come to you and store some stuff myself.

That leaves us with the problem that some people have much more data they need to back up than others, making bartering kinda hard.

Also, on the one hand we want to keep the storage balance as even as possible; on the other hand, we want to distribute the erasure-coded chunks across our peers as evenly as possible.
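One hypothetical way to see how those two goals interact is a greedy placement sketch: put each chunk on a distinct peer (even spread) while always preferring the peer with the lowest byte usage (even balance). The peer names and usage table below are purely illustrative.

```python
def place_chunks(chunk_sizes, usage):
    """Assign chunk index -> peer; one chunk per peer, least-loaded first."""
    taken = set()
    placement = {}
    # place the largest chunks first so they land on the emptiest peers
    for i, size in sorted(enumerate(chunk_sizes), key=lambda p: -p[1]):
        peer = min((p for p in usage if p not in taken), key=lambda p: usage[p])
        placement[i] = peer
        usage[peer] += size
        taken.add(peer)
    return placement
```

With more chunks than peers, the one-chunk-per-peer constraint has to relax, which is exactly where spreading evenly and balancing bytes start to conflict.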

Does that make sense?

%yd1gnsuEScWmIj2kRU9vHxWBXAmZjWl1hOJ9YI3uJUI=.sha256 Dominic · 2/23/2017, 12:51:48 AM

sounds like similar ideas to

(which was actually one of the original inspirations for ssb - they were (talking about) building a distributed FS on top of peers that made IOUs on secure logs... and I was like, hey, why not just let the humans use the log interface?)

%853/7vj3hcDA9zoPODHNwYE7J/M1+WZzxV7Uo4p555E=.sha256 jiangplus · 2/23/2017, 3:48:30 AM

I have been looking into cryptosphere for a couple of months, since they also make the only Ruby NaCl binding library.

%q2zaD5qen7Mq9GAqAwS5CDeTA8LIKrTuWrAOIz/7yoU=.sha256 cryptix · 2/24/2017, 10:39:19 AM

Does that make sense?

okay, I get that point now. Chunks are easier to distribute size-wise than single monolithic snapshots. I will open an issue about quota management next.

Still doesn't answer my question about the hot/cold locations.

I don't think a system where you need to aggregate the different chunks first, before you can replay the snapshots into zfs, will be very appealing. I fear network churn similar to that of rebuilding a RAID set.

It's a tradeoff again, of course, but I'd rather have 2 full copies of my data than 5 chunked ones that can only rebuild 99% of my data (which might then be useless because of the crypto inside).

This might even out if the number of players is large enough, but for a small set it feels risky, or the redundancy factor needs to be large to account for unavailable machines.
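That worry can be made quantitative: with a k-of-n erasure code and an assumed per-peer availability p, the chance of gathering enough chunks for a rebuild is a binomial tail. A sketch; the independence assumption is strong for a small peer set, which is exactly the risk described above.

```python
from math import comb

def p_recover(n: int, k: int, p: float) -> float:
    """Probability that at least k of n chunks are reachable, assuming
    each peer is independently up with probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# e.g. 5 chunks, any 3 of which suffice, peers up 90% of the time:
# p_recover(5, 3, 0.9) ≈ 0.991
```

For comparison, two full copies at the same availability survive with probability 1 - 0.1**2 = 0.99, while a thin 4-of-5 scheme only reaches about 0.92 - so for a small set the chunked approach needs a generous redundancy factor n/k to win.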

%PxJeOe7FYbulV2lpY/2qoKoWZveReNFQ/PPGj3AFs3E=.sha256 keks · 2/25/2017, 11:23:25 AM

Still doesn't answer my question about the hot/cold locations.

Well, the idea has always been that the server a user connects to via iSCSI (the hot location, as you call it, I guess) always has the whole copy, while that server's peers collectively store a chunked and erasure-coded version of that data. In a sense, your server's peers collectively constitute the cold location.

It's important that users have the ability to reconstruct all their data without the help of their server's operator (who might have been hit by a bus). So maybe the required information needs to be sent (e.g. by encrypted email or private ssb message) to the users after a snapshot is sent to the cold locations.
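One hypothetical shape for that "required information" is a small manifest carrying the erasure parameters, chunk digests, peer hints, and the data key re-encrypted to the user - everything needed to rebuild without the operator. Every field name below is an invention for illustration, not something specified in this thread.

```python
import json

# Hypothetical recovery manifest; all field names are assumptions.
manifest = {
    "snapshot_id": "2017-02-25",
    "erasure": {"scheme": "reed-solomon", "k": 3, "n": 5},
    "chunks": [
        {"sha256": "<chunk digest>", "peers": ["peer-a", "peer-b"]},
    ],
    "data_key": "<encrypted to the user's own key>",
}
blob = json.dumps(manifest)  # small enough for an encrypted email or private ssb message
```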

Built with git-ssb-web