%MBZC4ix+ju7Yjtrmh0AQ2NH6xlyZuHs1Z327cs43Xr0=.sha256
master
from arj / scuttlebot / fix-gossip
Don't consider a connecting peer as inactive. On my quite slow machine I was getting lots of connects/disconnects of the same peers over and over again. It turns out that gossip checks every 2 seconds, and if you have a slow machine or a slow connection, peers that are still connecting can be seen as inactive and fall into the retry queue, which quickly meets its quota.
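To illustrate the idea behind the fix, here is a minimal sketch, not the actual scuttlebot gossip scheduler; the peer shape and field names (`state`, `lastActivity`) are assumptions for illustration only. The point is simply that a peer whose connection is still being established is excluded from the inactivity test, so a slow handshake no longer pushes it into the retry queue.

```js
// Minimal sketch, not the real scheduler: field names are invented.
const CHECK_INTERVAL = 2 * 1000 // the 2-second check mentioned above

function isInactive (peer, now) {
  // a peer that is still connecting is never treated as inactive,
  // however long the handshake takes on a slow machine or link
  if (peer.state === 'connecting') return false
  return now - peer.lastActivity > CHECK_INTERVAL
}

function retryCandidates (peers, now) {
  // only genuinely inactive peers end up in the retry queue,
  // so its quota is no longer exhausted by slow connects
  return peers.filter(p => isInactive(p, now))
}

// example: the connecting peer is left alone, and so is the fresh connection
const now = Date.now()
console.log(retryCandidates([
  { id: 'slow-pub', state: 'connecting', lastActivity: now - 10e3 },
  { id: 'fast-pub', state: 'connected', lastActivity: now - 500 }
], now)) // => []
```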
I gave this patch a spin, but I'm seeing peers as 'disconnecting' for long periods of time now instead of connected. I need to take a closer look at the replication code to figure out why this is...
In fact, I think the entire replication schedule deserves a closer look.
A bit weird, why are they labelled as legacy?
But yeah the fix should be rather innocent as long as the state is correctly handled.
@ev is this plain sbot or do you have any local patches? Because with this patch, patchbay and sbot have been much more stable for me. I added a ton of debug output to the gossiping code to find out where the problem is, and there is definitely a problem in that part of the code. But it might be harder for other clients to trigger. Then again, I see some weird "end of parent stream" and handshake problems that might be related to this if the connection is closed during connect.
@arj I just tried your pull-request on a fresh clone of sbot and I'm not having the same issue I mentioned above. It might have been that I merged it into flume or my local replication patch and that was the issue.
Okay, that makes more sense :)
@dominic what do you think, can this be merged in? I'm eager to see if it improves the random connection problems people are seeing.
Thanks for testing @ev!
@dominic what do you think about this patch? I guess you are the only one qualified to review it :)
sorry, yes, this is a problem. The trouble with it is the question of how we evaluate whether this is really working well. Given how pubs are advertised currently, we have a lot of old pubs which are now dysfunctional.
Ah, it seems I am not online enough to get this currently...
Do we have an idea of how many old pubs are around? I'm all for backwards compatibility, but maybe we can get most of them updated. The operators are hopefully checking their sbot once in a while.
I'm seeing quite a lot of errors from the network as it is now. It might just be my low-spec machine, but I have a feeling that with some patches and getting people to upgrade we'll have a better idea of what the problems are.
yeah, agreed - it feels like a system-patch type message could be good. Or at least a channel where we announce needed updates, and an easy way to see that channel.
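Purely to illustrate the suggestion above, and assuming nothing about how it would actually be done: such an announcement could be an ordinary message published to an agreed channel. The `system-update` type and its fields below are invented for this sketch and do not exist in sbot today.

```js
// Hypothetical sketch only: 'system-update' is not a real sbot message type,
// and the field names are invented for illustration.
const ssbClient = require('ssb-client')

ssbClient((err, sbot) => {
  if (err) throw err
  sbot.publish({
    type: 'system-update',   // invented type
    channel: 'sbot-updates', // the proposed announcement channel
    package: 'scuttlebot',
    reason: 'gossip fix: connecting peers are no longer treated as inactive'
  }, (err, msg) => {
    if (err) throw err
    console.log('announced as', msg.key)
    sbot.close()
  })
})
```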
How many single-point-of-failure situations exist in the scuttleverse?
okay, this is pretty weird - when I look at your PR in git-ssb web the commit looks like every plugin is deleted. I'm sure that isn't what this is meant to do.
> when I look at your PR in git-ssb web the commit looks like every plugin is deleted
@dominic that's probably a bug (race condition) related to %nF/4tIc0R2CQ6tKspWF1tNYbTqIC3BnsQX+aJu5udfc=.sha256
the diff is currently shown correctly here: https://git.scuttlebot.io/%25AHqgeMrdCYeKxB34YTG9S5Hd8b5pg4yGeLR09E0Rv4k%3D.sha256/commit/8ebaee4a7d51efc20cd4cb60c60a4b507531a6da