git ssb

9+

mixmix / ssb-handbook



Tree: 77357780d01916a46926073e9510695a12a5c571

Files: 77357780d01916a46926073e9510695a12a5c571 / stories / using-trust-in-open-networks.md

14634 bytesRaw

Using Trust in Open Networks

Open networks are systems that allow participation without deferring to central authority. The Web, Email, BitTorrent, IPFS, and SecureScuttlebutt are all open networks.

Open networks try to maximise their potential value by letting agents contribute and extract data independently, and without intermediation. This is the Laissez-faire approach to architecture, and it frequently benefits from the efficiency of horizontal scaling.

In SecureScuttlebutt, we have decided never to adopt centralized authorities in the protocol. This decision was motivated by

  1. distrust of information monocultures
  2. desire to maximize autonomy of individual users
  3. general curiosity

This constraint pushes us to find novel solutions for shared datastructures. This article will summarize literature on trust-based mechanisms, and offer some perspective to how they can be applied to SecureScuttlebutt.


Trust-ranking is a key feature of open networks with shared data-structures.
As each agent receives new information, it must decide how to act on it, and decide whether to discard the input. Spam, resource-leaching, and DoS attacks would otherwise overwhelm the agents.

Email is an open network: anybody can create a server and account. And, the inbox a shared structure: users share append-rights over each others' inboxes. But, without proper spam-filtering, email is nearly useless. Filtering must be both effective (very little spam) and accurate (very few false positives) or email loses its utility.

A survey of trust in computer science and the Semantic Web names two primary means of managing trust.

Policy-based trust. Using policies to establish trust, focused on managing and exchanging credentials and enforcing access policies. Work in policy-based trust generally assumes that trust is established simply by obtaining a sufficient amount of credentials.

Reputation-based trust. Using reputation to establish trust, where past interactions or performance for an entity are combined to assess its future behavior. Research in reputation-based trust uses the history of an entity's actions/behavior to compute trust, and may use referral-based trust (information from others) in the absence of (or in addition to) first-hand knowledge.

SecureScuttlebutt uses social "following" relations as a base trust-policy. Users choose which feeds to follow, and therefore opt into every messaging partner. This is how SSB controls spam.

Other trust-policies could be leveraged by SSB applications, to give users authority over other shared data-structures. For instance, users could assign friends the right to follow more users on their behalf, or give moderator powers to enable post-hiding.

However, policies scale poorly, as they require the user to make a decision about every other agent in the network.
This makes it hard for a user to evaluate information produced by "strangers", even if the stranger is only one or two social hops away. To solve this, reputation-based trust can be introduced to build upon the user's decisions, by analyzing the decisions of other agents, and assigning authority automatically.


To understand reputation-based trust, let's look at a common application of them, search engines. PageRank is still one of the top search algorithms in use, so it's important to understand where it fits in the current landscape of research.

Propagation Models for Trust and Distrust in Social Networks does a good job explaining PageRank's qualities, in section 2. From there, and from other papers:

These lessons can easily apply when writing graph-analysis algorithms for SecureScuttlebutt. Inferring the wrong properties about the input data, such as edge-transitivity, can lead the algorithm to draw wrong conclusions. As the purpose of reputation is to automate trust decisions, it's important not to make those mistakes.

The data model is also very important to reputation-based trust, and is frequently explored in the literature I found. There is no consensus on which model is best; they vary by application and intent.

The EigenTrust Algorithm for Reputation Management in P2P Networks applies probabilistic analysis on a dataset of rated transactions. After every file-download, users publish whether the downloaded file was as-advertised. The ratings are accumulated and normalized to quantify the trustworthiness of peers. The authors assume that trusted peers are likely to make good trust decisions as well, and so they use the trust-values transitively to fill holes in the dataset. At a glance, the XRep algorithm seems to work in roughly the same way.

Eigentrust is not a social system; it's a p2p mesh of file-hosts, which abstracts the hosts away from the user's awareness. Their data model, therefore, doesn't use direct trust-ratings between agents, whereas SSB can. In contrast, the TrustMail algorithm sets ratings directly on agents.

Trust Management for the Semantic Web splits the model into "belief ratings for statements" and "trust ratings for agents"

[We] propose a solution to the problem of establishing the degree of belief in a statement that is explicitly asserted by one or more sources on the Semantic Web. ... Our basic model is that a user's belief in a statement should be a function of her trust in the sources providing it.

They offer both path-algebra and probabilistic interpretations for the graph, and a number of options for the formulas.

Modeling and Evaluating Trust Network Inference describes an even more sophisticated model. First, they isolate trust into domains, following the intuition that an agent may be an expert in one field, but not in another. Then, in section 2.4.2, they define 2 categories of trust, and subdivide them into 5 types. The categories are referral trust ("reflects an agent's estimation about the quality of the other agents' knowledge") and associative trust ("reflects the similarity between two agents"). The subdivisions are as follows:

This is not the first group to separate DET and RET. The Hilltop and HITS algorithms make that distinction as well.

The values used to rate agents or data varies between the algorithms. Roughly, they tend to be trinary (trusted/distrusted/no-opinion) or real (0..1, or 0..10, etc). I've found nothing definitive about which is best.

When choosing the model, we should remember that reputation systems exist to rank information in the absense of local input. If the user were able to rate all inputs a-priori, a policy system would suffice. Since the user can not, a reputation system merges the inputs of many users to fill holes in the data structure.

If we want to entrust function-critical decisions to shared data, we need to be robust to malicious behaviors. The Appleseed Paper provides useful analysis of this:

The "bottleneck property" [is] common feature of attack-resistant trust metrics. Informally, this property states that the "trust quantity accorded to an edge s -> t is not significantly affected by changes to the successors of t."

Put another way, the bottleneck property holds if a voting ring is not able to elevate its reputation the network by recommending each other. This property requires subjectivity: trust must flow outward from a seed set. Fortunately, SecureScuttlebutt is subjective by nature.

The other concern we must address is compromise-risk: we should minimize the danger of a trusted node acting maliciously. Two techniques for handling this:


A few notes on how computation occurs. In addition to the path-algebra vs probabilistic analysis, there's a split among the papers between making distributed vs centralized/localized computations.

Distributed computation uses realtime queries to peers for trust information. Centralized/localized computation aggregates information at a node, and then computes across that aggregated information.

Both Google and SecureScuttlebutt qualify as centralized/localized. The difference is that Google tries to obtain global knowledge of the network, via crawling, while SSB obtains a localized neighborhood of the network.

The community is encouraged to supplement SecureScuttlebutt's localized computation with distributed protocols, but it's not currently expected that SSB core will include any distributed computation systems itsemf, due to the scope creep that would create.


It may be possible to generalize a universal system for ranking the trust and quality of data in SecureScuttlebutt. If so, then a single "ranking" application would be able to publish and review data structures. However, I'm unsure that's possible. In the near-term, I suggest we study applications and data-structures separately, and watch for unifying principles.

There are a few applications which I think we should focus our attention on, because they'll yield the most insight, and value, to the network.

-Paul F

Built with git-ssb-web