Internet-Draft | OpenPGP Web of Trust | December 2021 |
Walfield | Expires 4 June 2022 | [Page] |
The web of trust is a flexible, decentralized trust model created for PGP. PGP and GnuPG include implementations of the web of trust, and OpenPGP defines a number of authentication mechanisms that form the basis of both implementations.¶
Unfortunately, PGP and GnuPG implement different semantics, neither documents their semantics, and OpenPGP does not specify how a web of trust implementation should work.¶
This draft defines the semantics of the web of trust as implemented by Sequoia. Sequoia models the web of trust as a flow network, and authentication as a maximum flow problem. Although its semantics differ from both PGP's and GnuPG's semantics, in practice, it is largely compatible with both implementations.¶
By publishing this draft we hope to save developers of other OpenPGP implementations the time needed to design and specify a web of trust algorithm, and we hope to increase interoperability.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 4 June 2022.¶
Copyright (c) 2021 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
The web of trust was designed for grass root activists who are not always willing to trust a central authority, and whose trust roots and certifications may be private. This is different from X.509, which largely assumes that there are a handful of globally trusted roots, and certifications are public.¶
Authentication in X.509 is relatively straightforward. A certificate normally includes a trust chain, which is anchored at a well-known trust root. Thus, authentication in X.509 means validating a trust chain.¶
In the web of trust, every user has their own set of roots, and certifications may not be public. So, authentication in the web of trust means building a certification network using the information that is available locally, and then finding a valid path from the user's trust roots to the binding. Since users need not unconditionally trust a certification authority, it may be necessary to find and combine multiple paths to have sufficient evidence to authenticate a binding.¶
This draft specifies a path finding algorithm for a web of trust using the mechanisms specified by [RFC4880]. Insofar as authentication mechanisms are specified by [RFC4880], they are used accordingly in this draft. [RFC4880], however, leaves many details unspecified including but not limited to: how to handle different Trust Signatures by the same issuer on multiple User IDs on the same certificate; the semantics of a Trust Signature on a third-party direct-key signature; and whether regular expressions need to match certifications of trusted introducers. This draft fills in the missing details.¶
The web of trust is a network in which the nodes are certificates, and the edges are certifications. Because a certificate may certify multiple User IDs on the same certificate, a network may include multi edges.¶
We view the network as a flow network in which an edge's capacity is the corresponding certification's trust amount. In this model, the trust amount parameter can be understood as an amount of evidence. We explicitly don't consider the trust amount to be a probability of correctness. First, humans are not good at reasoning about probability. Second, it is hard to reconcile this model with an adversary who does not make mistakes, but lies when it is to their advantage.¶
Using this model, authenticating a binding is a question of finding a flow from a set of trust roots to the binding with sufficient capacity. Unfortunately, OpenPGP certifications can impose constraints on the rest of the path. This means that most path finding algorithms cannot be used as-is. This draft describes how to use a variant of Dijkstra's shortest path algorithm to do path finding in this type of network.¶
OpenPGP provides four simple, yet powerful and flexible mechanisms to facilitate authentication. These are third-party certifications, a trust amount parameter, a trust depth parameter, and a regular expression parameter. This section describes the semantics that this specification assigns to these mechanisms.¶
This specification explicitly ignores the Signer's User ID subpacket, which is not meaningful for authentication.¶
A certification is a special type of OpenPGP Signature packet. It says that the issuer is convinced that the specified binding (User ID and certificate) is correct. When the issuer and the target certificate are the same, the certification is called a self signature or self certification. Otherwise, the certification is referred to as a third-party certification.¶
OpenPGP distinguishes four types of certifications (signature types 0x10 through 0x13). This specification treats all of these signature types identically, which reflects common practice.¶
It is possible to certify a certificate without also certifying a User ID by using a direct key signature. This specification refers to such certifications as delegations. If the trust depth parameter (described below) is non-zero, this means that the target certificate should be treated as a trusted introducer.¶
The trust amount
parameter is controlled by the Trust Signature
subpacket. It is the degree to which the issuer of a certification is
convinced that the binding is correct. This can vary from 0 to 255.
Values that are 120 or larger mean that the issuer is fully convinced.
Traditionally, an issuer uses 60 to indicate that they are partially
(aka marginally) convinced, however, any value between 1 and 119 can
be used. A value of 0 means that the target should not be considered
as certified. A certification whose trust amount is 0 should not be
ignored: it overrides earlier certifications.¶
If edges along a path have different trust amounts, then the path's trust amount is the minimum trust amount of any of the edges. Consider the following network:¶
alice | 1/60 v bob | 120 v carol¶
alice says that bob is a partially trusted (trust amount = 60
)
trusted introducer (trust depth = 1
). Even though bob has
certified carol's key with a trust amount of 120, alice only
assigns the path alice - bob - carol
a trust amount of 60.¶
This draft interprets trust amount as an amount of evidence. It
assumes that evidence is independent and can be combined linearly.
That is, if a trust root partially (trust amount < 120
) trusts two
certification authorities and they both certify a binding, the two
paths can be added together.¶
The trust depth
parameter is controlled by the Trust Signature
subpacket. It is used to indicate that a certification's target
should be considered a trusted introducer.¶
The trust depth parameter ranges from 0 to 255. A value of 0 means
that this certification is just a normal certification, and the target
is not a trusted introducer. A value of 1 means that the target is a
trusted introducer. A value of 2 means that the target is a trusted
introducer and can designate level 1 trusted introducers. In general,
a value of n
means that the target of a certification can designate
level n-1
trusted introducers. The value 255 is special and means
infinity (i.e., it does not impose a constraint).¶
If a certificate designates a level n
trusted introducer, but it is
only allowed to delegate level m
trusted introducers where m < n
,
then the trust depth parameter is limited to m
.¶
Consider the following network where the number is the certification's trust depth parameter:¶
alice | 2/120 v bob | 2/120 v carol | 2/120 v dave | 2/120 v ed¶
alice certifies bob with a trust depth of 2. This means that she considers bob to be a trusted introducer and that he can designate level 1 trusted introducers.¶
Likewise, bob certifies carol with a trust depth of 2. This means that he considers carol to be a trusted introducer and that she can designate level 1 trusted introducers.¶
From alice's perspective, however, bob's certification of carol extends too much authority to carol: she has only allowed bob to designate level 1 trusted introducers, but bob has designated carol as a level 2 trusted introducer. Instead of ignoring certifications that extend too much authority, the trust depth of any certification is capped by constraints imposed by any preceding certifications in the path. So, in this case, alice is willing to consider carol to be a level 1 trusted introducer.¶
carol certifies dave with a trust depth of 2. alice, however, only considers carol to be a level 1 trusted introducer. As with bob, carol's delegation is capped and, from alice's perspective, she is only allowed to certify other bindings. As such, alice considers dave's binding to be authenticated, but she does not consider him to be a trusted introducer.¶
Finally, dave certifies ed with a trust depth of 2. Clearly, there is
a path from alice to ed: alice - bob - carol - dave - ed
. However,
because alice does not consider dave to be a trusted introducer, this
path is not valid, and alice does not consider ed to be authenticated.¶
The regular expression parameter controls the scope of a delegation. A certification can include zero or more regular expressions. If it includes at least one regular expression, then at least one of them MUST match the User ID of the binding that is being authenticated for the path to be valid. A regular expression does not need to match intermediate trusted introducers.¶
Regular expressions are a mechanism for a user to make use of a CA in
a limited way. For instance, ed might be willing to rely on
ca@nsa.gov
to certify other nsa.gov
User IDs, but doesn't want to
rely on ca@nsa.gov
to make a statement about any other User IDs.¶
Consider the following example in which the edges are labeled with the trust depth, trust amount, and optionally a domain, which corresponds to a regular expression that matches email addresses with that domain:¶
ed@lavabit.com | 255/120/nsa.gov v ca@nsa.gov | 1/120 v ca@fbi.gov | 0/120 v paul@nsa.gov¶
ed considers ca@nsa.gov
to be a fully trusted (trust amount = 120
)
trusted introducer (trust depth 255
) for User IDs that are in
nsa.gov
. ca@nsa.gov
delegates to ca@fbi.gov
, which has
certified paul@nsa.gov
. Even though the regular expression doesn't
match the ca@fbi.gov
, it does match the target User ID
(paul@nsa.gov
) so ed can authenticate paul@nsa.gov
.¶
A User ID identifies an entity. Because an entity may have multiple aliases or roles, it is reasonable and possible for a certificate to have multiple valid User IDs.¶
A certification's trustworthiness depends not on an identity, but on the entity. If an entity acts in conflicting ways depending on their role, then this draft takes the position that either they should not be trusted, or they should have multiple certificates.¶
Authenticating a binding is a two-phase process. First, a network is built. Then, one or more paths starting at the trust roots and ending at the binding are located in the network.¶
A web of trust network is built with respect to a reference time as follows:¶
A node is created for each non-revoked live certificate.¶
A directed edge from the issuer to the target certificate is created for each non-revoked live certification and non-revoked live delegation.¶
Edges are labeled with their certification's or delegation's parameters. In particular, edges are labelled with the trust amount, the trust depth, and any regular expressions.¶
To authenticate a binding, it is necessary to find one or more valid paths from the roots to the binding in the network.¶
A path is valid if it starts from a trust root, ends at the target certificate, the last edge is a certification over the target User ID, all certificates, certifications, and the target User ID are in scope (that is, any trust depth parameters are respected, and for each edge that has regular expressions, at last one regular expression matches the target User ID), and the target User ID is not revoked.¶
Note: a self certification counts as an edge and thus is only in scope if the certificate is a trusted introducer.¶
A path SHOULD be minimal in the sense that it should not have any cycles.¶
A path's trust amount is the minimum trust amount of the trust amount of each edge in the path.¶
Multiple paths can be combined if they use the same edge in any multi-edges. The trust amount of multiple paths is the maximum flow of the network induced by the paths.¶
A binding is fully authenticated if the trust amount of the valid paths is at least 120. It is partially authenticated if the trust amount is between 1 and 119.¶
The following text is non-normative. It motivates and describes one possible implementation strategy, which satisfies the above constraints. An implementation is free to implement this draft as it sees fit.¶
A simple algorithm to find the shortest path in a network is to enumerate all valid paths from the roots to the binding, and then select the best path. This algorithm is in NP (there are an exponential number of paths) however, and is thus only tractable for small networks.¶
Path finding algorithms like Dijkstra's shortest path algorithm are
more efficient. Dijkstra's algorithm computes a shortest-path tree
(the shortest distance from one node to every other node in the
network) while visiting each node and each edge at most once. Its run
time is O((N + E) * log(N))
where N
is the number of nodes and E
is the number of edges. In practice, this is fast even for large,
highly connected graphs.¶
Unfortunately, Dijkstra's algorithm cannot be used as is. Dijkstra's algorithm assumes that edges do not impose constraints on the rest of the path. This is typically the case for a network of cities and roads. But, edges in a web of trust may have a finite trust depth, which may render some of the paths they are on invalid, and they may include regular expressions, which the target User ID has to match.¶
Let's say that we are applying Dijkstra's algorithm to a network that looks like this:¶
root | v ... | | v v s t 2/120 \ / 3/60 v v u | v ... | v target¶
Say we are considering the edge t - u
, and u
's current back
pointer is s - u
. At this point, we have to decide if we prefer the
edge s - u
, which has a trust depth of 2 and a trust amount of 120,
or the edge t - u
, which has a trust depth of 3 and a trust amount
of 60. We need to get this decision right now. As explained above,
Dijkstra's algorithm only visits each edge once, so we won't have a
chance to try the alternative later.¶
Unfortunately, neither s - u
nor t - u
is strictly better than the
other. s - u
has a larger trust amount, but t - u
has a higher
trust depth.¶
Let's assume that we choose s - u
, the edge with the higher trust
amount. As we continue to apply Dijkstra's algorithm, we might find
that the only paths to the target are too long for s - u
's trust
depth. But now it is too late; we can't go back and revise our
decision. More importantly, we can't even be sure that there is a
valid path.¶
With a few tricks, however, we can still use Dijkstra's algorithm.¶
First, we need to limit the search from finding a shortest-path tree to finding a shortest path from a root to the target binding. Then we can easily satisfy any regular expression constraints by simply ignoring edges that have regular expressions that don't match the target User ID.¶
Second, as shown above, a cost function that prefers edges with a higher trust amount does not always return a path when there is one. But, we can construct a cost function that always returns a path if there is one, and then use the Ford Fulkerson algorithm to find a maximum flow. (The Ford Fulkerson algorithm finds a path, computes a residual network by subtracting that path, and then loops until no paths remain.) In some situations, this may mean that we have more paths than strictly necessary. However, because we have to deal with multiple paths anyway as there is not always a single path that can authenticate a binding, this doesn't actually increase the complexity.¶
We can actually do better than this. Through the use of a priority queue, Dijkstra's algorithm ensures that when a node is visited, the optimal path to that node is known. Thus, we know the constraints that a path will impose on the following node, and we can use that information to select the best edge.¶
Consider again the above network. If the path leading to t
constrains t
to be a level 3 trusted introducer, then it doesn't
matter that t
certifies u
to be a level 3 trusted introducer: the
previous path limits t
's certification of u
to be at most a level
2 trusted introducer. Thus, we can safely prefer the edge s - u
,
since it has the same effective trust depth.¶
In fact, this isn't an optimization; we have to consider any path
constraints. Otherwise, we may not find a path when there is a valid
path. Imagine now that the path leading to t
constraints t
to be
a level 2 trusted introducer. In this case, the edge s - u
is
strictly better (it's effective trust depth is greater), and
preferring it may be necessary to find a valid path to the target.¶
We recommend running the algorithm backwards, i.e., from the source towards the roots. We refer to this as backwards propagation. This has the advantage that we often don't have to explore as much of the network. Concretely, if the network is divided into multiple components, then only the component with the target needs to be explored. This is more often the case when working backwards, because we don't have to consider any paths via a root. Consider the following network:¶
root / \ v v left right / \ / \ v v v v ... ... ... ...¶
When running the algorithm forwards we start at the root and we need to explore the whole network. But when running the algorithm backwards we only need to explore the left side or the right side (and often less) as the root does not not connect the two sides.¶
When using backwards propagation, we use the following cost function: given two path suffixes, we prefer the path suffix that is shorter. This guarantees that if there is a valid path, we will find it. If the two path suffixes are the same length, we prefer the one with the higher trust amount.¶
When using backwards propagation, we sometimes come up with a better solution than when using forward propagation. Consider the following network:¶
root 255/120 | v a 255/1 / \ 2/120 v v b c 255/120 \ / 1/120 v v d | 120 v target¶
When using forward propagation (i.e., starting at root and working
towards the target), we set d
's backpointer to b
, because that
path prefix is less constrained (via b
the trust depth is
unconstrained, but via c
, d
is only a level 1 trusted introducer).
This means that we would find the path root - a - b - d - target
,
which has a trust amount of 1.¶
Using backwards propagation (i.e., reversing the edges, starting at
the target, and working towards the root), when visiting a
, we would
see that both possible path suffixes are valid, and the paths are the
same length, so we'd choose the path via c
, because its trust amount
is higher. Thus, backwards propagation would find root - a - c - d -
target
, which has a trust amount of 120.¶
But, forward propagation would perform better on this network:¶
root 3/120 | v a 2/120 / \ v | b | 1/60 1/120 \ / v v c | 120 v target¶
Finally, when using backwards propagation, we recommend not stopping when we visit a root. This is because our cost function does not actually optimize for what we really want to optimize for: we are interested in the valid path with the highest trust amount, but the cost function optimizes for the shortest, valid path. By not stopping when we reach a root, we open up the possibility that we find a longer path with a higher trust amount.¶
Consider the following web of trust:¶
alice | 2/100 v bob 255/120 / \ v ` carol | 255/120 | | 0/30 v | dave , 0/120 \ / v v ed¶
Let's walk through authenticating ed with alice as the sole trusted root using the algorithm described above.¶
Dijkstra's algorithm maintains two data structures: a priority queue of nodes that have not yet been visited ordered by their cost (best first); and, a list of back pointers. (Since we are reversing the direction, the back pointers are actually forward pointers in the original network, and that's how we name the variable below.) Initially the priority queue consists of the target.¶
queue = [ (ed; 0; 120) ]; forward_pointers = [ ];¶
Each node in the queue includes the cost of the path suffix starting at that node. The cost is the path suffix's length and its trust amount. These values may be updated while the node is in the queue, but once the node is visited, they won't be updated further; at that point we've found the optimal path from that node to the target.¶
We start with ed, and consider each certification made on ed: dave -
ed
and bob - ed
.¶
Say we start with dave - ed
(the order doesn't matter). Since dave
doesn't yet have a forward pointer, we set his forward pointer to ed
and add dave to the queue. Then we consider bob - ed
. Since bob
also doesn't have a forward pointer, we also just set his forward
pointer to ed, and we add him to the queue.¶
queue = [ (dave; 1; 120), (bob; 1; 30) ]; forward_pointers = [ (bob -> ed), (dave -> ed) ];¶
Next, we pop the certificate with the best path suffix from the queue. Because bob's and dave's current paths are the same length (1), we compare the trust amount of each path suffix. dave's trust amount is 120 whereas bob's is only 30. So, we pop dave.¶
dave is only certified by carol. Looking at carol, we see that she doesn't yet have a forward pointer so we set her forward pointer to dave, and we add carol to the queue.¶
queue = [ (bob; 1; 30), (carol; 2; 120) ]; forward_pointers = [ bob -> ed; carol -> dave; dave -> ed ];¶
The queue now contains bob and carol. We prefer bob, because his current path is shorter (1 vs 2).¶
bob is certified by alice. Since alice's forward pointer is empty, we set it to point to bob. We don't add alice to the queue, because alice is a root, and we don't consider paths via alice. And, as described above, although we would have a valid path when we visit alice, there may be a path with a higher trust amount, but is longer.¶
queue = [ (carol; 2; 120) ]; forward_pointers = [ alice -> bob; bob -> ed; carol -> dave; dave -> ed ];¶
We now pop carol from the queue.¶
carol is certified by bob. We compare bob's current path to the one via carol.¶
bob - carol
+ carol's current path: length: 3, amount: 120¶
We prefer the existing forward pointer because the path is shorter
even though the amount of trust is smaller. If we had taken the
longer path, then any forward pointers pointing to bob might become
invalid. In fact, that is the case here: the edge alice - bob
has a
trust depth of 2, which means that alice - bob - carol - dave - ed
is not valid. Thus, because we never replace an existing forward
pointer with a forward pointer with a longer path, all forward
pointers remain--by construction--valid.¶
We don't add bob to the queue, because bob has already been visited.¶
queue = [ ]; forward_pointers = [ alice -> bob; bob -> ed; carol -> dave; dave -> ed ];¶
Since the queue is empty, we must have visited every node reachable
from ed. Now we just need to extract the path, which we do by looking
at the forward pointers: the best path is alice - bob - ed
.¶
A Rust implementation of this specification is part of Sequoia. See https://gitlab.com/sequoia-pgp/sequoia-wot for the source code.¶
In practice, this algorithm is able to solve the web of trust problem within milliseconds even for large networks that include large cliques.¶
This specification assumes that certifications by different certificates are independent. This only holds if an entity has at most a single certificate. But, there are legitimate reasons for this not to be the case. For instance, a user may create a new certificate using newer algorithms, and not revoke their old certificate so that they can continue to communicate with people who use software that can only handle the older certificate.¶
This can result in the following scenario:¶
alice 1/60 / \ 1/60 v v bob 1 bob 2 120 \ / 120 v v carol¶
Bob has two certificates and Alice certifies both of them as partially trusted introducers. Now any binding that bob signs with both certificates will be fully trusted by alice. This was not Alice's intent.¶
Similarly, certifications that use similar verification methods are not actually independent. Consider two Let's Encrypt-like CAs for OpenPGP certificates. If both verify bindings using an email challenge, then the security of both largely relies on the same mechanism. Conceptually, it still makes sense to combine them if the CAs are in different trust domains, but the trust amounts should probably not simply be added together.¶
One way to improve this situation would be to introduce a set of notations that allow the signer to indicate in a machine-readable way how a binding was verified. Then software could place limits on different types of authentication mechanisms, and better control how they combine.¶
[ RFC Editor: please remove this section before publication ]¶
This document is currently edited as markdown. Minor editorial changes can be suggested via merge requests at https://gitlab.com/sequoia-pgp/sequoia-wot or by e-mail to the authors. Please direct all significant commentary to the public IETF OpenPGP mailing list: openpgp@ietf.org¶
This is a first draft that has not been published.¶
My thanks go--in particular, but not only--to Justus Winter, Daniel Kahn Gillmor, and Heiko Schaefer for many fruitful discussions about trust models, authentication, and OpenPGP.¶