In designing any system composed of multiple interconnected services, a key consideration is ensuring that the data sent between those services can be trusted.

While established industry best practices define how to secure connections between entities, scaling these designs to secure thousands of agents in a potentially hostile network environment, while keeping the operational burden to a minimum, adds significant complexity.

When designing the Jazz Platform, we set out the following key design criteria:

  • All connections between components must be encrypted and have perfect forward secrecy, so that anyone intercepting the traffic cannot read the potentially sensitive content.
  • Untrusted hosts must not be able to establish a connection with any part of the system.
  • Entities connecting to each other must be able to cryptographically verify the identity of the remote host, so that it cannot present a spoofed identity.
  • If a host is compromised, it must be possible to revoke its access to the system.
  • If a host is compromised, the attacker must not be able to use its credentials to impersonate another component or assume a different role.
  • The core services of the Jazz Infrastructure must be protected from denial of service by untrusted hosts.
  • The process of enrolling agents in a deployment should be as simple as possible, to reduce the maintenance burden on the administrator.
  • The scheme must be designed to scale to arbitrarily large deployments. Enrolling 10,000 agents should be just as easy as enrolling 10.
  • Once enrolled, an agent must require no maintenance and be able to automatically renew its credentials.

By using a hardened TLS stack for agent connections, we ensure that the transport is encrypted, trusted and safe from man-in-the-middle attacks. In this post I’m going to focus on the remaining criteria (establishing, renewing and revoking trust) and the unique way we address them in the Jazz Platform.
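To illustrate what the transport side can look like, here is a minimal sketch of a server-side TLS configuration using Python’s standard `ssl` module. It is not the Jazz Platform’s actual TLS stack; the function name and certificate paths are hypothetical. It restricts connections to TLS 1.3, whose full handshakes always use ephemeral (EC)DHE key exchange and therefore provide forward secrecy, and it requires every client to present a certificate signed by the deployment’s CA (mutual TLS), so untrusted hosts cannot complete a handshake.

```python
import ssl
from typing import Optional


def make_agent_server_context(
    ca_file: Optional[str] = None,
    cert_file: Optional[str] = None,
    key_file: Optional[str] = None,
) -> ssl.SSLContext:
    """Build a hypothetical server-side context that demands client certificates."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # TLS 1.3 only: full handshakes use ephemeral (EC)DHE key exchange,
    # which gives perfect forward secrecy.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    # Mutual TLS: a client without a certificate from our CA is rejected.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if ca_file is not None:
        # CA certificate that signs agent certificates (hypothetical path).
        ctx.load_verify_locations(cafile=ca_file)
    if cert_file is not None:
        # The server's own certificate chain and private key.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx
```

A server would then wrap its listening socket with `ctx.wrap_socket(sock, server_side=True)`; the handshake fails for any client that cannot present a valid certificate.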

Traditional approaches

Enrollment is the process of establishing a mutual trust relationship between the client and the server or, more specifically, of provisioning and installing a set of certificates.

In traditional Public Key Infrastructure (PKI), this is done by generating a Certificate Signing Request (CSR) on the host, requesting a certificate for its name. The CSR is then sent to a certificate authority, usually by email or by uploading it to a website. The authority performs some form of validation that the requester owns the domain and then issues a certificate: a highly manual process. Schemes do exist for issuing certificates automatically, such as the Automatic Certificate Management Environment (ACME) protocol popularized by Let’s Encrypt, but they are still fundamentally based on DNS names. That is not viable for most corporate networks, where individual users’ laptops do not have registered names, and it becomes more complex still as users connect and work remotely.

The difficulties of secure agent enrollment are not new. Many systems for enrolling hosts require an administrator to manually distribute certificate requests, certificates and sometimes even private keys. This is a slow, time-consuming process to do properly, which leads users to cut corners by storing important credentials on shared drives.

Other, more automated systems allow clients to request enrollment, which the administrator must then approve. While this reduces the administrative burden, such systems are hard to protect from abuse, and it is difficult to give the administrator enough context to decide whether an enrollment request is legitimate.

Jazz Agent enrollment

The Jazz Agent enrollment scheme is built around tokens that are granted by the server and allow agents to request a certificate. To enroll an agent initially, we generate a single-use token and include it in an enrollment bundle. When installing the agent, we provide it with this bundle, which contains the token and some additional configuration data that the agent uses to bootstrap its connection to the server.
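As a rough illustration of the bundle, the sketch below assumes it is a small JSON document. The field names (`token`, `server_url`, `ca_cert_pem`) are hypothetical stand-ins for "the token and some additional configuration data"; the real bundle format is not described here.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class EnrollmentBundle:
    """Hypothetical bundle layout: the token plus bootstrap configuration."""
    token: str        # single-use enrollment token
    server_url: str   # where the agent sends its enrollment request
    ca_cert_pem: str  # CA certificate the agent uses to verify the server


REQUIRED_FIELDS = {"token", "server_url", "ca_cert_pem"}


def load_bundle(raw: str) -> EnrollmentBundle:
    """Parse a bundle and fail early if any required field is missing."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("enrollment bundle must be a JSON object")
    missing = REQUIRED_FIELDS.difference(data)
    if missing:
        raise ValueError(f"enrollment bundle is missing fields: {sorted(missing)}")
    return EnrollmentBundle(
        token=data["token"],
        server_url=data["server_url"],
        ca_cert_pem=data["ca_cert_pem"],
    )
```

Validating the bundle up front lets the installer report a clear error instead of the agent failing later during its first connection.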

The agent generates a new CSR and sends it to the server along with its enrollment token. The server validates the token and checks that it has not been used before. If the token is valid, the server generates a unique identifier for the agent (the Agent UUID) and issues a certificate containing this ID. Alongside the certificate, it also issues a new enrollment token, which the agent can later use when it needs to renew its certificate.
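The exchange can be sketched as a minimal in-memory server. This sketch assumes that tokens are random strings tracked in a server-side table (rather than the cryptographically verifiable tokens described under Token security), that a renewal token is bound to the Agent UUID it was issued to so renewal keeps the same identity, and it stubs out certificate signing. `EnrollmentServer`, `sign_csr` and the other names are hypothetical.

```python
import secrets
import uuid
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class EnrollmentResult:
    agent_uuid: str
    certificate: str    # placeholder for a signed X.509 certificate
    renewal_token: str  # single-use token for the next renewal


@dataclass
class EnrollmentServer:
    # Unredeemed tokens. The value is the Agent UUID a renewal token is
    # bound to, or None for an initial enrollment token.
    outstanding: Dict[str, Optional[str]] = field(default_factory=dict)

    def issue_token(self, agent_uuid: Optional[str] = None) -> str:
        token = secrets.token_urlsafe(32)
        self.outstanding[token] = agent_uuid
        return token

    def sign_csr(self, csr: str, agent_uuid: str) -> str:
        # Hypothetical stand-in for the certificate authority: a real server
        # would parse the CSR and sign a certificate carrying agent_uuid.
        return f"CERT(subject={agent_uuid})"

    def enroll(self, csr: str, token: str) -> EnrollmentResult:
        if token not in self.outstanding:
            raise PermissionError("invalid or already-used enrollment token")
        # Popping the token makes it single use: a replay finds nothing.
        bound_uuid = self.outstanding.pop(token)
        # Initial enrollment mints a new Agent UUID; renewal keeps the old one.
        agent_uuid = bound_uuid or str(uuid.uuid4())
        return EnrollmentResult(
            agent_uuid=agent_uuid,
            certificate=self.sign_csr(csr, agent_uuid),
            renewal_token=self.issue_token(agent_uuid),
        )
```

Because every successful call returns a fresh renewal token, an enrolled agent can keep renewing on its own without any administrator involvement.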

The enrollment token system also provides a simple way to extend the enrollment process to support complex deployment scenarios. Arbitrary properties can be attached to tokens, such as a cluster identifier that can be used to apply policy to agents as they enroll.
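A minimal sketch of property-driven policy follows. It assumes a token carries a `cluster` property and that the server keeps a policy table keyed by cluster identifier; the property name, the policy contents and `policy_for` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Mapping


@dataclass(frozen=True)
class TokenGrant:
    token: str
    # Arbitrary properties attached when the token was created.
    properties: Mapping[str, str] = field(default_factory=dict)


# Hypothetical policy table keyed by cluster identifier.
POLICIES: Dict[str, dict] = {
    "web-frontend": {"collect": ["process", "network"]},
    "database": {"collect": ["process", "file"]},
}
DEFAULT_POLICY = {"collect": ["process"]}


def policy_for(grant: TokenGrant, policies: Mapping[str, dict] = POLICIES) -> dict:
    """Choose the policy for a newly enrolled agent from its token's properties."""
    cluster = grant.properties.get("cluster")
    # Tokens without a known cluster fall back to the default policy.
    return policies.get(cluster, DEFAULT_POLICY)
```

Since the policy is resolved at enrollment time, an administrator only decides which token to hand out; every agent enrolled with that token picks up the same policy.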

Token security

Enrollment tokens are cryptographically verifiable by the server, so they cannot be forged by a malicious party without the private key held securely within the server.
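To illustrate the idea of a server-verifiable token, the sketch below uses an HMAC-SHA256 tag computed with a secret key that only the server holds. This is a swapped-in technique: the post describes a private key, which suggests an asymmetric signature, but since only the server ever verifies its own tokens, a MAC demonstrates the same property (no valid token without the key) using just the standard library. The token layout and function names are hypothetical.

```python
import base64
import hashlib
import hmac
import json


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).decode("ascii")


def sign_token(key: bytes, claims: dict) -> str:
    """Serialize the claims and append an HMAC-SHA256 tag over them."""
    payload = json.dumps(claims, sort_keys=True, separators=(",", ":")).encode()
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return f"{_b64(payload)}.{_b64(tag)}"


def verify_token(key: bytes, token: str) -> dict:
    """Return the claims if the tag is valid; raise ValueError otherwise."""
    try:
        payload_b64, tag_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        tag = base64.urlsafe_b64decode(tag_b64)
    except ValueError as exc:  # wrong shape or invalid base64
        raise ValueError("malformed token") from exc
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("token signature is invalid")
    return json.loads(payload)
```

The claims could carry a unique token ID (for single-use and revocation tracking) and properties such as a cluster identifier; any change to them invalidates the tag.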

The enrollment tokens themselves effectively grant access to the Jazz Infrastructure: with a token, an attacker could request a certificate and send data to the server, so it is important that tokens are stored securely. If a token is accidentally disclosed, several protection mechanisms help limit the impact:

  • Tokens can be revoked centrally, so if a token is known to have been lost, it can be blocked immediately. There is no need to reprovision any agents or infrastructure components.
  • Similarly, individual agent certificates can be revoked: if a lost bundle has already been used to provision an agent, that agent’s access to the system can be revoked as well.
  • Importantly, because a token on its own does not identify an agent, it is not possible to impersonate another agent by gaining access to an enrollment bundle.
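Both kinds of revocation can be sketched as two deny lists checked at different points. Standard X.509 certificate revocation lists identify revoked certificates by serial number (RFC 5280); this sketch instead assumes revocation is keyed by token ID and by Agent UUID, which blocks every certificate issued to that agent. `RevocationList` and its methods are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class RevocationList:
    revoked_tokens: Set[str] = field(default_factory=set)  # token IDs
    revoked_agents: Set[str] = field(default_factory=set)  # Agent UUIDs

    def revoke_token(self, token_id: str) -> None:
        self.revoked_tokens.add(token_id)

    def revoke_agent(self, agent_uuid: str) -> None:
        self.revoked_agents.add(agent_uuid)

    def may_enroll(self, token_id: str) -> bool:
        # Checked centrally when a token is redeemed; nothing is reprovisioned.
        return token_id not in self.revoked_tokens

    def may_connect(self, agent_uuid: str) -> bool:
        # Checked by components when an agent presents its certificate.
        return agent_uuid not in self.revoked_agents
```

Revoking a lost token stops future enrollments from it, while revoking an Agent UUID cuts off an agent that was already provisioned from that bundle; other agents are unaffected.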

Because the Jazz Platform is distributed, each component must be able to authenticate connections from agents in order to authorize them to perform certain actions (such as sending event data to the server). Conversely, creating and managing certificates is better handled centrally, so that there is a single isolated, secure and audited authority for the whole system. Unlike certificates, tokens cannot be used for authentication; they only grant the right to request a certificate. By decoupling these two responsibilities and using tokens to gate certificate issuance, we have built a secure and scalable system for enrolling agents.
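On the component side, authentication relies only on the certificate. The sketch below reads the Agent UUID from a peer certificate in the dictionary shape returned by Python’s `ssl.SSLSocket.getpeercert()` and checks it against a set of revoked agents. Storing the UUID in the subject’s `commonName` is an assumption, and both function names are hypothetical.

```python
import uuid
from typing import Any, Collection, Mapping


def agent_uuid_from_peercert(cert: Mapping[str, Any]) -> str:
    """Extract the Agent UUID, assumed to be the subject's commonName.

    `cert` has the shape returned by ssl.SSLSocket.getpeercert().
    """
    for rdn in cert.get("subject", ()):
        for name, value in rdn:
            if name == "commonName":
                try:
                    # Normalize, and reject anything that is not a UUID.
                    return str(uuid.UUID(value))
                except ValueError as exc:
                    raise PermissionError("commonName is not an Agent UUID") from exc
    raise PermissionError("peer certificate has no commonName")


def authorize_event_upload(cert: Mapping[str, Any], revoked: Collection[str]) -> str:
    """Authenticate by certificate, then authorize; tokens are never accepted here."""
    agent_uuid = agent_uuid_from_peercert(cert)
    if agent_uuid in revoked:
        raise PermissionError(f"agent {agent_uuid} has been revoked")
    return agent_uuid
```

Components never need access to the token store or the signing key: the TLS handshake has already verified the certificate chain, so they only map the certificate to an identity and apply authorization.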

This dive into the design and internals of our agent enrollment process shows the thinking that goes into ensuring the security and integrity of the Jazz Platform and the data it collects, while minimizing the administrative burden as a deployment grows from 10 agents to 10,000.

Sources:

  • IETF, RFC 5280, “Internet X.509 Public Key Infrastructure Certificate and Certificate Revocation List (CRL) Profile”, May 2008
  • Let’s Encrypt, “How It Works”