
Deploying your application over TLS will require some additional work, both within your application (e.g. migrating resources to HTTPS to avoid mixed content), and on the configuration of the infrastructure responsible for delivering the application data over TLS. A well tuned deployment can make an enormous positive difference in the observed performance, user experience, and overall operational costs. Let’s dive in.

Establishing and maintaining an encrypted channel introduces additional computational costs for both peers. Specifically, first there is the asymmetric (public key) cryptography used during the TLS handshake (explained in TLS Handshake). Then, once a shared secret is established, it is used as a symmetric key to encrypt all TLS records.

As we noted earlier, public key cryptography is more computationally expensive when compared with symmetric key cryptography, and in the early days of the Web often required additional hardware to perform "SSL offloading." The good news is, this is no longer necessary and what once required dedicated hardware can now be done directly on the CPU. Large organizations such as Facebook, Twitter, and Google, which offer TLS to billions of users, perform all the necessary TLS negotiation and computation in software and on commodity hardware.

In January this year (2010), Gmail switched to using HTTPS for everything by default. Previously it had been introduced as an option, but now all of our users use HTTPS to secure their email between their browsers and Google, all the time. In order to do this we had to deploy no additional machines and no special hardware. On our production frontend machines, SSL/TLS accounts for less than 1% of the CPU load, less than 10 KB of memory per connection and less than 2% of network overhead. Many people believe that SSL/TLS takes a lot of CPU time and we hope the preceding numbers (public for the first time) will help to dispel that.

If you stop reading now you only need to remember one thing: SSL/TLS is not computationally expensive anymore.

Adam Langley (Google)

We have deployed TLS at a large scale using both hardware and software load balancers. We have found that modern software-based TLS implementations running on commodity CPUs are fast enough to handle heavy HTTPS traffic load without needing to resort to dedicated cryptographic hardware. We serve all of our HTTPS traffic using software running on commodity hardware.

Doug Beaver (Facebook)

Elliptic Curve Diffie-Hellman (ECDHE) is only a little more expensive than RSA for an equivalent security level… In practical deployment, we found that enabling and prioritizing ECDHE cipher suites actually caused negligible increase in CPU usage. HTTP keepalives and session resumption mean that most requests do not require a full handshake, so handshake operations do not dominate our CPU usage. We find 75% of Twitter’s client requests are sent over connections established using ECDHE. The remaining 25% consists mostly of older clients that don’t yet support the ECDHE cipher suites.

Jacob Hoffman-Andrews (Twitter)

To get the best results in your own deployments, make the best use of TLS Session Resumption—deploy, measure, and optimize its success rate. Eliminating the need to perform the costly public key cryptography operations on every handshake will significantly reduce both the computational and latency costs of TLS; there is no reason to spend CPU cycles on work that you don’t need to do.

Speaking of optimizing CPU cycles, make sure to keep your servers up to date with the latest version of the TLS libraries! In addition to the security improvements, you will also often see performance benefits. Security and performance go hand-in-hand.

An unoptimized TLS deployment can easily add many additional roundtrips and introduce significant latency for the user—e.g. multi-RTT handshakes, slow and ineffective certificate revocation checks, large TLS records that require multiple roundtrips, and so on. Don’t be that site; you can do much better.

A well-tuned TLS deployment should add at most one extra roundtrip for negotiating the TLS connection, regardless of whether it is new or resumed, and avoid all other latency pitfalls: configure session resumption, and enable forward secrecy so that TLS False Start can be used.

To get the best end-to-end performance, make sure to audit both your own and any third-party services and servers used by your application, including your CDN provider. For a quick report-card overview of popular servers and CDNs, check out istlsfastyet.com.

The best way to minimize both latency and computational overhead of setting up new TCP+TLS connections is to optimize connection reuse. Doing so amortizes the setup costs across requests and delivers a much faster experience to the user.

Verify that your server and proxy configurations are set up to allow keepalive connections, and audit your connection timeout settings. Many popular servers set aggressive connection timeouts (e.g. some Apache versions default to 5s timeouts) that force a lot of unnecessary renegotiations. For best results, use your logs and analytics to determine the optimal timeout values.
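As an illustration, the relevant knobs in an nginx-style configuration look like the following (the directive names are nginx’s; the values are placeholders you should tune against your own traffic logs, not recommendations):

```nginx
# Keep idle connections open long enough for follow-up requests to reuse
# the established TCP+TLS connection instead of paying for a new handshake.
keepalive_timeout  300s;    # illustrative value; derive yours from logs
keepalive_requests 1000;    # allow many requests per connection
```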

As we discussed in Primer on Latency and Bandwidth, we may not be able to make our packets travel faster, but we can make them travel a shorter distance. By placing our "edge" servers closer to the user (Figure 4-9), we can significantly reduce the roundtrip times and the total costs of the TCP and TLS handshakes.

Figure 4-9. Early termination of client connections

A simple way to accomplish this is to leverage the services of a content delivery network (CDN) that maintains pools of edge servers around the globe, or to deploy your own. By allowing the user to terminate their connection with a nearby server, instead of traversing across oceans and continental links to your origin, the client gets the benefit of "early termination" with shorter roundtrips. This technique is equally useful and important for static and dynamic content: static content can also be cached and served by the edge servers, whereas dynamic requests can be routed over established connections from the edge to origin.

The technique of using a CDN or a proxy server to fetch a resource, which may need to be customized per user or contain other private data, and hence cannot be cached globally at the edge, is commonly known as an "uncached origin fetch."

While CDNs work best when the data is cached in geo-distributed servers around the world, the uncached origin fetch still provides a very important optimization: the client connection is terminated with the nearby server, which can dramatically reduce the handshake latency costs. In turn, the CDN, or your own proxy server, can maintain a "warm connection pool" to relay the data to the origin servers, allowing you to return a fast response back to the client.

In fact, as an additional layer of optimization, some CDN providers will use nearby servers on both sides of the connection! The client connection is terminated at a nearby CDN node, which then relays the request to the CDN node close to the origin, and the request is then routed to the origin. The hop within the CDN network allows the traffic to be routed over the optimized CDN backbone, which can help to further reduce latency between client and origin servers.

Terminating the connection closer to the user is an optimization that will help decrease latency for your users in all cases, but once again, no bit is faster than a bit not sent—send fewer bits. Enabling TLS session caching and stateless resumption allows us to eliminate an entire roundtrip of latency and reduce computational overhead for repeat visitors.

Session identifiers, on which TLS session caching relies, were introduced in SSL 2.0 and have wide support among most clients and servers. However, if you are configuring TLS on your server, do not assume that session support will be on by default. In fact, it is more common to have it off on most servers by default—but you know better! Double-check and verify your server, proxy, and CDN configuration:

  • Servers with multiple processes or workers should use a shared session cache.

  • Size of the shared session cache should be tuned to your levels of traffic.

  • A session timeout period should be provided.

  • In a multi-server setup, routing the same client IP, or the same TLS session ID, to the same server is one way to provide good session cache utilization.

  • Where "sticky" load balancing is not an option, a shared cache should be used between different servers to provide good session cache utilization. In addition, a secure mechanism needs to be in place to share and rotate the secret keys used to decrypt the provided session tickets.

  • Check and monitor your TLS session cache statistics for best performance.

In practice, and for best results, you should configure both session caching and session ticket mechanisms. These mechanisms work together to provide best coverage both for new and older clients.
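For example, in nginx both mechanisms map to a handful of directives (a sketch; the cache size, timeout, and key path are illustrative placeholders to tune for your traffic levels):

```nginx
ssl_session_cache   shared:SSL:10m;  # shared across worker processes
ssl_session_timeout 24h;             # how long cached sessions stay valid
ssl_session_tickets on;              # stateless resumption for newer clients
# In a multi-server setup, distribute and rotate the ticket keys securely:
# ssl_session_ticket_key /etc/nginx/ticket.key;   # hypothetical path
```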

Session resumption provides two important benefits: it eliminates an extra handshake roundtrip for returning visitors and reduces the computational cost of the handshake by allowing reuse of previously negotiated session parameters. However, it does not help in cases where the visitor is communicating with the server for the first time, or if the previous session has expired.

To get the best of both worlds—a one roundtrip handshake for new and repeat visitors, and computational savings for repeat visitors—we can use TLS False Start, which is an optional protocol extension that allows the sender to send application data (Figure 4-10) when the handshake is only partially complete.

Figure 4-10. TLS handshake with False Start

False Start does not modify the TLS handshake protocol; rather, it only affects the timing of when the application data can be sent. Intuitively, once the client has sent the ClientKeyExchange record, it already knows the encryption key and can begin transmitting application data—the rest of the handshake is spent confirming that nobody has tampered with the handshake records, and can be done in parallel. As a result, False Start allows us to keep the TLS handshake at one roundtrip regardless of whether we are performing a full or abbreviated handshake.

Because False Start is only modifying the timing of the handshake protocol, it does not require any updates to the TLS protocol itself and can be implemented unilaterally—i.e., the client can simply begin transmitting encrypted application data sooner. Well, that’s the theory.

In practice, even though TLS False Start should be backwards compatible with all existing TLS clients and servers, enabling it by default for all TLS connections proved to be problematic due to some poorly implemented servers. As a result, all modern browsers are capable of using TLS False Start, but will only do so when certain conditions are met by the server:

  • Chrome and Firefox require an ALPN protocol advertisement to be present in the server handshake, and that the cipher suite chosen by the server enables forward secrecy.

  • Safari requires that the cipher suite chosen by the server enables forward secrecy.

  • Internet Explorer uses a combination of a blacklist of known sites that break when TLS False Start is enabled, and a timeout to repeat the handshake if the TLS False Start handshake failed.

To enable TLS False Start across all browsers the server should advertise a list of supported protocols via the ALPN extension—e.g., "h2, http/1.1"—and be configured to support and prefer cipher suites that enable forward secrecy.
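With nginx, for example, both prerequisites are satisfied by enabling HTTP/2 (which advertises "h2, http/1.1" via ALPN) and preferring forward secrecy cipher suites (a sketch; the cipher list is illustrative and should be kept current against up-to-date guidance):

```nginx
listen 443 ssl http2;          # ALPN advertises "h2, http/1.1"
ssl_protocols TLSv1.2 TLSv1.3;
ssl_prefer_server_ciphers on;
# ECDHE key exchange provides the forward secrecy False Start requires:
ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256;
```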

All application data delivered via TLS is transported within a record protocol (Figure 4-8). The maximum size of each record is 16 KB, and depending on the chosen cipher, each record will add anywhere from 20 to 40 bytes of overhead for the header, MAC, and optional padding. If the record then fits into a single TCP packet, then we also have to add the IP and TCP overhead: 20-byte header for IP, and 20-byte header for TCP with no options. As a result, there is potential for 60 to 100 bytes of overhead for each record. For a typical maximum transmission unit (MTU) size of 1,500 bytes on the wire, this packet structure translates to 4% to 7% of framing overhead per record.

The smaller the record, the higher the framing overhead. However, simply increasing the size of the record to its maximum size (16 KB) is not necessarily a good idea. If the record spans multiple TCP packets, then the TLS layer must wait for all the TCP packets to arrive before it can decrypt the data (Figure 4-11). If any of those TCP packets get lost, reordered, or throttled due to congestion control, then the individual fragments of the TLS record will have to be buffered before they can be decoded, resulting in additional latency. In practice, these delays can create significant bottlenecks for the browser, which prefers to consume data in a streaming fashion.

Figure 4-11. Wireshark capture of 11,211-byte TLS record split over 8 TCP segments

Small records incur overhead, large records incur latency, and there is no one value for the "optimal" record size. Instead, for web applications, which are consumed by the browser, the best strategy is to dynamically adjust the record size based on the state of the TCP connection:

  • When the connection is new and TCP congestion window is low, or when the connection has been idle for some time (see Slow-Start Restart), each TCP packet should carry exactly one TLS record, and the TLS record should occupy the full maximum segment size (MSS) allocated by TCP.

  • When the connection’s congestion window is large and we are transferring a large stream (e.g., streaming video), the size of the TLS record can be increased to span multiple TCP packets (up to 16 KB) to reduce framing and CPU overhead on the client and server.

If the TCP connection has been idle, and even if Slow-Start Restart is disabled on the server, the best strategy is to decrease the record size when sending a new burst of data: the conditions may have changed since last transmission, and our goal is to minimize the probability of buffering at the application layer due to lost packets, reordering, and retransmissions.

Using a dynamic strategy delivers the best performance for interactive traffic: small record size eliminates unnecessary buffering latency and improves the time-to-first-{HTML byte, …, video frame}, and a larger record size optimizes throughput by minimizing the overhead of TLS for long-lived streams.

To determine the optimal record size for each state let’s start with the initial case of a new or idle TCP connection where we want to avoid TLS records from spanning multiple TCP packets:

  • Allocate 20 bytes for IPv4 framing overhead and 40 bytes for IPv6.

  • Allocate 20 bytes for TCP framing overhead.

  • Allocate 40 bytes for TCP options overhead (timestamps, SACKs).

Assuming a common 1,500-byte starting MTU, this leaves 1,420 bytes for a TLS record delivered over IPv4, and 1,400 bytes for IPv6. To be future-proof, use the IPv6 size, which leaves us with 1,400 bytes for each TLS record, and adjust as needed if your MTU is lower.
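The same byte budget, expressed as code (a direct sketch of the allocation above):

```python
MTU = 1500           # common starting MTU
TCP_HEADER = 20      # TCP framing overhead
TCP_OPTIONS = 40     # timestamps, SACK blocks, etc.

def single_packet_record(ip_header: int) -> int:
    """Largest TLS record that still fits into one TCP packet."""
    return MTU - ip_header - TCP_HEADER - TCP_OPTIONS

ipv4_record = single_packet_record(20)  # 1,420 bytes over IPv4
ipv6_record = single_packet_record(40)  # 1,400 bytes over IPv6 (future-proof choice)
```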

Next, the decisions of when to increase the record size and when to reset it after the connection has been idle can be based on preconfigured thresholds: increase the record size up to 16 KB after X KB of data have been transferred, and reset it after Y milliseconds of idle time.

Typically, configuring the TLS record size is not something we can control at the application layer. Instead, often this is a setting and sometimes a compile-time constant for your TLS server. Check the documentation of your server for details on how to configure these values.

As of early 2014, Google’s servers use small TLS records that fit into a single TCP segment for the first 1 MB of data, increase record size to 16 KB after that to optimize throughput, and then reset record size back to a single segment after one second of inactivity—lather, rinse, repeat.
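This strategy can be sketched as a small helper (the 1 MB and one-second thresholds mirror the Google example above; the 1,400-byte single-packet record size and the clock handling are illustrative assumptions, not part of any real server’s API):

```python
import time

MAX_RECORD = 16 * 1024           # TLS record size ceiling
SEGMENT = 1400                   # single-packet record size (assumed budget)
BOOST_AFTER = 1 * 1024 * 1024    # bytes sent before switching to large records
IDLE_RESET = 1.0                 # seconds of inactivity before resetting

class RecordSizer:
    """Pick a TLS record size based on how much data has flowed recently."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.sent = 0
        self.last_write = None

    def next_record_size(self) -> int:
        now = self.clock()
        if self.last_write is not None and now - self.last_write > IDLE_RESET:
            self.sent = 0  # idle: assume the congestion window has collapsed
        self.last_write = now
        return SEGMENT if self.sent < BOOST_AFTER else MAX_RECORD

    def on_sent(self, nbytes: int) -> None:
        self.sent += nbytes
```

A real implementation lives inside the TLS server, as the text notes; this sketch only illustrates the decision logic.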

Similarly, if your servers are handling a large number of TLS connections, then minimizing memory usage per connection can be a vital optimization. By default, popular libraries such as OpenSSL will allocate up to 50 KB of memory per connection, but as with the record size, it may be worth checking the documentation or the source code for how to adjust this value. Google’s servers reduce their OpenSSL buffers down to about 5 KB.

Verifying the chain of trust requires that the browser traverse the chain, starting from the site certificate, and recursively verify the certificate of the parent until it reaches a trusted root. Hence, it is critical that the provided chain includes all the intermediate certificates. If any are omitted, the browser will be forced to pause the verification process and fetch the missing certificates, adding additional DNS lookups, TCP handshakes, and HTTP requests into the process.

How does the browser know from where to fetch the missing certificates? Each child certificate typically contains a URL for the parent. If the URL is omitted and the required certificate is not included, then the verification will fail.

Conversely, do not include unnecessary certificates, such as the trusted roots in your certificate chain—they add unnecessary bytes. Recall that the server certificate chain is sent as part of the TLS handshake, which is likely happening over a new TCP connection that is in the early stages of its slow-start algorithm. If the certificate chain size exceeds TCP’s initial congestion window, then we will inadvertently add additional roundtrips to the TLS handshake: certificate length will overflow the congestion window and cause the server to stop and wait for a client ACK before proceeding.

In practice, the size and depth of the certificate chain was a much bigger concern and problem on older TCP stacks that initialized their initial congestion window to 4 TCP segments—see Slow-Start. For newer deployments, the initial congestion window has been raised to 10 TCP segments and should be more than sufficient for most certificate chains.
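A quick calculation shows why the larger initial congestion window matters (a sketch, assuming a typical 1,460-byte MSS for a 1,500-byte MTU):

```python
MSS = 1460  # typical maximum segment size

# Bytes the server can send before waiting for a client ACK:
old_initcwnd_bytes = 4 * MSS    # older TCP stacks (4 segments)
new_initcwnd_bytes = 10 * MSS   # raised initial window (10 segments)

# A certificate chain larger than this budget forces an extra roundtrip
# into the TLS handshake.
print(old_initcwnd_bytes, new_initcwnd_bytes)  # 5840 14600
```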

That said, verify that your servers are using the latest TCP stack and settings, and optimize and reduce the size of your certificate chain. Sending fewer bytes is always a good and worthwhile optimization.

Every new TLS connection requires that the browser must verify the signatures of the sent certificate chain. However, there is one more critical step that we can’t forget: the browser also needs to verify that the certificates have not been revoked.

To verify the status of the certificate the browser can use one of several methods: Certificate Revocation List (CRL), Online Certificate Status Protocol (OCSP), or OCSP Stapling. Each method has its own limitations, but OCSP Stapling provides, by far, the best security and performance guarantees; refer to earlier sections for details. Make sure to configure your servers to include (staple) the OCSP response from the CA to the provided certificate chain. Doing so allows the browser to perform the revocation check without any extra network roundtrips and with improved security guarantees.

  • OCSP responses can vary from 400 to 4,000 bytes in size. Stapling this response to your certificate chain will increase its size—pay close attention to the total size of the certificate chain, such that it doesn’t overflow the initial congestion window for new TCP connections.

  • Current OCSP Stapling implementations only allow a single OCSP response to be included, which means that the browser may have to fall back to another revocation mechanism if it needs to validate other certificates in the chain—reduce the length of your certificate chain. In the future, OCSP Multi-Stapling should address this particular problem.

Most popular servers support OCSP stapling. Check the relevant documentation for support and configuration instructions. Similarly, if using or deciding on a CDN, check that their TLS stack supports and is configured to use OCSP stapling.
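In nginx, for example, stapling comes down to a few directives (a sketch; the certificate path and resolver address are illustrative placeholders):

```nginx
ssl_stapling        on;   # staple the OCSP response into the handshake
ssl_stapling_verify on;   # verify the OCSP response before stapling it
ssl_trusted_certificate /etc/nginx/ca-chain.pem;  # chain used for verification
resolver 127.0.0.1;       # DNS resolver to reach the CA's OCSP responder
```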

HTTP Strict Transport Security is an important security policy mechanism that allows an origin to declare access rules to a compliant browser via a simple HTTP header—e.g., "Strict-Transport-Security: max-age=31536000". Specifically, it instructs the user-agent to enforce the following rules:

  • All requests to the origin should be sent over HTTPS. This includes both navigation and all other same-origin subresource requests—e.g. if the user types in a URL without the https prefix the user agent should automatically convert it to an https request; if a page contains a reference to a non-https resource, the user agent should automatically convert it to request the https version.

  • If a secure connection cannot be established, the user is not allowed to circumvent the warning and request the HTTP version—i.e. the origin is HTTPS-only.

  • max-age specifies the lifetime of the specified HSTS ruleset in seconds (e.g., max-age=31536000 is equal to a 365-day lifetime for the advertised policy).

  • includeSubdomains indicates that the policy should apply to all subdomains of the current origin.

HSTS converts the origin to an HTTPS-only destination and helps protect the application from a variety of passive and active network attacks. As an added bonus, it also offers a nice performance optimization by eliminating the need for HTTP-to-HTTPS redirects: the client automatically rewrites all requests to the secure origin before they are dispatched!
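For example, with nginx the policy is emitted as a response header (a sketch; while testing, start with a short max-age and raise it only once you are confident in the deployment):

```nginx
# "always" ensures the header is attached to all responses, including errors.
add_header Strict-Transport-Security "max-age=31536000; includeSubdomains" always;
```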

Make sure to thoroughly test your TLS deployment before enabling HSTS. Once the policy is cached by the client, failure to negotiate a TLS connection will result in a hard-fail—i.e. the user will see the browser error page and won’t be allowed to proceed. This behavior is an explicit and necessary design choice to prevent network attackers from tricking clients into accessing your site without HTTPS.

The HSTS mechanism leaves the very first request to an origin unprotected from active attacks—e.g. a malicious party could downgrade the client’s request and prevent it from registering the HSTS policy. To address this, most browsers provide a separate "HSTS preload list" mechanism that allows an origin to request to be included in the list of HSTS-enabled sites that ships with the browser.

Once you’re confident in your HTTPS deployment, consider submitting your site to the HSTS preload list via hstspreload.appspot.com.

One of the shortcomings of the current system—as discussed in Chain of Trust and Certificate Authorities—is our reliance on a large number of trusted Certificate Authorities (CAs). On the one hand, this is convenient, because it means that we can obtain a valid certificate from a wide pool of entities. However, it also means that any one of these entities is able to issue a valid certificate for our, and any other, origin without the owner’s explicit consent.

The compromise of the DigiNotar certificate authority is one of several high-profile examples where an attacker was able to issue and use fake—but valid—certificates against hundreds of high profile sites.

Public Key Pinning enables a site to send an HTTP header that instructs the browsers to remember ("pin") one or more certificates in its certificate chain. By doing so, it is able to scope which certificates, or issuers, should be accepted by the browser on subsequent visits:

  • The origin can pin its leaf certificate. This is the most secure strategy because you are, in effect, hard-coding a small set of specific certificate signatures that should be accepted by the browser.

  • The origin can pin one of the parent certificates in the certificate chain. For example, the origin can pin the intermediate certificate of its CA, which tells the browser that, for this particular origin, it should only trust certificates signed by that particular certificate authority.

Picking the right strategy for which certificates to pin, which and how many backups to provide, duration, and other criteria for deploying HPKP are important, nuanced, and beyond the scope of our discussion. Consult your favorite search engine, or your local security guru, for more information.

HPKP also exposes a "report only" mode that does not enforce the provided pin but is able to report detected failures. This can be a great first step towards validating your deployment, and serve as a mechanism to detect violations.

To get the best security and performance guarantees it is critical that the site actually uses HTTPS to fetch all of its resources. Otherwise, we run into a number of issues that will compromise both security and performance, or worse, break the site:

  • Mixed "active" content (e.g. scripts and stylesheets delivered over HTTP) will be blocked by the browser and may break the functionality of the site.

  • Mixed "passive" content (e.g. images, video, audio, etc., delivered over HTTP) will be fetched, but will allow the attacker to observe and infer user activity, and degrade performance by requiring additional connections and handshakes.

Audit your content and update your resources and links, including third-party content, to use HTTPS. The Content Security Policy (CSP) mechanism can be of great help here, both to identify HTTPS violations and to enforce the desired policies.

Content-Security-Policy: upgrade-insecure-requests (1)
Content-Security-Policy-Report-Only: default-src https:; report-uri https://example.com/reporting/endpoint (2)

  1. Tells the browser to upgrade all (own and third-party) requests to HTTPS.

  2. Tells the browser to report any non-HTTPS violations to designated endpoint.

CSP provides a highly configurable mechanism to control which assets are allowed to be used, and how and from where they can be fetched. Make use of these capabilities to protect your site and your users.

As application developers we are shielded from most of the complexity of the TLS protocol—the client and server do most of the hard work on our behalf. However, as we saw in this chapter, this does not mean that we can ignore the performance aspects of delivering our applications over TLS. Tuning our servers to enable critical TLS optimizations and configuring our applications to enable the client to take advantage of such features pays high dividends: faster handshakes, reduced latency, better security guarantees, and more.

With that in mind, a short checklist to put on the agenda:

  • Get best performance from TCP; see Optimizing for TCP.

  • Upgrade TLS libraries to latest release, and (re)build servers against them.

  • Enable and configure session caching and stateless resumption.

  • Monitor your session caching hit rates and adjust configuration accordingly.

  • Configure forward secrecy ciphers to enable TLS False Start.

  • Terminate TLS sessions closer to the user to minimize roundtrip latencies.

  • Use dynamic TLS record sizing to optimize latency and throughput.

  • Audit and optimize the size of your certificate chain.

  • Configure OCSP stapling.

  • Configure HSTS and HPKP.

  • Configure CSP policies.

  • Enable HTTP/2; see HTTP/2.