In a recent discussion with fellow network engineers about encryption in a DC network, I observed that in some cases it might be better to simply enforce end-to-end encryption directly between applications rather than in the underlying infrastructure (MACsec, IPsec etc.).

Take MACsec, for example: since the crypto is done by the ASIC, the general opinion was that it must be faster than doing it on a server CPU. Having no real data or comparison to back that up, I decided to dig a bit deeper.

I started the search with the most likely implementation of application-bound encryption, that is SSL (or TLS if you want to be picky). And what better protocol than HTTP to use such encryption, with so many studies of HTTP vs HTTPS performance out there?

There are two potential encryption-related bottlenecks in such a session: the handshake (aka establishing the secure communication channel) and the encryption/decryption of the application data itself.

Most discussions and comparisons I've found (mostly on Stack Overflow) are centered around the handshake:

  • it adds 2 RTTs due to the extra exchanges that need to take place (only 1 RTT with False Start)
  • many short-lived sessions will make this delay overshadow any other performance metrics
  • hardware optimizations - like AVX2 instructions in Xeon Processors giving 26-255% boosts to key exchange performance
  • other elements affect perceived performance hits: static vs dynamic content, caching behaviour
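To put the first two points in perspective, here is a minimal sketch of how much of a session the extra handshake round trips take up. The 2 RTT figure comes from the list above; the 0.5 ms RTT and the two transfer durations are made-up values for illustration, and `handshake_share` is a hypothetical helper, not part of any library.

```python
def handshake_share(rtt_s: float, extra_rtts: int, transfer_s: float) -> float:
    """Fraction of total session time spent on the extra handshake round trips."""
    handshake_s = rtt_s * extra_rtts
    return handshake_s / (handshake_s + transfer_s)


RTT_S = 0.0005  # assumption: 0.5 ms round trip inside a DC

# Assumed session lengths: time spent moving application data, in seconds.
sessions = [("short-lived (1 ms of data)", 0.001), ("long-lived (60 s of data)", 60.0)]

for label, transfer_s in sessions:
    share = handshake_share(RTT_S, 2, transfer_s)
    print(f"{label}: {share:.4%} of the session is handshake")
```

With these assumed values the handshake is half of the short session and a negligible slice of the long one, which is why the rest of this post ignores it.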

Which is all well and good, but not very relevant to our question: in a long-lived session, is the overhead from encryption on a general-purpose CPU a problem?

Let's encrypt some stuff

This AES-NI SSL Performance study shows single threaded performance for CPUs with the AES-NI instruction set - and quite a few of them can push enough data for a 10Gbps interface by pooling raw output from a bunch of cores.

I ran the same test on my laptop (i7-5600U CPU @ 2.60GHz): a single core (out of 4 logical cores with HT) could push 99MBps (bytes!) of AES-256-CBC encrypted data towards my 1Gbps (125MBps) NIC.

λ ~ openssl speed aes-256-cbc
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-256 cbc      91251.57k    97223.70k    98791.17k    98241.54k    99019.43k

Taking the numbers above, 100MBytes would need 1.0099 seconds to go through openssl encryption and 0.8 seconds to be transmitted over the 1Gbps NIC (ignoring overhead, packet encapsulation etc.). So a single-threaded, single-client network application would be waiting on the encryption process rather than on the NIC.
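The arithmetic behind those two figures, as a small sketch. The encryption rate is the 8192-byte column from the openssl output above and the NIC rate is the nominal 1Gbps; the variable names are mine, and protocol overhead is ignored just like in the text.

```python
# Rates in bytes per second.
ENCRYPT_RATE = 99019.43e3  # openssl speed, aes-256 cbc, 8192-byte blocks
NIC_RATE = 1e9 / 8         # 1 Gbps expressed in bytes/s (125 MBps)
PAYLOAD = 100e6            # 100 MBytes


def seconds_to_process(payload_bytes: float, rate_bytes_per_s: float) -> float:
    """Time needed to push a payload through a stage running at a fixed rate."""
    return payload_bytes / rate_bytes_per_s


encrypt_s = seconds_to_process(PAYLOAD, ENCRYPT_RATE)
wire_s = seconds_to_process(PAYLOAD, NIC_RATE)

# The slower stage is the one the rest of the pipeline waits on.
bottleneck = "encryption" if encrypt_s > wire_s else "NIC"

print(f"encrypt: {encrypt_s:.4f} s, wire: {wire_s:.4f} s, bottleneck: {bottleneck}")
```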

The way I see it, unless it's a server under constant heavy load (with multiple 10Gbps interfaces and single-threaded elephant flows), the NIC should not have to wait for encrypted data and the introduced delay will be like adding another (high speed) hop to the RTT.
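For a rough idea of what "heavy load" would mean with my laptop's numbers, the sketch below estimates how many cores would have to be pooled to keep a NIC busy. It assumes throughput scales linearly with core count, which is optimistic (hyper-threaded logical cores in particular will not deliver that), and `cores_needed` is a hypothetical helper.

```python
import math

PER_CORE_RATE = 99019.43e3  # bytes/s, single-core aes-256-cbc figure from above


def cores_needed(nic_bits_per_s: float, per_core_bytes_per_s: float) -> int:
    """Cores required to encrypt at NIC line rate, assuming linear scaling."""
    nic_bytes_per_s = nic_bits_per_s / 8
    return math.ceil(nic_bytes_per_s / per_core_bytes_per_s)


for label, bits_per_s in [("1Gbps", 1e9), ("10Gbps", 10e9)]:
    print(f"{label}: {cores_needed(bits_per_s, PER_CORE_RATE)} core(s)")
```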

One last point is that while network based encryption requires crypto capacity for all the traffic passing through it (multiple servers at the same time), pushing some of it to the application level distributes the load to the edge (and server CPU performance is cheaper than specialized networking hardware when it comes to crypto).

This is probably the least scientific deduction I've made on this blog (won't become a habit I swear), so please let me know if I'm right, but especially if I'm horribly wrong, privately or in the comments below!

And, as always, thanks for reading.
