mod_h[ttp]2

HTTP/2 for Apache httpd

h2/h2c throughput in apache

Copyright (C) 2015 greenbytes GmbH

Support for HTTP/2 was released in Apache httpd 2.4.17 as the experimental mod_h[ttp]2 module. See my how-to for instructions.

Today I want to shine some light on one topic that we are going to improve in the upcoming releases of httpd: throughput. I have improved how mod_h[ttp]2 streams static files, and the results can be seen below. With a little bit of luck, we can get those changes into the next Apache release.

Local Performance

There are two interesting scenarios for throughput testing: cleartext and encrypted, or http: and https:. The encrypted one matters because browsers only talk HTTP/2 for https: URLs. The cleartext one matters because Apache httpd is also used behind load balancers and dedicated TLS hardware, and in those cases talking HTTP/2 over unencrypted (data center) connections is very interesting.

encrypted

The https tests show that ApacheBench is not well suited for TLS performance tests. Luckily, h2load can be told to use HTTP/1.1 as well.

https: throughput
server, protocol, client, throughput
trunk, HTTP/1.1, ab, 800 MB/s
trunk, HTTP/1.1, h2load, 1900 MB/s
2.4.17, HTTP/2, h2load, 1250 MB/s
trunk, HTTP/2, h2load, 1900 MB/s

While HTTP/2 only gives 66% of the HTTP/1.1 throughput in 2.4.17 (in this test scenario!), trunk eliminates that difference. Apache can now serve static files via https: with the same performance via both HTTP versions.

cleartext

The cleartext (h2c) mode of mod_h[ttp]2 did not get the love it deserves until now. Write strategies, necessary for TLS, were also used in the cleartext case - with suboptimal results:

http: throughput
server, protocol, client, throughput
trunk, HTTP/1.1, ab, 3800 MB/s
trunk, HTTP/1.1, h2load, 4900 MB/s
2.4.17, HTTP/2, h2load, 1700 MB/s
trunk, HTTP/2, h2load, 4700 MB/s

The trunk HTTP/2 implementation is now almost as fast as HTTP/1.1 on unencrypted connections. And compared to 2.4.17, it is now 2.75 times as fast! Sweet!
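The percentages quoted for both scenarios follow directly from the table rows. A minimal sketch of that arithmetic, with the figures copied from the two tables above (the dictionary keys are names made up for this example):

```python
# Throughput in MB/s, copied from the https: and http: tables above.
HTTPS = {"h1_trunk": 1900, "h2_2417": 1250, "h2_trunk": 1900}
HTTP = {"h1_trunk": 4900, "h2_2417": 1700, "h2_trunk": 4700}

# Share of HTTP/1.1 throughput that HTTP/2 reached in 2.4.17 over https:
https_share_2417 = HTTPS["h2_2417"] / HTTPS["h1_trunk"]

# Speedup of trunk over 2.4.17 for cleartext HTTP/2
http_speedup = HTTP["h2_trunk"] / HTTP["h2_2417"]

# Share of HTTP/1.1 throughput that HTTP/2 reaches in trunk over http:
http_share_trunk = HTTP["h2_trunk"] / HTTP["h1_trunk"]

print(f"https, 2.4.17: {https_share_2417:.0%} of HTTP/1.1")
print(f"http, trunk vs 2.4.17: {http_speedup:.2f}x")
print(f"http, trunk: {http_share_trunk:.0%} of HTTP/1.1")
```

With these inputs the cleartext speedup comes out at about 2.76, and trunk HTTP/2 reaches roughly 96% of the HTTP/1.1 cleartext figure.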

setup and tests

I used the following setup:

For HTTP/1.1 I used ApacheBench 2.3 and h2load. For HTTP/2, h2load 1.4.0 provided the numbers. The exact commands were:
ab -c8 -n 1000 -k http://test.example.org:12345/005.txt
h2load -c 8 -t 8 -m 1 -n 1000 -p http/1.1 http://test.example.org:12345/005.txt
h2load -c 8 -t 8 -m 1 -n 1000 --npn-list=http/1.1 https://test2.example.org:12346/005.txt
h2load -c 8 -t 8 -m 1 -n 1000 https://test.example.org:12346/005.txt
with 005.txt being a 10 MB text file. Notice that h2load is doing only 1 stream/connection at a time.
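To get a feeling for the size of one test run: 1000 requests for a 10 MB file move about 10,000 MB of file data. The sketch below estimates how long a run takes at a given throughput. It ignores protocol overhead and connection setup, and the function name is made up for this example.

```python
REQUESTS = 1000   # the -n 1000 from the commands above
FILE_MB = 10      # size of 005.txt


def run_seconds(throughput_mb_s, requests=REQUESTS, file_mb=FILE_MB):
    """Approximate duration of one run, counting file data only."""
    return requests * file_mb / throughput_mb_s


for label, rate in [("ab, https", 800), ("h2load, https", 1900)]:
    print(f"{label}: about {run_seconds(rate):.1f} s")
```

So each run lasts somewhere between a few seconds and a dozen seconds, depending on the scenario.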

what changed?

The short answer: buffer copies have been eliminated (mostly).

HTTP/1.1 servers, when sending a document file, open the file and pass the handle to the part of the server that writes directly to the connection. That code then uses sendfile() to let the operating system transfer the file data to the connection in the most efficient way possible.
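For readers who have not used sendfile() before, here is a minimal sketch of the idea in Python. It is not httpd code. It relies on socket.sendfile() from the standard library, which uses os.sendfile() where the platform supports it and falls back to ordinary send() calls otherwise. The helper names and the local socket pair standing in for a client connection are assumptions of this example.

```python
import os
import socket
import tempfile
import threading


def send_whole_file(conn, path):
    """Hand an open file to the socket layer; returns the bytes sent."""
    with open(path, "rb") as f:
        # No read()/write() loop in our own code: the file object goes
        # straight to the socket, which lets the OS do the copying.
        return conn.sendfile(f)


def demo(size=32 * 1024):
    """Send a temporary file through a local socket pair and read it back."""
    payload = os.urandom(size)
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
    writer, reader = socket.socketpair()
    sent = []

    def serve():
        try:
            sent.append(send_whole_file(writer, tmp.name))
        finally:
            writer.shutdown(socket.SHUT_WR)  # tell the reader we are done

    thread = threading.Thread(target=serve)
    thread.start()
    received = bytearray()
    try:
        while True:
            chunk = reader.recv(65536)
            if not chunk:
                break
            received.extend(chunk)
        thread.join()
    finally:
        writer.close()
        reader.close()
        os.unlink(tmp.name)
    return payload, bytes(received), sent[0]
```

Calling demo() returns the original bytes, the bytes that arrived on the other end and the count reported by the sender, so the three can be compared.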

And this is what mod_h[ttp]2 has now learned as well. Performance numbers have almost reached the HTTP/1.1 case, where they will most likely stay. HTTP/1.1 handles a single request per connection at a time, so sendfile() can be called once for the complete file. HTTP/2 sends data in frames which consist of some 9-10 bytes of header data, followed by a file data chunk and 0-255 padding bytes. The file chunk has a maximum of 16 KB right now.

This interleaving of data makes transfer of files less efficient than the HTTP/1.1 case. But only a little bit less efficient. (I did not experiment with larger frame sizes, as that defeats the purpose of an HTTP/2 shared connection.)
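The frame layout can be sketched in a few lines. This follows the DATA frame format from the HTTP/2 specification: a 9 byte header carrying a 24-bit length, a type, a flags byte and a 31-bit stream id. The tenth header byte only appears when padding is used, and this sketch emits unpadded frames only. It is an illustration, not how mod_h[ttp]2 builds its output, and the function names are made up.

```python
import struct

FRAME_HEADER_LEN = 9      # fixed size of an HTTP/2 frame header
TYPE_DATA = 0x0
FLAG_END_STREAM = 0x1
MAX_CHUNK = 16 * 1024     # the 16 KB chunk limit mentioned above


def data_frames(payload, stream_id, max_chunk=MAX_CHUNK):
    """Yield unpadded DATA frames (header + chunk) covering `payload`."""
    offset = 0
    while True:
        chunk = payload[offset:offset + max_chunk]
        offset += len(chunk)
        last = offset >= len(payload)
        header = struct.pack(
            ">BHBBI",
            len(chunk) >> 16, len(chunk) & 0xFFFF,  # 24-bit payload length
            TYPE_DATA,
            FLAG_END_STREAM if last else 0,
            stream_id & 0x7FFFFFFF,                 # reserved bit cleared
        )
        yield header + chunk
        if last:
            break


def framing_overhead(file_size, max_chunk=MAX_CHUNK):
    """Header bytes added when `file_size` bytes are sent unpadded."""
    frames = max(1, -(-file_size // max_chunk))     # ceiling division
    return frames * FRAME_HEADER_LEN
```

Assuming the 10 MB test file is 10 * 1024 * 1024 bytes, framing_overhead() gives 640 frames and 5760 header bytes, well under 0.1% of the file size.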

will https become faster?

It could - but it very much depends on the scenario you use it in, and on what your overall goals are.

The tricky thing about TLS is that it writes the connection data in 0-16 KB chunks. Each chunk has a certain overhead, so you might think that writing only full 16 KB chunks is best. Not always so.

The receiver of TLS data needs to have the complete chunk in order to decrypt it. Having only 99% of such a chunk is useless! So, when a browser gets the first kilobyte, it can do nothing with it. Same for the next kilobytes, until the chunk is complete. Browser page load times will grow.
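What "complete chunk" means on the wire can be shown with the TLS record header: 5 bytes holding content type, protocol version and the length of the record body. A receiver can only hand a record to decryption once all of those body bytes have arrived. The sketch below only does that bookkeeping, with dummy bytes instead of real ciphertext, and the function name is made up for this example.

```python
import struct

TLS_HEADER_LEN = 5  # content type (1) + protocol version (2) + length (2)


def split_complete_records(buffer):
    """Return (complete_records, leftover) for the bytes received so far."""
    records = []
    offset = 0
    while len(buffer) - offset >= TLS_HEADER_LEN:
        _ctype, _version, length = struct.unpack_from(">BHH", buffer, offset)
        end = offset + TLS_HEADER_LEN + length
        if end > len(buffer):
            break  # body not fully received yet, nothing to decrypt
        records.append(buffer[offset:end])
        offset = end
    return records, buffer[offset:]


# One record with a 16 KB body (type 23 = application data, dummy payload).
record = struct.pack(">BHH", 23, 0x0303, 16384) + b"\0" * 16384
partial = record[: len(record) * 99 // 100]
print(len(split_complete_records(partial)[0]))  # no complete record yet
print(len(split_complete_records(record)[0]))   # one complete record
```

With 99% of the record in the buffer, the function returns no complete record at all, which is the situation described above.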

For better page load times, a server needs to write small chunks in the beginning, ideally so that each chunk fits into one TCP/IP packet and that packet fits into a single MTU. Then a browser will be able to decrypt all packets it receives immediately, detect new resources to load and send off the requests for those.

But the raw transfer performance will suffer. Faster pages through slower transfers! How much will it suffer? The implementation in trunk uses such a "slow start" on TLS by default. If you run the tests against it and against a site which always tries to write 16 KB, you get:
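A minimal sketch of such a write strategy: start with chunks small enough for one packet and switch to full-size chunks once a certain amount of data has been written. The 1300 byte chunk size and the 1 MB threshold are assumptions chosen for illustration, not the values mod_h[ttp]2 uses, and the function name is made up.

```python
def tls_write_sizes(total, warmup_bytes=1024 * 1024, small=1300,
                    large=16 * 1024):
    """Yield the chunk sizes used to write `total` bytes over TLS.

    small        -- assumed size that fits a single MTU (hypothetical)
    warmup_bytes -- assumed amount written in small chunks (hypothetical)
    large        -- the 16 KB maximum chunk size
    """
    written = 0
    while written < total:
        limit = small if written < warmup_bytes else large
        size = min(limit, total - written)
        yield size
        written += size


# Example with a short warm-up so the switch is visible:
print(list(tls_write_sizes(50_000, warmup_bytes=5_000)))
```

A small response is then written entirely in small chunks, while a large download spends most of its time in 16 KB writes.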

TLS, slow start vs. max writes
write strategy, throughput
slow start, 1640 MB/s
max write, 1900 MB/s

Which configuration makes most sense to you depends on your use case. If your Apache instance serves browsers directly, the slow start will most likely give you best results. If your server talks to other (front-end) servers in a data center, max writes will give best results.

sources

You can get the Apache release from here. HTTP/2 support is included in Apache 2.4.17 and upwards. I will not repeat instructions on how to build the server in general. There is excellent material available in several places, for example here.

Update, Nov 4th 2015

Lucas Pardue (@SimmerVigor) gave me the hint that h2load works well with http/1.1 cleartext when giving the -p http/1.1 option. Thanks!

Münster, 03.11.2015,

Stefan Eissing, greenbytes GmbH

Copying and distribution of this file, with or without modification, are permitted in any medium without royalty provided the copyright notice and this notice are preserved. This file is offered as-is, without warranty of any kind. See LICENSE for details.