HTTP/2 for Apache httpd
Copyright (C) 2015 greenbytes GmbH
Copying and distribution of this file, with or without modification, are permitted in any medium without royalty provided the copyright notice and this notice are preserved. This file is offered as-is, without warranty of any kind. See LICENSE for details.
This is a look at the internals of the mod_h2 implementation and its interfaces with Apache httpd. I try to share experiences and observations made during the implementation of mod_h2, without any guarantee of completeness or particular order. All mistakes made are my own.
The nature of HTTP/2 places new demands on a server. In HTTP/1, the client's only expectation after sending a request is to get an answer to that request as soon as possible. Even if it pipelines a second request on the same connection, it will expect the answer to the first one to arrive before the second. The server can process a HTTP/1 connection with a single thread, since it has only one thing to do at a time (I exclude sub-requests from this discussion).
And this was the model for the early httpd. It was later refined by different multi-processing modules, the current star being mpm_event, which can reuse threads during times when a request is waiting. But while the thread may change during the lifetime of a connection, there is only ever one at a time. And there is only ever one request worked on per connection at a time. In gaming, one would say this is a "1-1-1" build.
HTTP/2 is built for handling multiple requests at once, expecting high bandwidth utilization, interleaving of responses and even on-the-fly prioritization. But not only that: both endpoints of a HTTP/2 connection are frequently exchanging meta information, adjusting window sizes, updating settings or simply answering a ping request.
A HTTP/2 server that only serves static files may handle all of this in a single thread, using some sort of async I/O or event handling. A server like httpd, which allows configurable filters/handlers and foreign request processing modules, cannot do that. Instead, request processing must be shifted into separate threads, while the "main" thread serves the connection and collects and combines results from request processing. This would then be called a "1-n-n" processing model.
But that still is too simple, since threads are a valuable and limited server resource. Guaranteeing a thread for each client request is not possible unless the number of parallel requests is kept small. But that would limit the client and, especially on high latency connections, potentially lower performance. A better model is to allow queuing of requests up to a certain amount if not enough processing threads are available. The server then has a "1-n-m" processing model.
mod_h2 implements that model with one thread serving the connection, allowing up to a configurable number of parallel requests which are then served by a number of workers. The worker output, the response data, is collected again in the connection thread for sending out to the client. Worker threads are blocked when buffered output reaches a certain memory size (this is the model adopted from mod_spdy).
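To make this back-pressure concrete, here is a minimal sketch in C, using APR primitives, of how a worker could be blocked on a per-stream output buffer. The names and the exact mechanism are illustrative assumptions, not mod_h2's actual code:

    #include <apr_thread_mutex.h>
    #include <apr_thread_cond.h>

    /* Illustrative sketch, not mod_h2 code: a worker blocks when the
     * buffered output for its stream exceeds a configured memory limit. */
    typedef struct {
        apr_thread_mutex_t *lock;
        apr_thread_cond_t  *has_room;  /* signaled when output was drained */
        apr_size_t          buffered;  /* bytes queued for the connection */
        apr_size_t          max_mem;   /* the configured per-stream limit */
        int                 aborted;
    } out_buffer;

    /* worker thread: queue response data, blocking while the buffer is full */
    static apr_status_t out_write(out_buffer *ob, apr_size_t len)
    {
        apr_thread_mutex_lock(ob->lock);
        while (!ob->aborted && ob->buffered + len > ob->max_mem) {
            apr_thread_cond_wait(ob->has_room, ob->lock);
        }
        if (!ob->aborted) {
            ob->buffered += len;       /* real code would append data here */
        }
        apr_thread_mutex_unlock(ob->lock);
        return ob->aborted? APR_ECONNABORTED : APR_SUCCESS;
    }

    /* connection thread: after sending bytes to the client, make room */
    static void out_drained(out_buffer *ob, apr_size_t len)
    {
        apr_thread_mutex_lock(ob->lock);
        ob->buffered -= len;
        apr_thread_cond_broadcast(ob->has_room);
        apr_thread_mutex_unlock(ob->lock);
    }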
In its internal architecture, mod_h2 started with the basic mod_spdy architecture and then tweaked it some more. For those not familiar with mod_spdy, I try to give a short summary of what Google did there:
mod_spdy architecture

The architecture is most easily understood by following what happens on a new connection and the first request:
- In a pre_connection hook, running after mod_ssl, the module registers NPN callbacks for new TLS connections.
- In a connection hook, also after mod_ssl, the whole connection processing is taken over if a spdy protocol was negotiated via NPN. This disables any further processing of the connection by the httpd core. However, the mod_ssl in-/output filters stay in place.
- mod_spdy now talks the selected spdy dialect with the client, negotiates some parameters and reads the first request.
- For each request, a pseudo connection is created and passed to ap_process_connection(), which runs all the usual connection hooks. The pre-connection hooks of the module detect the nature of the connection and disable mod_ssl for it, among other things.
- mod_spdy has a filter pair to read/write data on such a pseudo connection, which runs the httpd HTTP/1 processing engine in another thread.
- The request is serialized into HTTP/1.1 format onto that pseudo connection, where httpd core will parse it, create a request_rec and process that. The response eventually arrives in HTTP/1.1 at spdy's output filter on that connection, is converted to internal format and passed out in spdy protocol packages on the main connection.
This is how mod_spdy works, in a nutshell. There are many details involved in getting things right and keeping everyone convinced that there is an ordinary HTTP/1.1 request on a connection; just business as usual, move on please and disregard the man behind the curtain.
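As an illustration of that serialization step, here is a hedged sketch of how request headers might be written in HTTP/1.1 wire format onto a brigade for the pseudo connection. The function and parameter names are made up for this example:

    #include <apr_buckets.h>
    #include <apr_tables.h>

    /* helper for apr_table_do: emit one "Name: value" header line */
    static int add_header(void *ctx, const char *name, const char *value)
    {
        apr_brigade_printf((apr_bucket_brigade *)ctx, NULL, NULL,
                           "%s: %s\r\n", name, value);
        return 1; /* keep iterating */
    }

    /* write the decoded h2/spdy request as an HTTP/1.1 head onto bb, where
     * httpd's parser will pick it up and create a request_rec from it */
    static void serialize_request(apr_bucket_brigade *bb, const char *method,
                                  const char *path, const char *authority,
                                  apr_table_t *headers)
    {
        apr_brigade_printf(bb, NULL, NULL, "%s %s HTTP/1.1\r\n", method, path);
        apr_brigade_printf(bb, NULL, NULL, "Host: %s\r\n", authority);
        apr_table_do(add_header, bb, headers, NULL);
        apr_brigade_puts(bb, NULL, NULL, "\r\n");
    }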
And it is a very good approach to bringing the spdy/h2 processing model into httpd, as it allows most of httpd's infrastructure, such as hooks, filters and other modules, to keep on processing requests, even though they originally arrived via a totally different network protocol.
But disadvantages are also there:

- Pseudo connection setup could not use ap_run_create_connection at that time (mod_spdy was created on 2.2 in 2012?). And even if it could have used it, it still would not have worked with mpm_event. There is simply something missing in the core API that allows for a spdy/h2 like processing model. Hacks can be made, but may soon be outdated by httpd development.
- The mod_spdy implementation was held a bit generic and does not use the APR to its fullest extent, probably because the spdy engine is also being used inside other Google code. Data passing involves more copying than all the carefully crafted bucket code in Apache deserves.

mod_h2 architecture
mod_h2 was written from scratch, but took the ideas from spdy. The main structures/names connected
to the concepts introduced by spdy are:
- h2_session: the instance handling the main connection; it keeps the nghttp2 instance and all other state information.
- h2_stream: a HTTP/2 stream, the equivalent of a request/response pair.
- h2_task: the processor for a h2_stream that gets executed in another thread.
- h2_worker: a specialized thread for executing h2_tasks.
- h2_mplx: a multiplexing instance, one per main connection, that does the talking/synchronization between h2_session and h2_tasks.
- h2_conn: sets up pseudo connections.
- h2_request + h2_to_h1: request headers and conversion into httpd HTTP/1 format.
- h2_response + h2_from_h1: response headers and conversion from httpd HTTP/1 format.
- h2_h2: hooks and filters for handling the "h2" protocol on TLS connections.
- h2_h2c: hooks and filters for handling "h2c" protocol upgrades on clear text requests.
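To visualize how these pieces hang together, here is a greatly simplified sketch in C. The real structs carry many more fields; these declarations are illustrations, not the actual ones:

    #include <httpd.h>
    #include <nghttp2/nghttp2.h>

    struct h2_mplx;                   /* shared, lock-protected in/out data */

    /* simplified, not the real declaration */
    typedef struct h2_session {
        conn_rec        *c;           /* the real client connection */
        nghttp2_session *ngh2;        /* the HTTP/2 protocol engine */
        struct h2_mplx  *mplx;        /* bridge to the worker side */
        /* ... the set of open h2_stream instances, settings, state ... */
    } h2_session;

    /* simplified, not the real declaration */
    typedef struct h2_task {
        struct h2_mplx  *mplx;        /* input comes from, output goes here */
        conn_rec        *pseudo_c;    /* pseudo connection run by httpd core */
        int              stream_id;   /* the h2_stream this task serves */
    } h2_task;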
So, connection setup is handled by h2_h2, upgrades in clear text by h2_h2c. On success, a h2_session is created. Any newly opened HTTP/2 stream results in a h2_stream. When all headers have been received in a h2_request, a h2_task is created and added to the task queue.
A h2_worker eventually takes the h2_task, sets up the environment (pools, bucket allocators, filters) and converts the h2_request into a httpd request_rec. This is processed by the httpd core. The output filters then transform the output into a h2_response, which is passed to the h2_mplx.
h2_session regularly polls h2_mplx for new responses and submits those to the client.
Request and response bodies are passed via h2_mplx from h2_session to h2_task and vice versa. When the response body has been fully passed, the h2_task ends, its resources are reclaimed and the h2_worker moves on to other tasks.
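A hedged sketch of that handoff through h2_mplx: the task side appends response bytes under a lock, the session side copies them out later. The real h2_mplx does much more (request bodies, stream suspension, polling), and these functions are invented for the example:

    #include <apr_thread_mutex.h>
    #include <apr_buckets.h>

    typedef struct {
        apr_thread_mutex_t *lock;     /* serializes all bucket operations */
        apr_bucket_brigade *out;      /* buffered response data of a stream */
    } mplx_out;

    /* task/worker side: hand response data over (as plain bytes) */
    static apr_status_t mplx_out_write(mplx_out *m, const char *data,
                                       apr_size_t len)
    {
        apr_status_t rv;
        apr_thread_mutex_lock(m->lock);
        rv = apr_brigade_write(m->out, NULL, NULL, data, len);
        apr_thread_mutex_unlock(m->lock);
        return rv;
    }

    /* session side: copy out up to *plen buffered bytes for sending;
     * real code would also remove what was consumed from the brigade */
    static apr_status_t mplx_out_read(mplx_out *m, char *buffer,
                                      apr_size_t *plen)
    {
        apr_status_t rv;
        apr_thread_mutex_lock(m->lock);
        rv = apr_brigade_flatten(m->out, buffer, plen);
        apr_thread_mutex_unlock(m->lock);
        return rv;
    }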
Once all data for a stream has been processed (or when the stream has been aborted), the h2_stream is destroyed and all its resources (memory pools, buckets) are reclaimed. This can happen before the h2_task for the stream is done. The cleanup then needs to be delayed, and such streams are kept in a special zombie set.
The closing of a connection (graceful or not) triggers the destruction of the h2_session, which again frees resources. There, it needs to remove all pending h2_tasks from the queue and join all pending zombie streams before shutting itself down.
mod_h2 hacks
The creation of pseudo connections uses ap_run_create_connection instead of doing it manually. This works for mpm_worker, but mpm_event is not happy with it and a special hack needed to be added, mainly for setting up a satisfactory connection state structure. Other mpm modules have not been tested yet. It would be preferable to extend ap_run_create_connection to set up a connection regardless of which mpm is configured.
mod_h2 has code to serialize requests in HTTP/1.1 format so that httpd core may ap_read_request() it from the pseudo connection and then ap_process_request() it. This is basically how ap_process_connection() works, plus some mpm state handling and connection close/keepalive handling.
In v0.6.0, an alternate implementation was added that creates a request_rec directly and invokes ap_process_request() on it. This saves the serialization and parsing exercise and gives better performance at the cost of compatibility. Setting up the request_rec currently copies two pages of code from the httpd core, something which could easily be mitigated by enhancing the core API.
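The gist of that copied setup code, heavily abbreviated; the real code initializes many more fields, tables and filter chains, and this sketch is an assumption about its shape, not the actual implementation:

    #include <httpd.h>
    #include <http_protocol.h>
    #include <http_request.h>
    #include <apr_strings.h>

    /* Abbreviated sketch of the non-serialized path: build a request_rec
     * by hand on the pseudo connection and hand it to httpd's processing. */
    static void process_h2_request(conn_rec *pseudo_c, apr_pool_t *pool,
                                   const char *method, const char *uri)
    {
        request_rec *r   = apr_pcalloc(pool, sizeof(*r));
        r->pool          = pool;
        r->connection    = pseudo_c;
        r->method        = method;
        r->method_number = ap_method_number_of(method);
        r->uri           = apr_pstrdup(pool, uri);
        r->protocol      = apr_pstrdup(pool, "HTTP/1.1"); /* presented as 1.1 */
        r->headers_in    = apr_table_make(pool, 10);
        /* ... roughly two pages of further setup copied from httpd core ... */
        ap_process_request(r);
    }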
The HTTP/1.1 serialization can be configured on/off. It is disabled by default.
Similar to the request handling, the response was initially parsed from the pseudo connection. That code is still there when serialization is configured on. By default, however, an optimization is in place that replaces the core HTTP_HEADER filter with its own variation.

HTTP_HEADER is a quite large filter that does the following tasks:

- apply various fixups based on the request_rec, add headers to and remove headers from the response, and initialize potentially missing fields such as status_line
- serialize the response into HTTP/1.1 wire format

The filter in mod_h2 is a copy of that filter, stripped of all the unwanted parts.
HTTP_HEADER should be split into two filters: HTTP_HEADER and HTTP_SERIALIZATION. Then
the first one could be kept and code duplication avoided.
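Such a replacement hangs in like any other output filter. A small illustrative sketch follows; the filter name and the callback body are assumptions for the example, not mod_h2's actual registration:

    #include <httpd.h>
    #include <http_protocol.h>
    #include <util_filter.h>

    /* Illustrative variant filter: gather status and headers from f->r into
     * the module's own response representation, but skip the HTTP/1.1
     * serialization that the core HTTP_HEADER filter would perform. */
    static apr_status_t h2_response_out_filter(ap_filter_t *f,
                                               apr_bucket_brigade *bb)
    {
        /* ... collect f->r->status and f->r->headers_out here ... */
        ap_remove_output_filter(f);        /* run only once per response */
        return ap_pass_brigade(f->next, bb);
    }

    static void h2_register_filters(void)
    {
        ap_register_output_filter("H2_RESPONSE", h2_response_out_filter,
                                  NULL, AP_FTYPE_PROTOCOL);
    }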
Processing a HTTP/2 connection means that data from the client may arrive at any time and that stream data from responses needs to be sent as soon as it becomes available. In the current mod_h2 this is done in a polling loop that works like this:

- check if h2_mplx has new responses to be submitted, and send what is there
- let nghttp2 write, if it wants; this may pull stream response data and suspend streams if no data is available at h2_mplx
- read from the main connection and feed incoming data into the nghttp2 instance for this connection
- resume any suspended streams for which new data has arrived at h2_mplx
In common web page processing, however, there will be bursts of streams interleaved with long periods of nothingness, and in such periods mod_h2 can do a blocking read on the main connection.
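In terms of the nghttp2 API, the loop has roughly the following shape, reusing the simplified h2_session sketch from earlier. The helper functions are hypothetical stand-ins for the steps above, and error handling is left out:

    #include <nghttp2/nghttp2.h>

    /* hypothetical helpers standing in for the steps described above */
    static void submit_new_responses(h2_session *session);
    static void read_and_feed(h2_session *session);
    static void resume_suspended_streams(h2_session *session);

    static void session_loop(h2_session *session, int *done)
    {
        while (!*done) {
            /* 1. submit freshly arrived responses from h2_mplx */
            submit_new_responses(session);
            /* 2. let nghttp2 serialize frames; its data callbacks pull
             *    stream data and suspend streams that have none buffered */
            if (nghttp2_session_want_write(session->ngh2)) {
                nghttp2_session_send(session->ngh2);
            }
            /* 3. read from the main connection (blocking when idle) and
             *    feed the bytes to nghttp2, which fires its callbacks */
            read_and_feed(session);
            /* 4. resume streams for which h2_mplx has data again */
            resume_suspended_streams(session);
        }
    }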
Due to the processing model, request and response data needs to traverse threads. In the httpd infrastructure, this means data in apr_buckets that are passed around in apr_bucket_brigades.
An apr_bucket has no link to an apr_bucket_brigade. It can move freely from one brigade to the next. However, it cannot move freely from one thread to another. This is due to the fact that almost all interesting operations on a bucket involve the apr_bucket_alloc_t it was created with. The job of the apr_bucket_alloc_t is to manage a free list of suitable memory chunks for fast bucket creation/split/transform operations. And it is not thread-safe.
This requires all apr_buckets managed by the same apr_bucket_alloc_t to stay in the same thread. (There are even more turtles down there, as the apr_bucket_alloc_t itself uses an apr_allocator_t, but that one can be configured to be thread-safe, if needed.)
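In practice this means every thread that touches buckets sets up its own allocator, roughly like this (a sketch of the APR calls involved, not mod_h2's code):

    #include <apr_buckets.h>

    /* Each thread creates its own allocator; buckets made with it must not
     * be manipulated by another thread, as the free list is not locked. */
    static apr_bucket_brigade *make_thread_local_brigade(apr_pool_t *pool,
                                                         const char *data,
                                                         apr_size_t len)
    {
        apr_bucket_alloc_t *ba = apr_bucket_alloc_create(pool);
        apr_bucket_brigade *bb = apr_brigade_create(pool, ba);
        /* NULL free function: the heap bucket takes a copy of the data */
        apr_bucket *b = apr_bucket_heap_create(data, len, NULL, ba);
        APR_BRIGADE_INSERT_TAIL(bb, b);
        return bb;
    }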
So, while a worker thread is writing to its output apr_bucket_brigade, buckets from this brigade cannot be transferred to and manipulated in another thread. Which means mod_h2 cannot simply transfer the data from the workers to the main thread and out on the connection's output apr_bucket_brigade.
A closer look at the apr_bucket reveals that bucket data is not supposed to leave its
apr_bucket_alloc_t instance. Which is no surprise as that was never necessary in the 1-1-1
processing model.
That means mod_h2 needs to read from one brigade and write to another when it wants data to cross thread boundaries. Which basically means memcpy'ing the request and response data. Which is not smart.
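The copy itself looks roughly like this: read every bucket of the source brigade and write the plain bytes into the target brigade. A sketch under the assumption that access to both brigades is serialized, e.g. by the h2_mplx lock:

    #include <apr_buckets.h>

    static apr_status_t copy_brigade_data(apr_bucket_brigade *from,
                                          apr_bucket_brigade *to)
    {
        apr_bucket *b;
        for (b = APR_BRIGADE_FIRST(from);
             b != APR_BRIGADE_SENTINEL(from);
             b = APR_BUCKET_NEXT(b)) {
            const char *data;
            apr_size_t len;
            apr_status_t rv = apr_bucket_read(b, &data, &len, APR_BLOCK_READ);
            if (rv != APR_SUCCESS) {
                return rv;
            }
            /* this is the memcpy the text talks about */
            rv = apr_brigade_write(to, NULL, NULL, data, len);
            if (rv != APR_SUCCESS) {
                return rv;
            }
        }
        return APR_SUCCESS;
    }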
For resources in static files, httpd has a very efficient implementation. A file descriptor is placed into a bucket and that bucket is passed on to the part handling the connection output streaming. Only there will it be read and written to the connection, preferably using the most efficient way that the host operating system offers.
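In code, that zero-copy path is just a file bucket travelling down the filter chain; only the core output filter reads it (or hands it to sendfile). A minimal sketch of the APR calls:

    #include <apr_buckets.h>

    /* place an open file in a bucket: no bytes are read here, only at the
     * very end when the connection output filter consumes the bucket */
    static void pass_file(apr_bucket_brigade *bb, apr_file_t *fd,
                          apr_off_t offset, apr_size_t len, apr_pool_t *pool)
    {
        apr_bucket *b = apr_bucket_file_create(fd, offset, len, pool,
                                               bb->bucket_alloc);
        APR_BRIGADE_INSERT_TAIL(bb, b);
    }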
Any improvement to mod_h2's output handling would transfer the file handle exactly that way. But that
poses another challenge, as described in "resource allocation".
If mod_h2 handled output of static file resources similarly to httpd, it could easily run out of open file descriptors under load, especially when processing many parallel requests with the same priority. 10 requests for files could then hold 10 file descriptors open, since the output of streams with the same priority is interleaved. In HTTP/1 processing, the 1-1-1 model, 10 file requests on the same connection would only allocate 1 file descriptor at a time.
And the HTTP/2 spec says "It is recommended that this value be no smaller than 100, so as to not unnecessarily limit parallelism." So, in the worst case, a straightforward implementation would open 100 file descriptors at the same time for each connection. This certainly worsens the problem, especially since HTTP/2 connections are expected to stay open for a much longer duration.
The current implementation in mod_h2 never passes file buckets from h2_task to
h2_mplx, it just passes data. That means that file handles are only open while a h2_task
is being processed. And a h2_task being processed means that it is allocated to a h2_worker
thread.
As a result of all this, the number of simultaneously open file descriptors is on the order of the number of h2_workers. And those are limited (and configurable). This gives a behaviour that is stable under load, but not as efficient as it could be. Maybe there is a good way to introduce resource management that allows passing of file handles as long as a configurable limit has not been exceeded.
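One conceivable shape for such a mechanism is a shared handle budget: pass the file bucket through only when a slot is free, otherwise fall back to copying the data. This is purely a hypothetical sketch, not anything mod_h2 implements:

    #include <apr_thread_mutex.h>

    typedef struct {
        apr_thread_mutex_t *lock;
        int in_use;                  /* file handles currently in flight */
        int limit;                   /* the configurable maximum */
    } fd_budget;

    /* returns 1 if a slot was acquired; 0 means: copy the data instead */
    static int fd_budget_try_acquire(fd_budget *fb)
    {
        int ok;
        apr_thread_mutex_lock(fb->lock);
        ok = (fb->in_use < fb->limit);
        if (ok) {
            fb->in_use++;
        }
        apr_thread_mutex_unlock(fb->lock);
        return ok;
    }

    /* called when the file bucket has been consumed on the connection */
    static void fd_budget_release(fd_budget *fb)
    {
        apr_thread_mutex_lock(fb->lock);
        fb->in_use--;
        apr_thread_mutex_unlock(fb->lock);
    }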
Since httpd is such a rich environment, deployed on many platforms and host to many, many modules, there certainly will be more incompatibilities discovered in connection with mod_h2. A few issues have been reported on github, but I am certain many more await. The main cause, I suspect, will be the pseudo connection handling, especially the setup. Modules that add their data to a connection and then later expect to find it there again during request processing are the most vulnerable (one of the reported issues was that SSL variables no longer worked).
It would be nice to get rid of these pseudo connections (or replace them with some other concept).