HTTP/2 for Apache httpd
Copyright (C) 2015 greenbytes GmbH
Copying and distribution of this file, with or without modification, are permitted in any medium without royalty provided the copyright notice and this notice are preserved. This file is offered as-is, without warranty of any kind. See LICENSE for details.
This is a look at the internals of the mod_h2 implementation and its interfaces with Apache httpd. I try to describe experiences and observations made during the implementation of mod_h2, without any guarantee of completeness or particular order. All mistakes made are my own.
The nature of HTTP/2 places new demands on a server. In HTTP/1, the client's only expectation after sending a request is to get an answer to that request as soon as possible. Even if it pipelines a second request on the same connection, it will expect the answer to the first one to arrive before the second. The server can process an HTTP/1 connection with a single thread, since it has only one thing to do at a time (I exclude sub-requests from this discussion).
And this was the model for the early httpd. It was later refined by different multi-processing modules, the current star being mpm_event, which can reuse threads during times when a request is waiting. But while the thread may change during the lifetime of a connection, there is only ever one at a time. And there is only ever one request worked on per connection at a time. In gaming terms, one would call this a "1-1-1" build.
HTTP/2 is built for handling multiple requests at once, expecting high bandwidth utilization, interleaving of responses and even on-the-fly prioritization. But not only that: both endpoints of an HTTP/2 connection are frequently exchanging meta information, adjusting window sizes, updating settings or simply answering a ping request.
An HTTP/2 server that only serves static files may handle all of this in a single thread, using some sort of async I/O or event handling. A server like httpd, which allows configurable filters/handlers and foreign request processing modules, cannot do that. Instead, request processing must be shifted into separate threads, while the "main" thread serves the connection and collects and combines the results from request processing. This could be called a "1-n-n" processing model.
But that is still too simple, since threads are a valuable and limited server resource. Guaranteeing a thread for each client request is not possible unless the number of parallel requests is kept small. But that would limit the client and, especially on high latency connections, potentially lower performance. A better model is to allow queuing of requests up to a certain amount when not enough processing threads are available. The server then has a "1-n-m" processing model.
mod_h2 implements that model with one thread serving the connection, allowing up to a configurable number of parallel requests which are then served by a number of workers. The worker output, the response data, is collected again in the connection thread for sending out to the client. Worker threads are blocked when their buffered output reaches a certain memory size (this is the model adopted from mod_spdy).
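As a rough illustration of that blocking behaviour, here is a minimal sketch of a bounded output buffer using APR synchronization primitives. The structure and all names are invented for this example; mod_h2's real synchronization lives in its h2_mplx code.

```c
#include <apr_thread_mutex.h>
#include <apr_thread_cond.h>

/* Invented for illustration: a bounded buffer that blocks worker
 * threads once too much response data is queued for the connection
 * thread to send out. */
typedef struct {
    apr_thread_mutex_t *lock;
    apr_thread_cond_t *not_full;
    apr_size_t buffered;      /* bytes currently queued */
    apr_size_t max_buffered;  /* threshold at which workers block */
} out_buffer_t;

/* Called by a worker thread; blocks while the buffer is over the limit. */
static void out_buffer_append(out_buffer_t *b, apr_size_t len)
{
    apr_thread_mutex_lock(b->lock);
    while (b->buffered >= b->max_buffered) {
        apr_thread_cond_wait(b->not_full, b->lock);
    }
    b->buffered += len;
    apr_thread_mutex_unlock(b->lock);
}

/* Called by the connection thread after data was sent to the client. */
static void out_buffer_consumed(out_buffer_t *b, apr_size_t len)
{
    apr_thread_mutex_lock(b->lock);
    b->buffered -= len;
    apr_thread_cond_broadcast(b->not_full);
    apr_thread_mutex_unlock(b->lock);
}
```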
In its internal architecture, mod_h2 started with the basic mod_spdy architecture and then tweaked it some more. For those not familiar with mod_spdy, I will try to give a short summary of what Google did there:
mod_spdy architecture

The architecture is most easily understood by following what happens on a new connection and the first request (a code sketch of the hook setup follows the list):
- In a pre_connection hook, running after mod_ssl, the module registers NPN callbacks for new TLS connections.
- In the connection hook, also after mod_ssl, the whole connection processing is taken over if a spdy protocol was negotiated via NPN. This disables any further processing of the connection by the httpd core. However, the mod_ssl in-/output filters stay in place.
- mod_spdy now talks the selected spdy dialect with the client, negotiates some parameters and reads the first request.
- For the request, a pseudo connection is created and handed to ap_process_connection(), which runs all the usual connection hooks. The pre-connection hooks of the module detect the nature of the connection and disable mod_ssl for it, among other things.
- mod_spdy has a filter pair to read/write data to a connection that runs the httpd HTTP/1 processing engine in another thread.
- The request is serialized in HTTP/1.1 format onto that pseudo connection, where the httpd core will parse it, create a request_rec and process that. The response eventually arrives in HTTP/1.1 at spdy's output filter on that connection, is converted to internal format and passed out in spdy protocol packages on the main connection.
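To make the hook setup above concrete, here is a minimal sketch of how such a module wires itself into httpd's connection processing after mod_ssl. All function and variable names are invented; only the hook APIs are real httpd ones.

```c
#include "httpd.h"
#include "http_config.h"
#include "http_connection.h"

/* Runs after mod_ssl's pre_connection hook: the place to register
 * protocol negotiation (NPN/ALPN) callbacks on TLS connections. */
static int example_pre_connection(conn_rec *c, void *csd)
{
    /* register negotiation callbacks here */
    return DECLINED;
}

/* Takes over the whole connection when "our" protocol was negotiated.
 * Returning OK stops the httpd core from running its HTTP/1 engine. */
static int example_process_connection(conn_rec *c)
{
    int negotiated = 0; /* would be determined from the TLS handshake */
    if (negotiated) {
        /* ... speak the binary protocol with the client ... */
        return OK;
    }
    return DECLINED; /* let the core handle it as HTTP/1 */
}

static void example_register_hooks(apr_pool_t *pool)
{
    /* make sure we run after mod_ssl in both hooks */
    static const char *const after_ssl[] = { "mod_ssl.c", NULL };
    ap_hook_pre_connection(example_pre_connection, after_ssl, NULL,
                           APR_HOOK_MIDDLE);
    ap_hook_process_connection(example_process_connection, after_ssl, NULL,
                               APR_HOOK_MIDDLE);
}
```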
This is how mod_spdy works, in a nutshell. There are many details to getting things right and keeping everyone convinced that there is an ordinary HTTP/1.1 request on a connection: just business as usual, move on please and disregard the man behind the curtain.
And it is a very good approach to bringing the spdy/h2 processing model into httpd, as it allows most of httpd's infrastructure, such as hooks, filters and other modules, to keep on processing requests, even though they originally arrived via a totally different network protocol.
But there are also disadvantages:
- The pseudo connections could not be set up via ap_run_create_connection at that time (mod_spdy was created on 2.2 in 2012?). And even if it could have used it, it still would not have worked with mpm_event. There is simply something missing in the core API that allows for a spdy/h2 like processing model. Hacks can be made, but may soon be outdated by httpd development.
- The mod_spdy implementation was kept a bit generic and does not use the APR to its fullest extent, probably because the spdy engine is also being used inside other Google code. Data passing involves more copying than all the carefully crafted bucket code in Apache deserves.

mod_h2 architecture
mod_h2 was written from scratch, but took the ideas from spdy. The main structures/names connected to the concepts introduced by spdy are the following (a rough sketch of how they relate to each other follows the list):
- h2_session: the instance handling the main connection; keeps the nghttp2 instance and all other state information.
- h2_stream: an HTTP/2 stream, the equivalent of a request/response pair.
- h2_task: the processor for an h2_stream that gets executed in another thread.
- h2_worker: a specialized thread for executing h2_tasks.
- h2_mplx: a multiplexing instance, one per main connection, that does the talking/synchronization between h2_session and h2_tasks.
- h2_conn: sets up pseudo connections.
- h2_request + h2_to_h1: request headers and conversion into httpd HTTP/1 format.
- h2_response + h2_from_h1: response headers and conversion from httpd HTTP/1 format.
- h2_h2: hooks and filters for handling the "h2" protocol on TLS connections.
- h2_h2c: hooks and filters for handling "h2c" protocol upgrades on clear text requests.
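As a rough sketch (the field names here are my own invention, not the actual mod_h2 definitions), the central pieces might reference each other like this:

```c
#include "httpd.h"
#include <nghttp2/nghttp2.h>

struct h2_mplx;
struct h2_request;

/* One per HTTP/2 connection: drives nghttp2 and owns the session state. */
struct h2_session {
    conn_rec *c;                /* the main connection */
    nghttp2_session *ngh2;      /* the nghttp2 engine instance */
    struct h2_mplx *mplx;       /* shared with all tasks of this session */
};

/* One per HTTP/2 stream: a request/response pair in flight. */
struct h2_stream {
    int id;                     /* the HTTP/2 stream identifier */
    struct h2_session *session;
    struct h2_request *request; /* collected request headers */
};
```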
So, connection setup is handled by h2_h2, upgrades in the clear by h2_h2c. On success, an h2_session is created. Any newly opened HTTP/2 stream results in an h2_stream. When all headers have been received in an h2_request, an h2_task is created and added to the task queue.
An h2_worker eventually takes the h2_task, sets up the environment (pools, bucket allocators, filters) and converts the h2_request into an httpd request_rec. This is then processed by the httpd core.
The output filters then transform the output into an h2_response, which is passed to the h2_mplx. h2_session regularly polls h2_mplx for new responses and submits those to the client.

Request and response bodies are passed via h2_mplx from h2_session to h2_task and vice versa. When the response body has been passed, the h2_task ends, its resources are reclaimed and the h2_worker will start on other tasks.
Once all data for a stream has been processed (or when the stream has been aborted), the h2_stream is destroyed and all its resources (memory pools, buckets) are reclaimed. This can happen before the h2_task for the stream is done; the cleanup then needs to be delayed, and such streams are kept in a special zombie set.
The closing of a connection (graceful or not) triggers the destruction of the h2_session, which again frees resources. There, it needs to remove all pending h2_tasks from the queue and join all pending zombie streams before shutting down itself.
mod_h2 hacks
The creation of pseudo connections uses ap_run_create_connection instead of doing it manually. This works for mpm_worker, but mpm_event is not happy with it and a special hack needed to be added, mainly for setting up a satisfactory connection state structure. Other mpm modules have not been tested yet. It would be preferable to extend ap_run_create_connection to set up a connection regardless of which mpm is configured.
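A sketch of that creation step follows. The helper name is invented, and whether the master connection's id, scoreboard handle and socket are reused exactly like this is an assumption of the sketch; only ap_run_create_connection itself is the real httpd hook runner.

```c
#include "httpd.h"
#include "http_connection.h"

/* Invented helper: create a pseudo connection for one stream by running
 * the core create_connection hook. As noted above, mpm_event additionally
 * needs its connection state structure patched up by hand. */
static conn_rec *create_pseudo_connection(conn_rec *master, apr_pool_t *pool,
                                          apr_bucket_alloc_t *bucket_alloc)
{
    return ap_run_create_connection(pool, master->base_server,
                                    NULL /* no socket of its own */,
                                    master->id, master->sbh, bucket_alloc);
}
```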
mod_h2 has code to serialize requests in HTTP/1.1 format so that the httpd core may ap_read_request() it from the pseudo connection and then ap_process_request() it. This is basically how ap_process_connection() works, plus some mpm state handling and connection close/keepalive handling.
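For illustration, the serialization amounts to something like the following simplified sketch (not h2_to_h1's actual code): the HTTP/2 pseudo-headers :method, :path and :authority become a request line and a Host header that the core parser understands.

```c
#include <apr_strings.h>

/* Simplified sketch: turn HTTP/2 pseudo-headers into an HTTP/1.1 head
 * that httpd's ap_read_request() can parse from the pseudo connection.
 * Real code would also serialize all regular header fields. */
static char *serialize_request(apr_pool_t *pool, const char *method,
                               const char *path, const char *authority)
{
    return apr_psprintf(pool,
                        "%s %s HTTP/1.1\r\n"
                        "Host: %s\r\n"
                        "\r\n",
                        method, path, authority);
}
```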
In v0.6.0, an alternate implementation was added that creates a request_rec directly and invokes ap_process_request() on it. This saves the serialization and parsing exercise and gives better performance at the cost of compatibility.
Setting up the request_rec currently copies two pages of code from the httpd core, something which could easily be mitigated by enhancing the core API. The HTTP/1.1 serialization can be configured on/off; it is disabled by default.
Similar to the request handling, the response was initially parsed from the pseudo connection, and that code is still there when serialization is configured on. By default, however, an optimization is in place that replaces the core HTTP_HEADER filter with mod_h2's own variation.

HTTP_HEADER is a quite large filter that does the following tasks:
- take information from the request_rec to add to, and remove other fields from, the response headers;
- initialize potentially missing fields such as status_line;
- serialize it all into HTTP/1.1 wire format.

The variation in mod_h2 is a copy of that filter, stripped of all the unwanted parts.
HTTP_HEADER should be split into two filters: HTTP_HEADER and HTTP_SERIALIZATION. Then the first one could be kept and code duplication avoided.
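To illustrate where such a replacement filter sits, here is a skeleton of a protocol-level output filter registration. The filter name and its behaviour are assumptions for this sketch, not mod_h2's actual code; the registration and brigade-passing calls are the real httpd filter API.

```c
#include "httpd.h"
#include "util_filter.h"

/* Sketch: capture status and headers from the request_rec on the first
 * invocation instead of serializing them to HTTP/1.1, then pass the
 * body buckets through untouched. */
static apr_status_t h2_response_out(ap_filter_t *f, apr_bucket_brigade *bb)
{
    if (!f->ctx) {
        /* first call: f->r->status and f->r->headers_out would be
         * copied into an h2_response here */
        f->ctx = f->r;
    }
    return ap_pass_brigade(f->next, bb);
}

static void h2_register_filters(apr_pool_t *pool)
{
    /* same slot in the filter chain as the core HTTP_HEADER filter */
    ap_register_output_filter("H2_RESPONSE_OUT", h2_response_out, NULL,
                              AP_FTYPE_PROTOCOL);
}
```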
Processing an HTTP/2 connection means that data from the client may arrive at any time and that stream data from responses needs to be sent as soon as it becomes available. In the current mod_h2 this is done in a polling loop that works like this:
- check whether h2_mplx has new responses to be submitted, and send what is there;
- let nghttp2 write, if it wants to. This may pull stream response data and suspend streams if no data is available at h2_mplx;
- read from the main connection and feed the data to the nghttp2 instance for this connection;
- when there is nothing else to do, wait a short while for new responses to appear at h2_mplx.
In common web page processing, however, there will be bursts of streams interleaved with long periods of nothingness, and in such periods mod_h2 can do a blocking read on the main connection.
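Put together, the loop might look roughly like this. Everything except the nghttp2 calls is an invented stand-in for mod_h2 internals:

```c
#include <nghttp2/nghttp2.h>

typedef struct h2_session h2_session;   /* as sketched earlier */

/* invented stand-ins for mod_h2 internals */
extern int  session_is_done(h2_session *s);
extern void h2_mplx_submit_responses(h2_session *s);
extern nghttp2_session *session_ngh2(h2_session *s);
extern void session_read_and_feed(h2_session *s); /* feeds data to
                                   nghttp2_session_mem_recv() internally */

static void session_loop(h2_session *session)
{
    while (!session_is_done(session)) {
        /* 1. submit any newly available responses from h2_mplx */
        h2_mplx_submit_responses(session);

        /* 2. let nghttp2 write; this pulls stream response data and may
         *    suspend streams that have nothing buffered at h2_mplx yet */
        if (nghttp2_session_want_write(session_ngh2(session))) {
            nghttp2_session_send(session_ngh2(session));
        }

        /* 3. read from the main connection and feed the data to nghttp2;
         *    blocking with a timeout in quiet periods, non-blocking while
         *    streams are being worked on */
        session_read_and_feed(session);
    }
}
```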
Due to the processing model, request and response data needs to traverse threads. In the httpd infrastructure, this means data in apr_buckets that are handed out of and placed into apr_bucket_brigades.
An apr_bucket has no link to an apr_bucket_brigade. It can move freely from one brigade to the next. However, it cannot move freely from one thread to another. This is due to the fact that almost all interesting operations on a bucket will involve the apr_bucket_alloc_t it was created with. The job of the apr_bucket_alloc_t is to manage a free list of suitable memory chunks for fast bucket creation/split/transform operations. And it is not thread-safe.
This requires all apr_buckets managed by the same apr_bucket_alloc_t to stay in the same thread. (There are even more turtles down there, as the apr_bucket_alloc_t uses an apr_allocator_t itself, but that one can be configured to be thread-safe, if needed.)
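In practice this means that every thread working with buckets creates its own allocator, along these lines (a trivial sketch; the helper name is mine):

```c
#include <apr_buckets.h>

/* Each thread gets its own bucket allocator; brigades created from it
 * must only ever be touched by that thread. */
static apr_bucket_brigade *make_thread_brigade(apr_pool_t *thread_pool)
{
    apr_bucket_alloc_t *ba = apr_bucket_alloc_create(thread_pool);
    return apr_brigade_create(thread_pool, ba);
}
```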
So, while a worker thread is writing to its output apr_bucket_brigade, buckets from this brigade cannot be transferred to and manipulated in another thread. Which means mod_h2 cannot simply transfer the data from the workers to the main thread and out on the connection output apr_bucket_brigade.
A closer look at the apr_bucket reveals that bucket data is not supposed to leave its apr_bucket_alloc_t instance. Which is no surprise, as that was never necessary in the 1-1-1 processing model.
That means mod_h2 needs to read from one brigade and write to another brigade when it wants data to cross thread boundaries. Which basically means memcpy'ing the request and response data. Which is not smart.
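What that copying boils down to, in a minimal sketch (not mod_h2's actual code): read each bucket on one side and write a copy of its data into a brigade owned by the other side.

```c
#include <apr_buckets.h>

/* Copy all bucket data from a brigade owned by one thread into a brigade
 * owned by another. apr_brigade_write() copies into the target's own
 * allocator, which is what makes the transfer safe (and costly). */
static apr_status_t copy_brigade_data(apr_bucket_brigade *from,
                                      apr_bucket_brigade *to)
{
    apr_bucket *b;
    for (b = APR_BRIGADE_FIRST(from);
         b != APR_BRIGADE_SENTINEL(from);
         b = APR_BUCKET_NEXT(b)) {
        const char *data;
        apr_size_t len;
        apr_status_t rv;

        if (APR_BUCKET_IS_METADATA(b)) {
            continue; /* EOS/FLUSH would need their own handling */
        }
        rv = apr_bucket_read(b, &data, &len, APR_BLOCK_READ);
        if (rv != APR_SUCCESS) return rv;
        rv = apr_brigade_write(to, NULL, NULL, data, len);
        if (rv != APR_SUCCESS) return rv;
    }
    return APR_SUCCESS;
}
```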
For resources in static files, httpd has a very efficient implementation. A file descriptor is placed into a bucket and that bucket is passed on to the part handling the connection output streaming. Only there will it be read and written to the connection, preferably using the most efficient way that the host operating system offers.
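For comparison, this is roughly how that zero-copy path looks with APR's bucket API (a sketch under my own naming; the real handler code in httpd does considerably more):

```c
#include <apr_buckets.h>
#include <apr_file_io.h>

/* Place an open file into a brigade as a file bucket. No data is read
 * here; the connection end can later use sendfile() or similar. */
static apr_status_t pass_file(apr_bucket_brigade *bb, const char *path,
                              apr_off_t length, apr_pool_t *pool)
{
    apr_file_t *fd;
    apr_status_t rv = apr_file_open(&fd, path, APR_FOPEN_READ,
                                    APR_FPROT_OS_DEFAULT, pool);
    if (rv != APR_SUCCESS) {
        return rv;
    }
    apr_brigade_insert_file(bb, fd, 0, length, pool);
    return APR_SUCCESS;
}
```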
Any improvement to mod_h2's output handling would transfer file handles in exactly that way. But that poses another challenge, as described in "resource allocation".
If mod_h2 handled the output of static file resources similar to httpd, it could easily run out of open file descriptors under load, especially when processing many parallel requests with the same priority. 10 requests for files could then have 10 file descriptors open, since the output of streams with the same priority is interleaved. In HTTP/1 processing, the 1-1-1 model, 10 file requests on the same connection would only allocate 1 file descriptor at a time.
And the HTTP/2 spec says "It is recommended that this value be no smaller than 100, so as to not unnecessarily limit parallelism." So, in the potential worst case, a straightforward implementation would open 100 file descriptors at the same time for each connection. This certainly worsens the problem, especially since HTTP/2 connections are expected to stay open for a much longer duration.
The current implementation in mod_h2 never passes file buckets from h2_task to h2_mplx, it just passes data. That means that file handles are only open while an h2_task is being processed. And an h2_task being processed means that it is allocated to an h2_worker thread.

As a result of all this, the number of simultaneously open file descriptors is in the order of the number of h2_workers. And those are limited (and configurable). This gives a behaviour that is stable under load.
But it is not as efficient as it could be. Maybe there is a good way to introduce resource management that allows the passing of file handles as long as a configurable limit has not been exceeded.
Since httpd is such a rich environment, deployed on many platforms and host to many, many modules, there certainly will be more incompatibilities discovered in connection with mod_h2. A few issues have been reported on GitHub, but I am certain many more await.
The main cause, I suspect, will be the pseudo connection handling, especially the setup. Modules that add their data to a connection and then later expect to find it there again during request processing are the most vulnerable (one of the issues reported was that SSL variables no longer worked).
It would be nice to get rid of these pseudo connections (or replace them with some other concept).