h2 proxying

With the v1.14.x versions of mod-h2, a drastic change was applied to the mod_proxy_http2 module. The trigger was PR 63170 on the Apache bugzilla. This caused me to reconsider my implementation.

The good news is that the inappropriate behaviour of mod_proxy_http2 has been fixed and the changes will become part of the next Apache httpd release! So, if you ever considered running HTTP/2 between Apache and a backend, v1.14.1 or later is a good version to use.

For those interested in the technical details: the former implementation was a bit ambitious and overconfident. To explain that, I need to go into more detail about how mod_proxy and its siblings behave.

mod_proxy is a handler for a request, just as the file system based lookup is another one. When you define your ProxyPass and related configuration, you specify where the proxy module should take over. mod_proxy then passes the request on to the configured sub-module, usually mod_proxy_http or, as here, mod_proxy_http2.
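As a rough illustration, a configuration along the following lines hands a path over to mod_proxy_http2. The path /app and the host backend.example.org are placeholders chosen for this sketch, and the module file locations depend on your installation. The h2c:// scheme selects cleartext HTTP/2 to the backend; h2:// is the TLS variant and additionally needs the proxy side of mod_ssl to be enabled.

```apache
# Sketch only: /app and backend.example.org are assumed placeholders.
LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_http2_module modules/mod_proxy_http2.so

# Requests under /app are handled by mod_proxy, which passes them
# on to mod_proxy_http2 because of the h2c:// scheme.
ProxyPass "/app" "h2c://backend.example.org/app"
ProxyPassReverse "/app" "http://backend.example.org/app"
```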

This sub-module then makes the request to the backend server, using its protocol. It does not matter how the request arrived on the incoming connection. Which is nice, because you can offer HTTP/2 to clients while the requests are processed by a backend that does not even talk HTTP/2.

Since all this started with HTTP/1.x processing in mind, the proxy modules do one request at a time, return the response and are done, until called again for another request. This fits HTTP/1 connections perfectly. However, h2 requests can be processed in parallel, which is generally more efficient.

And this is what mod_proxy_http2 tried to enable. It gathered incoming requests from the client connection onto a single connection to the backend. This can make for some nice savings on your front-facing Apache server, which could make better use of its worker threads and serve more connections in parallel.

But the implementation of mod_proxy_http2 simply was not good enough. After attempts to fix concurrency problems, PR 63170 made me realize that the design inherently suffered from a lack of flow control, which caused unreasonable memory demands when clients turned up the heat.
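To see why missing flow control translates into memory demand, here is a toy model in Python. It is not the module's code; the function name, the tick-based timing and all numbers are assumptions made for illustration. A producer hands over chunks faster than a consumer can drain them, and an optional window caps how much may be buffered in between.

```python
def peak_buffered(chunks, produce_per_tick, consume_per_tick, window=None):
    """Return the peak number of chunks buffered between producer and consumer.

    window=None models a design without flow control; an integer window
    stops the producer once that many chunks are waiting.
    """
    if consume_per_tick < 1 or (window is not None and window < 1):
        raise ValueError("consume_per_tick and window must be at least 1")
    buffered = 0
    remaining = chunks
    peak = 0
    while remaining or buffered:
        # Producer: send as much as allowed in this tick.
        n = min(produce_per_tick, remaining)
        if window is not None:
            n = min(n, window - buffered)
        buffered += n
        remaining -= n
        peak = max(peak, buffered)
        # Consumer: drain what it can in this tick.
        buffered -= min(consume_per_tick, buffered)
    return peak


if __name__ == "__main__":
    print("no flow control:", peak_buffered(1000, 10, 1))
    print("window of 16:   ", peak_buffered(1000, 10, 1, window=16))
```

Without a window, the peak grows with the amount of data the fast side sends. With a window, it stays at the cap regardless of how much data follows.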

So, in v1.14.0 I threw out all the super-parallel stuff and mod_proxy_http2 now works similarly to mod_proxy_http. It does one request at a time on a connection, serves the complete answer, frees resources and takes the next request.

Are there still advantages to be had when using it? I think so:

Header compression saves quite some bytes. Remember: the h2 connection is reused. Repeated headers become very efficient to transfer.
Aborted requests are handled more gracefully and preserve the connection. If a large request needs to be aborted (for example, because the client closed its connection or reset a stream), an HTTP/2 connection is cleaned up very easily and remains usable. In HTTP/1, the connection would either need to be closed or all remaining data would need to be read, only to be thrown away.
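The header compression point can be sketched with a deliberately simplified model in Python. This is not HPACK as specified for HTTP/2: there is no static table, no Huffman coding and no eviction, and the byte costs (2 bytes of overhead per literal field, 1 byte per table reference) are assumptions made for illustration. It only shows the effect of a table that survives between requests on a reused connection.

```python
def header_bytes_per_request(requests):
    """Return an assumed byte cost for each request's headers.

    requests is a list of header lists, each header a (name, value) tuple.
    A field already seen on this connection costs 1 byte (a table reference);
    a new field costs its literal length plus 2 bytes of assumed overhead.
    """
    table = set()  # fields remembered for the lifetime of the connection
    costs = []
    for headers in requests:
        cost = 0
        for name, value in headers:
            if (name, value) in table:
                cost += 1
            else:
                cost += len(name) + len(value) + 2
                table.add((name, value))
        costs.append(cost)
    return costs


if __name__ == "__main__":
    headers = [("user-agent", "demo/1.0"), ("accept", "*/*")]
    # The same headers sent twice on one connection.
    print(header_bytes_per_request([headers, headers]))
```

In this model the second request pays only for the table references, which is the kind of saving a reused h2 backend connection can bring for repeated headers.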

Still, the previous goal of sharing an h2 backend connection for many requests in parallel is a good one. I am still thinking about how a new approach to this might look, taking the lessons learned into account. For now, I hope the new mod_proxy_http2 serves you well.

Münster, 15.03.2019,

Stefan Eissing, greenbytes GmbH

Copying and distribution of this file, with or without modification, are permitted in any medium without royalty provided the copyright notice and this notice are preserved. This file is offered as-is, without warranty of any kind. See LICENSE for details.