Damn it, nginx! stapling is busted.

A great deal has been written on the subject of SSL certificate revocation. For context, I'll give a short rehash of the problem and arguments. The details of revocation aren't critical here so I'll over-simplify and pre-apologize. I encourage you to research it yourself.

Suppose a CA gives you a certificate and for some reason it needs to be revoked, e.g. your private key is leaked.

The original plan/mechanism was for the CA to publish a certificate revocation list (CRL) for browsers etc to compare against. This worked great when just a few certificates were in play, but the scale of the Internet today makes it completely unmanageable. CRLs are effectively ignored by just about everyone.

The first attempt to improve on CRLs was OCSP - the client (your browser) could quickly ask the CA "Hey, is this cert still valid?" and the CA would give a time-limited signed response. There were a number of problems, with both privacy and scale. To work, browsers would effectively reveal to the CA what the user was doing and what sites they were visiting on a continuous basis. The CA's OCSP server endpoints were prime targets for DoS attacks. It added page load latency. Not Good(TM).

Then came OCSP stapling. Instead of the browser asking the CA, the https server would do it in advance and relay it to the client. "Here's my certificate, and here's the signed status from the CA saying it is still valid as of now-ish". In theory this solves the privacy problem. In practice, the internet is made up of crappy software.

The biggest problem is that stapling an OCSP response is optional, not required. Browsers with OCSP revocation checks enabled have to be prepared to fall back to the privacy-violating and DoS-vulnerable online query to the CA. Not Good(TM), still.

There is a new variation that solves this nicely. You can have the CA set a flag in the certificate - "OCSP Must Staple". This tells the browser that the server has promised to send an OCSP status and that the browser should NOT fall back to the CA query. This is an effective solution to revoking certificates in a timely fashion. With no current OCSP status, the browser correctly does a hard fail with no fall-backs.
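
If you want to see whether a certificate actually carries the flag, something along these lines should do as a rough check. It's a sketch under assumptions: it expects OpenSSL 1.1.0 or newer, which prints the extension as "TLS Feature" with a "status_request" value (older versions show only the raw OID 1.3.6.1.5.5.7.1.24), and both the function name has_must_staple and the file name cert.pem are made up for illustration.

```shell
# Hypothetical helper: reads the text dump of a certificate on stdin and
# succeeds (exit 0) if a "TLS Feature" line is immediately followed by a
# line containing "status_request", i.e. the Must-Staple flag.
has_must_staple() {
    awk '
        /TLS Feature/            { seen = 1; next }   # extension header line
        seen && /status_request/ { found = 1 }        # value on the next line
                                 { seen = 0 }         # only look one line ahead
        END                      { exit !found }
    '
}

# Example use (cert.pem is a placeholder path):
#   openssl x509 -in cert.pem -noout -text | has_must_staple \
#       && echo "must-staple is set" || echo "must-staple is not set"
```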

Mozilla implements this in Firefox. Let's Encrypt allows you to set this flag. SSL Labs and their SSL server test report it. It actually works great.........

.. except nginx is busted.

When you have "ssl_stapling on;" in your nginx.conf, guess what happens? If you guessed "nothing", you would be correct.

Nginx sends out the first reply after startup WITHOUT a stapled OCSP response included. It notices afterwards that it didn't and initiates a lazy OCSP query. At some point in the future it'll include them.

This of course gives Firefox a heart attack. It gives you an angry "REQUIRED TLS FEATURE MISSING" error page and correctly fails to load the site. When you hit "retry", it works because nginx has finally got around to actually enabling ssl stapling.
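
You can watch this happen without a browser. Here is a rough sketch, assuming the usual openssl s_client -status output, where a stapled reply contains the line "OCSP Response Status: successful" and an unstapled one says "OCSP response: no response sent". The names staple_state and check_first_hit are hypothetical helpers of mine, and www.example.org stands in for whatever host you just restarted nginx on.

```shell
# Hypothetical helper: reads `openssl s_client -status` output on stdin
# and prints "stapled" if a successful OCSP response was included,
# otherwise "missing".
staple_state() {
    if grep -q 'OCSP Response Status: successful'; then
        echo stapled
    else
        echo missing
    fi
}

# Hypothetical helper: connect to the given host twice and report what
# each handshake carried. Meant to be run right after an nginx (re)start.
check_first_hit() {
    host=$1
    for attempt in 1 2; do
        state=$(openssl s_client -connect "$host:443" -servername "$host" \
                    -status </dev/null 2>/dev/null | staple_state)
        echo "attempt $attempt: $state"
    done
}

# Example use (placeholder host name):
#   check_first_hit www.example.org
```

If the behaviour described above is what you're hitting, I'd expect the first attempt to come back "missing" and a later one "stapled", though how many attempts it takes will depend on your setup.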

This effectively ruins the usefulness of the feature and sabotages a relatively painless improvement to the sad state of revocation handling.

Thanks, nginx!(TM).

What should nginx be doing here? When it loads a certificate, it should immediately request the corresponding OCSP status. That's what "ssl_stapling on;" and "ssl_stapling_verify on;" imply to me.

Comments? Twitter or Contact info

ps: The nghttp2 stuff works as expected. My home sites aren't affected, but this broke the FreeBSD.org sites, which still use nginx.

pps: Yes, I know about CT but that's a different topic for now.