kilburn a day ago

I don't think this is worth it unless you are setting up your own CDN or similar. In the article, they exchange 1 to 4 stat calls for:

- A more complicated nginx configuration. This is no light matter. You can see in the comments that even the author had bugs in their first try. For instance, introducing an HSTS header now means you have to remember to add it in all those locations (see the sketch after this list).

- Running a few regexes per request. This is probably still significantly cheaper than the stat calls, but I can't tell by how much (and the author hasn't checked either).

- Returning the default 404 page instead of the CMS's for any URL under the defined "static prefixes". This is actually the biggest change, both in user-visible behavior and in performance (particularly if a crazy crawler starts checking non-existing URLs in bulk or similar). The article doesn't even mention this.
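
To make the HSTS point concrete, here is a minimal sketch (host and paths are made up) of why the header has to be repeated: nginx only inherits add_header from the enclosing block when a location defines no add_header of its own, so any location that adds, say, a Cache-Control header silently drops HSTS unless you restate it.

    server {
        listen 443 ssl;
        server_name example.com;  # hypothetical host

        # Set once at the server level...
        add_header Strict-Transport-Security "max-age=63072000" always;

        location /static/ {
            # ...but this location defines its own add_header, which
            # disables inheritance, so HSTS must be repeated here.
            add_header Cache-Control "public, max-age=31536000";
            add_header Strict-Transport-Security "max-age=63072000" always;
        }

        location / {
            # No add_header here, so the server-level HSTS still applies.
            proxy_pass http://backend;  # hypothetical upstream
        }
    }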

The performance gains for regular accesses are purely speculative because the author made no attempt to quantify them. If somebody has quantified the gains, I'd love to hear about it though.

  • technion 19 hours ago

    I agree. But on that final point, I have to say I hate setups where bots hitting thousands of non-existent addresses have every single request going to a dynamic backend just to produce a 404. A while back I made a Rails setup that dumped routes into an nginx map of valid first-level paths, but I haven't seen anyone else do that sort of thing.
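
    Roughly, the sketch looked like this (all names made up): a task dumps the app's first-level route prefixes into a map file that nginx includes at the http level, and anything outside the map gets a 404 straight from nginx.

        # /etc/nginx/conf.d/valid_routes.conf (generated from the app's route list)
        map $uri $known_route {
            default                        0;
            ~^/(posts|users|assets)(/|$)   1;
        }

        server {
            location / {
                # Only forward URIs whose first path segment the app
                # actually routes; everything else 404s in nginx itself.
                if ($known_route = 0) {
                    return 404;
                }
                proxy_pass http://rails_app;  # hypothetical upstream
            }
        }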

    • ductsurprise 17 hours ago

      See Varnish Cache and others... Or use a third-party CDN that offers that feature.

      There are lots of ways to configure them with route-based behavior and (in)validation.

KronisLV a day ago

Didn't Apache2 also see a performance penalty because it allows configuration in .htaccess files, which must be read in a similar way: https://httpd.apache.org/docs/current/howto/htaccess.html#wh... (you can disable that and configure the web server much like you would with Nginx, with config file(s) in a specific directory)?

The likes of try_files across a bunch of web servers are pretty convenient though, as long as the performance penalty doesn't become a big deal.
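
As a reference point for that convenience, a common nginx pattern (paths and socket are assumptions) is to serve the file if it exists on disk and otherwise hand the request to the app's front controller:

    root /var/www/app/public;  # assumed document root

    location / {
        # Serve the file or directory if it exists on disk,
        # otherwise fall back to the application.
        try_files $uri $uri/ /index.php?$query_string;
    }

    location ~ \.php$ {
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_pass unix:/run/php/php-fpm.sock;  # assumed PHP-FPM socket
    }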

Plus, I've found that it's nice to have api.myapp.com and myapp.com as separate bits of config, so there's no ambiguity about what gets reverse proxied, and to keep as much of the static assets (for example, for an SPA) separate from all of that. Of course it becomes a bit trickier with server-side rendering or the likes of Ruby on Rails, Laravel, Django etc. that try to have everything in a single deployment.

  • kilburn a day ago

    Apache's .htaccess is much worse performance-wise because it checked (and processed, if it existed) every .htaccess file in every folder along the path. That is, you opened example.com/some/thing/interesting and Apache would check (and possibly process) /docroot/.htaccess, /docroot/some/.htaccess, /docroot/some/thing/.htaccess and /docroot/some/thing/interesting/.htaccess.

    Separating the API and the "front" into different domains does run into CORS issues though. I find it much nicer to reserve myapp.com/api for the API and route that accordingly. Also, you avoid having to juggle an "API_URL" env definition across your different envs (you can just call /api/whatever, no matter which env you are in).
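
    A minimal sketch of that single-domain layout (upstream and paths are made up), with /api going to the backend and everything else served as static files:

        server {
            server_name myapp.com;

            location /api/ {
                # Same origin as the front-end: no CORS preflights, and the
                # client can just call /api/... in every environment.
                proxy_pass http://api_backend;  # hypothetical upstream
            }

            location / {
                root /var/www/myapp/dist;  # assumed SPA build output
                try_files $uri /index.html;
            }
        }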

    • merpkz 21 hours ago

      Was that really so bad in terms of performance? Surely .htaccess didn't exist there most of the time, and even if it did, it would have been cached by the kernel, so each lookup by the Apache process wouldn't be hitting the disk directly to check for file existence on every HTTP request it processes. Or maybe I am mistaken about that.

      • kilburn 19 hours ago

        The recommendation was to disable it because:

        a) If you didn't use it (the less bad case you are considering), why pay for the stat syscalls on every request?

        b) If you did use it, Apache was reparsing/reprocessing the (at least one) .htaccess file on every request. You can see how the real impact here was significantly worse than a cached stat syscall.

        Most people were using it, hence the bad rep. Also, this was at a time when it was more common for web servers to read from NFS or other networked filesystems. Stat calls then involve the network, and you can see how even the "mild" case could wreak havoc in some setups.
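
        The usual fix was to switch .htaccess off entirely and move those directives into the main config, which Apache parses once at startup. A rough sketch (paths and rules are made up):

            # In the main Apache config instead of per-directory .htaccess files
            <Directory "/var/www/example">
                # Stop Apache from looking for .htaccess anywhere in this tree
                AllowOverride None

                # Former .htaccess contents live here, parsed once at startup
                Require all granted
                RewriteEngine On
                RewriteRule ^old-page$ /new-page [R=301,L]
            </Directory>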

wildpeaks 5 hours ago

Sticking to this rule has served me well over the years:

- resources that are dynamically generated are served by API endpoints, and are therefore at known locations with predictable parameters

- everything else must be static files

And definitely no dynamic script as the fallback rule: it's too wasteful in an era of crawlers that ignore robots.txt and automated vulnerability scanners.

A backend must be resilient.
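
In nginx terms that rule boils down to something like this sketch (paths and upstream are assumptions): the API lives under a known prefix, and unknown URLs get a cheap 404 from nginx instead of waking the backend:

    location /api/ {
        proxy_pass http://app_backend;  # hypothetical upstream
    }

    location / {
        root /var/www/site;  # assumed static root
        # Serve the file if it exists; otherwise return 404 from nginx
        # itself rather than falling back to a dynamic script.
        try_files $uri $uri/ =404;
    }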

nwmcsween 7 hours ago

Just use `open_file_cache`?
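
For anyone unfamiliar, that caches the result of open()/stat() lookups inside nginx so repeated metadata checks stop hitting the filesystem. The numbers here are illustrative, not tuned recommendations:

    # Cache file descriptors / stat results for hot files
    open_file_cache          max=10000 inactive=60s;
    open_file_cache_valid    120s;  # revalidate cached entries every 2 minutes
    open_file_cache_min_uses 2;     # only cache files requested at least twice
    open_file_cache_errors   on;    # also cache "file not found" results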

EGreg a day ago

Speaking of NGINX directives that can make a big difference when serving files, here is how we use them to enforce access control:

https://community.qbix.com/t/restricting-access-to-resources...
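
As general background, one common nginx pattern for app-enforced access control (a sketch only; the paths are made up and this is not a summary of the linked post) is an internal location plus X-Accel-Redirect, where the app authorizes the request and nginx does the actual file serving:

    location /protected/ {
        # Not reachable from the outside; only via X-Accel-Redirect.
        internal;
        alias /var/www/private-files/;  # assumed storage path
    }

    location /download/ {
        # The app checks permissions and, if allowed, responds with
        # "X-Accel-Redirect: /protected/<file>" so nginx serves the file.
        proxy_pass http://app_backend;  # hypothetical upstream
    }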

huang_chung a day ago

Fortunately Caddy is not evil: https://caddyserver.com/

  • DoctorOW 20 hours ago

    Caddy is great but it doesn't solve the problems of this post. Namely, it is slightly less performant in exchange for ease of use, a tradeoff the author of the original article doesn't seem interested in.

  • mholt a day ago

    Caddy also has try_files. But it's implemented differently.
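
    A minimal Caddyfile sketch of the equivalent (site name and root are placeholders):

        example.com {
            root * /var/www/app

            # Try the exact path, then the directory, then the SPA entry point.
            try_files {path} {path}/ /index.html

            file_server
        }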