How HN crushed David Walsh's blog (davidwalsh.name)
91 points by headalgorithm on Oct 28, 2020 | hide | past | favorite | 99 comments


I have a bog-standard WordPress, on a shared host, and it doesn't suffer from the HN "hug of death". I don't use CloudFlare or any other CDN.

The secret? I have a lightweight theme with minimal dynamic content, and I use LiteSpeed cache on the server. That's it. Easily handled 20-40 thousand pageviews over a couple of hours.


I also have happened upon an innovative solution to avoid the HN hug of death: Nobody reads my blog.


I endorse this solution as well and proactively work towards it by only publishing boring and obscure content. It has many benefits:

* blazingly fast for readers with no chance of slowdown

* very low hosting costs

* linkfarmers never steal my content

* I don't have to moderate comments

I recommend this approach to everyone.



I got a perfect score. If we're being honest, it's more because I know my hair care products than my Dr. Who planets.


You really should use an anonymous sockpuppet account if you are going to admit embarrassing things like that in a public forum.


You've failed, unfortunately. Your "boring" content is actually quite interesting.

Oh well, everything else is true: Blazingly fast, low hosting costs, no comment moderation, etc.

>_<


Hey, you take that back.


Personally, I plan to try making the apple dumplings.


Hey, I am about to be a Boston programmer living in New Zealand! Any tips?


Office culture in NZ is typically less formal and less top-down than in the US, but that varies widely.

Don't feel bad about taking your allotted paid time off days and sick days (if you need to) - it is expected that you will disappear for a week or two periodically.

Marmite is supposed to be spread very thinly on buttered toast. Vogels is a good brand of bread for this.

Despite what you might hear on the street, Pavlova is not good.

Want to know the rest? Hey, buy the rights.


This is the correct solution!

And yet that means so many fascinating blogs go unread.

A while back, I scraped the top-level comments from a "dear HN, what's your personal blog" post, and found SO MANY AMAZING BLOGS!

The tool's here: https://random-hn-blog.herokuapp.com

I'd much rather read a small obscure blog with very little traffic (well, any small obscure blog but mine) than something that regularly gets deluges of traffic.

Something about knowing they're writing to a potentially huge audience changes the writing, I think...


I wish to subscribe to your newsletter.


Send your email to /dev/null


oops... accidentally sent to /nev/dull!


My method of not having a blog is even better.


Totally blows my mind as well.

This post[0] was #1 on the front page for a day and I had 0 issues serving requests running on the cheapest shared Wordpress hosting with the LiteSpeed Cache plugin enabled.

[0] https://andreschweighofer.com/agile/anxiety-in-product-devel...


I've been on the front page of HN a few times, and my site has never gone down due to the hug of death. My setup is a simple low spec VM running under Microsoft Hyper-V on an aging twin Xeon tower server. My site is Apache+PHP on Debian with no SQL backend; the PHP uses templates and flat files. Rock solid, and can really take a hammering.

I'm starting to think that it's time for people to stop using WordPress+MySQL and move over to something more performant.


I also run a project with Apache+PHP on Debian, with SQLite and a standalone C program that it interacts with running on the same machine. The whole setup runs on a single, low-spec VPS, and it too easily withstood the HN front page a few months ago [0], with plenty of capacity to spare.

I've found it amazing how much you can do these days with even a cheap VPS, and how many requests you can serve as long as you're disciplined about not going overboard with excessive dynamic content and knowing where your scalability pain points are (sometimes this isn't obvious until you really get a lot of traffic!).

[0] https://news.ycombinator.com/item?id=23661326


I was thinking that 100 requests per minute doesn't sound like a hug of death. The article does touch on reducing the amount of dynamic content, which reduces the number of database queries per page view.

What makes MySQL so bad at handling queries? I have never worked with it personally, but it seems like handling many concurrent requests should be a core feature of a database.


MySQL (and PostgreSQL, for that matter) isn't bad at handling queries, when the tables are tuned for those queries. People don't do this well (if at all), and few databases are capable of automatically tuning tables (creating indices), since they require resources, and can have tradeoffs between read and write performance.

Properly tuned, a database is able to handle millions of requests per second.


> What makes MySQL so bad at handling queries? I have never worked with it personally, but it seems like a core feature of a database should be handling many concurrent requests

The software that is used to generate the queries. Inadequate indices, writes on each page load that lock shared tables...
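The "inadequate indices" point is easy to demonstrate with a toy example — here using SQLite as a stand-in for MySQL (the `posts` table and `slug` column are made up for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, slug TEXT, body TEXT)")
con.executemany("INSERT INTO posts (slug, body) VALUES (?, ?)",
                [(f"post-{i}", "x" * 100) for i in range(1000)])

# Without an index, a lookup by slug has to scan all 1000 rows
before = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM posts WHERE slug = 'post-42'").fetchone()[-1]

con.execute("CREATE INDEX idx_posts_slug ON posts (slug)")

# With the index, the same lookup becomes a B-tree search
after = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM posts WHERE slug = 'post-42'").fetchone()[-1]

print(before)  # full table scan, e.g. "SCAN posts"
print(after)   # e.g. "SEARCH posts USING INDEX idx_posts_slug (slug=?)"
```

The exact plan wording varies by SQLite version, but the scan-vs-index distinction is the same one that bites under-tuned MySQL installs at scale.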


As someone who has worked for WordPress hosting companies for over 10 years now, I feel obligated to add that many hosting companies do not optimize MySQL settings either, despite their marketing saying otherwise.

In fact, the database layer is often forgotten when you talk optimization in the WordPress world — page cache is usually seen as the holy grail, with MySQL left to fend for itself.


In defense of anyone managing WordPress, its database schema is not well-normalized or organized (last I checked, it still had something like 6 tables).

That scenario (in general, not referring specifically to WP anymore) makes database indexing/optimization much more expensive at webhost scale, because you potentially have columns that range in size from empty to 1MB.

That's why most WP users have gone deep down the rabbithole of view-level caching, because optimizing an uncached result is so much harder in that environment.


Oh, I agree, optimizing MySQL for WordPress is a constant battle, with plugins and themes not handling the uninstallation process at all and leaving huge amounts of data inside the wp_options table with autoload = on, for example, or the expensive queries that WordPress itself makes — Elasticsearch and object caching are a big help on that one in particular.

But if you call yourself a managed WordPress hosting company and your marketing material says your entire stack is optimized for WordPress, you should be held to higher standards than a shared hosting company like HostGator, for example.


If you render everything out and save it in cache, it mostly doesn't matter how terribly your database is performing until you start getting lots of comments.


You will still get a fair amount of cache misses, in which case, everything else in the stack being optimized will save your ass.

I am not saying that implementing page cache is bad — it is essential for WordPress — just that page cache is not the only thing that matters. It seems to be the go-to solution for performance issues with WordPress, when in reality you should look at the picture as a whole.


Automattic (the company behind WordPress.com) offers a free plugin which, when correctly configured (it requires some cooperation from the web server), serves completely static pages for all posts. No PHP code is executed.

https://wordpress.org/plugins/wp-super-cache/


> The median page load was acceptable at 4-6 seconds.

That's not an acceptable server-side response time at all, regardless of how dynamic, or not, a blog post page or index page ought to be.

Even now, I'm seeing 500-600ms+ server-side response time from Europe, and 800+ms in the US.

When did this become "good enough", nevermind "normal"?


4-6 second total page load time. This includes the document, static assets, and JavaScript parsing.

Based on many data points from our monitoring of website performance, this is a very common range.


Just because it is common does not mean it is acceptable. Do you have any idea how ludicrously fast computers are nowadays? Multiple cores, each running at several billion cycles per second and several instructions per cycle. If each cycle were 1 processor-subjective second long, your average desktop processor would experience upwards of 248 years per real-time second.

Frankly, our entire industry should be ashamed that anyone thinks 4-6 seconds is an acceptable amount of time to render a fucking website.


GP is talking about server response time for the base HTML request, saying they see 800ms+. Here are the resource timings I'm seeing for the base HTML (pulled from the HAR of our commercial web monitoring product. Running from Virginia on a 20/5 Mbps connection, latest chrome, desktop user agent and 1080p viewport):

"dns": 158.37, "connect": 328.503, "send": 0.148, "wait": 856.465, "receive": 91.914, "ssl": 175.243

"Wait" is the time between when the last byte of request is sent, and the first byte of the response comes back. This is measured all after the DNS+TCP+TLS stuff has happened, and is basically measuring the latency to the server, the backend processing time, and the latency coming back.

800ms+ is... not good because this site is supposed to be behind a CDN (lower latency) and with a supposedly optimized backend. I also am still seeing the "cf-cache-status: DYNAMIC" response header, so whatever optimization was made didn't stick.

(Also, That connect time is oddly high. 300+ms of which 175ms is the TLS handshake. Something to look at as well.)

FWIW, I'm in the web performance industry as well, and Todd is correct: 4-6 seconds for window.onload is common for sites (not just web apps, but sites). Of course, modern site development practices (lazy loading images, deferring/asyncing scripts, font fallbacks, etc.) have made the "onload" time essentially useless as a metric.

(PS: Todd I'm a big fan of your videos on building Request Metrics)


:D Thanks Billy!!


It's atrociously slow. It makes no sense at all for a blog post.


100% this.

4 to 6 seconds for effectively static content is absolutely insane, and only justifiable for exceptionally rarely accessed dynamic data queries.

It shouldn't matter how "common" this response time is, when you can load a huge static HTML/CSS/PNG site in hundreds of milliseconds.


My internet connection can stream up to 1 Gbps.

Unless the website is literally loading an entire page of high-res images/video content, 4-6 seconds is atrocious.


I think the main takeaway, caching, is important. But what's frustrating with Wordpress is that there are many plugins to do caching, and each caching plugin has a million options in it. How they handle images, dynamic content, cache headers, ETags, etc, are often buried deep in submenus.

On top of that, testing caching is challenging - replicating between local, staging, and prod is ultimately a very manual and error prone process, so there's no real way to figure out how to test and if what you're doing is the right thing. Since caching is not an immediate thing (it can take time for a CDN to pick up an asset, for example), it can be unclear if what you've done works, or if you need to wait five minutes and try again.

I wrote a blogpost a few months ago about this and other issues. (https://solitaired.com/why-we-switched-from-wordpress-to-nod...)

Maybe I'm doing it wrong? ¯\_(ツ)_/¯


A Full Page Cache is the way to go, you're going to run into far fewer edge cases and cache bugs. For something simple like a blog, you shouldn't be hitting PHP at all once the cache is warmed, let alone caching queries and snippets and whatnot, bugs waiting to happen.

The strategy is the same even for complex sites, full page cache at the server daemon for almost every page hit unless a user is doing an action that isn't "View this page".

Obviously I'm glossing over cache busting strategies and how you handle dynamic actions like add to carts, sessions etc, but they're all far more simple than rolling your own application level caching strategy.

You will get so much more out of your hosting if using an FPC, which gives you way more headroom for the requests that do need to spin up the application.


When I self-hosted Wordpress sites, we put Varnish on the same server and ran traffic through there. Varnish has a reporting feature that lets you track the hit ratio in near real-time, so we could change a cache setting and see its effect on the ratio pretty quickly. We could purge the cache in whole or in part with the CLI.

We picked whatever the most popular cache plugin was at the time and turned on the default settings, then went from there. Honestly, the difference between any caching strategy and no caching strategy can be huge, especially when you get a surge of traffic.

We put staging behind the cache too, but not local or dev. Staging was just to test that a caching configuration change would not take the site down. A bunch of the tuning was actually done on production and back-ported to staging.

I think a hidden factor here is what people know. If a person/team knows node.js really well and doesn't know Varnish, then of course it's going to seem easier to move to node. That seems pretty common to me; it's more like web development, whereas caching/Varnish seems more on the sysadmin side of things. My team had to learn it, which took a little while.

But in my experience, well-tuned caching in front of WP can scale really well for content that changes rarely like blog posts or articles.

Today we pay WP Engine to take care of it all for us, and it's money well spent.


Varnish cache is great. Heck, any kind of caching is great. Even a stupidly simple 5 second cache will be sufficient for handling most levels of traffic.


Ah, Varnish - I have heard so many great things. Not sure why I didn't use it this time around. I think I assumed Cloudfront would take care of it.

My biggest learning was that CDN != caching. While it can be used as such, it's not automatically set up that way out of the box.


You should use cloudfront, cloudflare or some similar service. They take care of caching your assets as well and I think are more trustworthy than some random caching plugin on wordpress.

Edit: I see you mentioning cloudfront on your blog post, what problems did you encounter when using it with wordpress?

Also, any reason for not using sanic, strapi or any of the headless CMS and building from scratch?


For Cloudfront, I eventually got it working. The difficult part was figuring out how to get Wordpress to serve the site when it received the request from Cloudfront. By default, it turned into a weird redirect loop.

I eventually solved it by futzing with all the options.

Re why I didn't use Strapi/the others? Once I switched to node / markdown, I told myself I'd fit in a real CMS, but never got that far researching it. However, I like the options you mentioned. Thanks for sharing!


I have not used CloudFront, but Cloudflare significantly increased my TTFB.


100 requests per minute doesn't sound like a lot of traffic. I suppose a blog like theirs should know not to use several plugins that kill a site at 100 req/min, and not to send no-cache headers when behind a reverse proxy.


That's true. With even poor response times of 500 ms, you could do that without any overlap...


Seeing a "My blog crashed under traffic" article in 2020 always makes me wonder what people are thinking using WordPress for a blog. Use a static site generator (I like Lektor but have also heard good things about Zola), deploy wherever (Netlify is great, I'm partial to Neocities), done.

I even made an open source site you can just fork and use in a few seconds:

https://quicksite.stavros.io


Ecosystem (Plugins and Themes).


Step 1 is "reduce dynamic content" but the bottom line is a plug for tens of KBs of JavaScript (per request) to track your users :)

I wonder if this is intentional trolling


It means server-side dynamic content. That JS won't consume any noticeable CPU or RAM resources; it'll just be a static file pretty much always in memory — no templating engine in the web server process, no database hits, etc. All the significant processing load is either client-side or on another server (unless he is hosting his own analytics service); in fact, the scripts are probably served by the analytics service's servers too, so even that minuscule impact is felt elsewhere.


The JS is not dynamic content as far as the WP server is concerned -- it's just a static JS file(s) that's served. The dynamic behavior it has happens (and impacts) the client (and whatever third party service does the tracking, e.g. GA).


3.3kb async loaded JavaScript actually. But the irony is not lost.


Being a native-code game engine developer is useful perspective - having to spit out millions of triangles, pixels, and audio buffers within a 16ms frame budget teaches you just how powerful modern computers really are. You think (and work) in microseconds, where if something takes a millisecond you stop and investigate what the hell is wrong (e.g. an XInput bug in my game Weapon Hacker blocked for 500us every frame).

From that perspective, not being able to generate a blob of basically static HTML text in 600ms (100 requests per minute according to the article) seems insane.


As someone from the other side (I develop websites/web apps and play games sometimes) it's insane how most games simply have horrible UIs and UX, with absolutely zero work done towards making it more accessible and easy to use, while having 100s of employees working on the game itself.

From that perspective, not being able to make a decent UI that more people can use without getting frustrated seems insane.

We all have our focuses in the areas where we can get the most impact. Being able to serve a website in 100ms instead of 200ms simply is not as important as the 100s of other things us web developers have to think about.

Although I do agree with you, the web is bloated right now.


Yeah - performance definitely isn't everything. E.g. a lot of games bury frequent actions under multiple levels of menus, when a little extra coding could provide a shortcut and greatly improve usability.


The web is network-bound; unmanaged game code is bound by memory-to-GPU bandwidth. Think play-as-you-load, with everything read from the network as billions of game designers build levels in real time.


For a web server, it just has to take a URL and some headers arriving from the network card, and reply with some HTML bytes. All the networking of billions of nodes is handled elsewhere.


Ready for another go, eh?


/.


I was very proud back in the day when one of our shared hosting clients got Slashdotted and, other than warming the room a bit extra, our infrastructure never wavered. The secret at the time — in my opinion — was that we had the database running on a second Dell PowerEdge server in the quarter rack instead of all on the same machine. These were PIII 677 MHz days, before multicore.


That's awesome work. I miss the days in the 90s when we were still figuring out the more basic things, i.e. everything — and they were often our own learnings rather than refinements on what someone else had already done and put in the "discovered domain", so to speak.


I am intrigued by this trend toward static content. In 2010 it was LAMP stacks everywhere, with WordPress being a major product of that time. Now the cool kids are talking about the Jamstack [1], which feels like what AJAX was back then. We knew about it; we just didn't think it was cool because CDNs weren't really "a big thing" yet.

[1] https://jamstack.org/


Reading the Jamstack site and this stuck out to me:

"Higher Security: With server-side processes abstracted into microservice APIs, surface areas for attacks are reduced."

I'm sorry, but how do microservices make something more secure? You've only outsourced your security problems in the hope that a third party will resolve them.


Here's an example: you can use tools like wp2static to render your Wordpress site entirely as static pages. This cuts Wordpress out of your deployed software entirely, which can eliminate a vector for a lot of vulnerabilities.

JAMstack isn't a move to microservices. In some places, it's about eliminating services entirely, or severely reducing their scope.


What stood out to me is that even on their "What is Jamstack" page, they never explain what Jamstack actually is.

"Jamstack is an architecture designed to make the web faster, more secure, and easier to scale. It builds on many of the tools and workflows which developers love, and which bring maximum productivity." W.T.F.

And yeah. Microservices ~= webserver + backend


Since the post mentions already relying on Cloudflare for help with traffic, I enabled Cloudflare's new Automatic Platform Optimization [0] on a client's website as a test last week. Thought it would be another overhyped WP caching solution but it truly feels magical. I believe it's powered by CF Workers and stores the pure HTML in KV on the backend, but all of that is handled for you and automagically updated/purged by the plugin on any site change.

Highly recommend trying it. I'm seeing the vast majority of visits now are not hitting the origin server at all — for assets or the page itself. At least it's a good stopgap until we can convince everyone to move 100% static...one can dream, right?

[0] https://blog.cloudflare.com/automatic-platform-optimizations...


There is always a trade-off between static and dynamic content. Serving static articles out of a database is always going to have vastly lower performance than generating the page at the time of authorship and setting long cache expiry times.

The problem with long cache times is that then some readers might see an out-of-date version of your page. I would argue that that is a small inconvenience compared to no readers being able to see your article at all.

I know there are a lot of WordPress plugins that effectively generate the pages ahead of time and just serve those. I think perhaps that should be the default way WordPress works, at least for single articles. There is little gain in generating the whole page on every hit and everything to lose.
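The generate-at-authorship idea amounts to very little code. A minimal sketch (the `publish` helper and page template are hypothetical, not how WordPress itself works):

```python
from pathlib import Path
from string import Template

# Rendered once, at publish time — not on every request
PAGE = Template("<html><body><h1>$title</h1>$body</body></html>")

def publish(slug: str, title: str, body: str, out_dir: Path = Path("public")) -> Path:
    """Render the article to a flat HTML file; the web server then just
    serves the file, with no database or templating on the request path."""
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{slug}.html"
    path.write_text(PAGE.substitute(title=title, body=body))
    return path

publish("hello-world", "Hello", "<p>Rendered once, served forever.</p>")
```

Everything after `publish` returns is the web server's bread and butter: shipping static bytes off disk (or out of the page cache).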


I like Varnish for this. You can set a small cache TTL (like 30 seconds) so it only lets one request through per TTL window. And with Varnish, even if you get 10k hits right when the cache goes stale, it can be configured to only process the first request and serve the stale cache to the rest.

Using a TTL isn’t great if you have a long tail of content, but pretty awesome if it’s a limited surface area that’s taking the load even if your dynamic backend is something heavy like Wordpress.


> The problem with long cache times is that then some readers might see an out-of-date version of your page. I would argue that that is a small inconvenience compared to no readers being able to see your article at all.

Aren't ETags exactly meant for that though?

https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/ET...


ETags are really aimed at a different problem, where an end-user/client wants to refresh content that they already have.

They can help let an intermediate cache know that content has been updated, but the cache has to be configured to ask for it.

Plus, for dynamic content, even the cost of calculating the ETag is not zero.


Why is it written in third person?


It's not written by David Walsh, but by Todd Gardner. This isn't immediately noticeable so I felt quite confused as well.


It also seems to be an ad for Request Metrics. At first I thought my blockers had let an ad through; then I noticed the beaver in the other screenshot (reading from mobile now, so I cannot view source). I've become so used to never seeing an ad that the animated ad image was a real surprise.


It's not immediately noticeable that it really is an ad for Request Metrics, but looking at their Twitter page makes it painfully obvious that it's a sponsored post:

https://twitter.com/requestmetrics/status/132144452901530010...

Edit: actually, seems the author of this post also works for/owns trackjs.com, who owns requestmetrics.

Sad to see David Walsh's blog go this way, with hidden sponsored posts. I really wish people could stick to ad-free principles better these days.


How exactly did this get on the front page? The webpage itself is a mess and the article contains nothing of any substance whatsoever.


Especially given the domain is davidwalsh.name, and the article isn't on a subdomain or subdirectory or anything, this is all the more confusing.


This is a better answer than mine. I had missed that detail too. David Walsh Blog is still a publication name separate from David Walsh though.


Ok, cool. I stopped reading because I felt confused.


By Todd Gardner on October 28, 2020


"By 7:50 AM, traffic hit the limit of the technology, around 100 page views per minute"

(Tone note: Technical discussion about the modern era of development that just happens to be prompted by this article, not a criticism of the targeted site. I've written that sort of website myself enough times!)

Yeowch. That's barely faster than a page view per second.

I have noticed in several sites (generally APIs rather than end-user sites but the same principles hold) I've built lately that as nice as databases can be, there's a lot of places where things are coded to do a query per page view for things that just have no reason to be doing a query per page view. Even a "no sql" database can slow you down a lot vs. an in-process memory structure. I took one site from being able to serve a few hundred per second to tens of thousands per second by simply taking the relevant DB tables and slurping them wholesale into memory. Whenever someone makes a change to the underlying tables, a "several times a week" operation, I simply slurp the entire database tables in again from scratch. Slurping in the entire DB takes ~.25 seconds for all of its tens of thousands of rows on a low-end RDS and a low-end EC2 instance. Precomputing the answers to "all the questions we saw last hour" (as this is a service queried hourly by a lot of machines) takes another half a second or so. During the second this is happening it's fine to serve stale results from the previous version of the memory contents.

My point is I see a lot of residual code and frameworks and habits from an era that come from an attitude of 5 megabytes being a lot of stuff, but it really isn't anymore. Obviously you can't do that to thousands of things without some issue, but almost every application has these little tables like a sidebar or the list of types of X or all kinds of other things where you're better off just slurping the table into RAM, and slurping it in again if there are any changes, rather than constantly hitting a network database over and over again, because even if it's a completely cached query it's still vastly more load on your systems than a hash table lookup. (There is a bit of trickiness around making sure you detect changes, but one nice thing about "just reload it all from scratch again" is it's feasible. "Update just the changes" always turns into a problem because of the way an error, once made, echoes forever, but "just reload it all from scratch" is a feasible level of complexity.)
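A toy version of that slurp-and-reload pattern (names are hypothetical; a real `load_all` would do the full-table SELECT against RDS rather than return a literal):

```python
import threading
import time

class TableCache:
    """Whole-table in-memory cache: reload everything from scratch on expiry.

    Readers always get the current snapshot via a plain attribute read, so the
    hot path is a dict/variable lookup, not a network round-trip. While one
    request pays for a reload, everyone else keeps serving the stale snapshot.
    """
    def __init__(self, load_all, ttl_seconds=3600):
        self._load_all = load_all              # caller-supplied full-table loader
        self._ttl = ttl_seconds
        self._snapshot = load_all()            # initial slurp
        self._loaded_at = time.monotonic()
        self._reload_lock = threading.Lock()   # serializes reloads only

    def get(self):
        if time.monotonic() - self._loaded_at > self._ttl:
            # Non-blocking: exactly one thread reloads, the rest fall through
            # and return the previous (stale but consistent) snapshot.
            if self._reload_lock.acquire(blocking=False):
                try:
                    self._snapshot = self._load_all()
                    self._loaded_at = time.monotonic()
                finally:
                    self._reload_lock.release()
        return self._snapshot

# Hypothetical usage: sidebar links that change "several times a week"
cache = TableCache(lambda: {"sidebar": ["About", "Archive"]}, ttl_seconds=60)
links = cache.get()  # a memory lookup, not a query per page view
```

Because the reload replaces the whole snapshot atomically, there's no incremental-update bookkeeping to get wrong — which is exactly the "reload it all from scratch is feasible" argument.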

I also blame the "shared-nothing" architecture for hanging on longer than it needed to. It is OK to use the architectural patterns without literally throwing everything away on every web request. I think what I describe above can still just be considered a glorified DB cache if you do it correctly, which is fine to "share". There's a ton of websites like this in the world where every page load makes dozens or hundreds of DB queries that don't change their results more than "several times a day" and as a result are very slow for no good reason.

(Many of these websites could also just run the queries every hour and serve the results with little to no loss in most cases. You want your "published stories" to update immediately, but you probably don't need adding a site to the sidebar to be reflected instantly, etc.)


That's barely faster than a page view per second.

Your server needs to be able to cope with the peaks, not the averages.

Thinking of "events per long time period" as "an average of events per small time period" is an easy shortcut to overloading your servers.


I never understood why blogs, which update rarely (say, at most once per day), are not always served as rendered, static HTML. An nginx instance on a tiny server can easily serve thousands of HTML pages per second, thus avoiding any hug of death from sites like Hacker News.


Be static my friend.


Or, if you are dynamic, profile your site to know how many queries are running and which queries actually NEED to run, and weed out anything that is not necessary for the post.

I have a blog written in my "micro" framework [1] that doesn't use any caching, and is hooked up to MongoDB, it has handled being #1 a few times without falling over, or even slowing down.

The secret? 1 query per page that pulls all the relevant information into the view model. Also has a hidden benefit that I can count my pageviews by just looking at the number of queries against mongo for the day.

1: https://github.com/jeremyaboyd/simplemvcjs


I publish my blog with make(1). Highly recommended.


Ditto. I’m surprised at how convoluted people make publishing a blog to be.


> David’s site uses WordPress. It serves most content from a MySQL database, which is a well-known performance limitation.

MySQL has some performance quirks, but I wouldn't say it's outright "a well-known performance limitation" — is this belief particular to WordPress circles?


I work for a WordPress agency. This is mostly due to poor optimization by the website owner/host. It's almost never worth it to have a fully dynamic page generated for every visit. A couple of short-lived caches (60 seconds), along with properly set headers, would likely have kept the site online.

This honestly seems like a rookie configuration mistake that was overlooked during some migration between webhosts. At this point, tuning WordPress is pretty well known. The reason MySQL gets blamed is due to WP's poor DB schema. Very few indexes and the data is not normalized.

On small sites with few comments/posts it's never a problem, but at scale you'll see issues start to pop up as the DB has to scan entire tables for each page load. This largely drove the rise of comment services like Disqus and FB comments a decade ago. It seems in recent years a lot of people have opted to just not have blog comments, instead driving discussion to dedicated forums or social media groups.


Thanks for the clear explanation. It's good to know the sources (and history) of the issues that we work-around.


I was curious what the maximum load would be for my own personal blog. Using ApacheBench, my static site on cheap commodity hardware can easily handle 6000 page views per minute. Seems like a pretty big performance advantage for the cost of just adding a step to render static HTML.


Caching is not only magic


Nginx + static content = high performance. Add Cloudflare and, of course, the proper settings, and you'll be hard pressed to outrun a cheap VPS.


I just set up a new WordPress blog about startup life on a $10 AWS Lightsail instance. It uses WP Fastest Cache with Cloudflare in front. https://thedevfounder.com

I wonder if it would survive an HN hug. I would think so, but I'll have to wait and hopefully see someday. I spent so much time setting up a static blog that I just went back to WP so I could focus on the content and not the setup.


in my case it’s static content + cloudfront. I have yet to see it not being able to handle the traffic (we’re talking thousands of requests per second at peak). This does not surprise me and i think the same thing could be accomplished with nginx on a single box.


The existing load times are unacceptable as is. Of course that'll fall apart under load.


tldr: The php process hit the max connections inside the database pool, which caused blocking within the php threads, which caused a thundering herd problem, and without correct tracing inside the wordpress php files the authors didn't know this and assumed that there wasn't enough caching on the front-end.

"at least 2 database-touching requests for every person reading the post" should never be a blocker for 2 requests per second unless you truly do not understand your application infrastructure at all.

on edit: The better solution here would be to figure out where the MySQL server was being put under load -- hint, it probably was 99% idle, because unless that sidebar file was making an unindexed query there's no way things would take 500ms -- and then realize that you need to bump up the number of database connections in your php pool to be MYSQL_MAX_CONNECTIONS so that php isn't blocked on obtaining a new connection. Problem solved.


tldr - the main key is that the blog was still sending "do not cache" headers, and cloudflare was respecting that, so no caching was actually happening.
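The distinction is mechanical enough to sketch. A rough check (ignoring many RFC 9111 subtleties) of whether a shared cache like Cloudflare is even allowed to store a response, given its Cache-Control header:

```python
def cdn_cacheable(cache_control: str) -> bool:
    """Rough heuristic: a shared cache must not store responses marked
    no-store, no-cache, or private. Real caches honor more rules than this."""
    directives = {part.strip().split("=")[0].lower()
                  for part in cache_control.split(",")}
    return not ({"no-store", "no-cache", "private"} & directives)

print(cdn_cacheable("no-cache, must-revalidate"))  # False — CDN passes through
print(cdn_cacheable("public, max-age=300"))        # True  — CDN may cache
```

Which is why the origin kept getting hit: with `no-cache` in the header, Cloudflare dutifully forwarded every request to WordPress.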


It continued with:

> This site was set up to be cached by Cloudflare at one point, but over time things changed. Somewhere along the line from a WordPress plugin or hosting upgrade, the cache-control headers were changed, and the caching was broken.

I find the assumption that it once worked a bit optimistic. Could be true, but people do stuff that doesn't actually work all the time. Easy for me to see someone setting up cloudflare without either verifying that it has the desired effects of reducing latency and surviving load or digging into details like request rate to the backend and cache-control headers.

I'm mildly curious why each request had 500ms latency at best. I know PHP isn't the fastest out there, and talking to MySQL on every request doesn't help, but still that's pretty slow. Also, no parallelism? That's a bit sad.

If the content isn't truly dynamic, I'd recommend to anyone just using a static website generator like hugo. A cheap VM can easily do thousands of queries per second without requiring cloudflare.


Latency is likely from the webhost side. I've seen expensive managed WP hosts that still respond slowly because they know the average customer won't blame them, but rather their own internet connection.

I work with a large ecommerce client that uses WP and our TTFB on an extremely dynamic page with user specific products and hitting external micro-services is still about 1 second. WordPress unfortunately makes it really easy to become very slow unless you diligently stay on top of things.



