A few months back we launched a site redesign/redevelopment project for a client, and made a simple mistake that had some interesting ramifications. It’s worth posting here so others don’t make the same mistake.
What Happened
When redeveloping the site, we moved the WordPress instance from the web root:
/public_html/wp-config.php
/public_html/wp-includes/
/public_html/…
to a subdirectory within the web root:
/public_html/wordpress/wp-config.php
/public_html/wordpress/wp-includes/
/public_html/wordpress/…
I consider this to be a best practice for WordPress-powered sites, as it makes version upgrades a bit easier (among other things).[1] Some plugins don’t work well with symlinked “plugins” directories, but those generally have easy fixes (use ABSPATH, not dirname(), in your plugin).
These changes were made and thoroughly tested on our dev server with historical data. We then made the changes in a staging environment under a beta.example.com style domain name before pushing the changes live. In short, we were pretty careful and tested pretty well.
The Result
When we finally pushed everything live, the production server was brought to its knees. Yikes!
A little poking around and a few bug reports later, we figured out the problem. We had failed to set up a rewrite rule to map content from:
/wp-content/uploads/
to:
/wordpress/wp-content/uploads/
where it was now located.
An omission like this wouldn’t have had much effect on server load in some site configurations, but one of the wonderful features of WordPress – its lovely permalinks – has a hidden caveat that can be exposed if someone makes a configuration mistake (like we did).
WordPress’s permalink system basically works like this:
- See if a file exists in the location that was requested; if the file exists, serve it. This is how images, media, non-WordPress files, etc. are served without conflicting with WordPress.
- If no file is found at the location that was requested, then pass the URL to WordPress and see if WordPress can figure out what to show.
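On Apache, these two steps are typically carried out by the rewrite block that WordPress adds to .htaccess. The following is a sketch of that block, assuming WordPress's index.php is reachable at the web root (/index.php); the exact block on any given install may differ.

```apache
# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
# Step 1: if the request maps to a real file or directory,
# skip the rule below so Apache serves it directly.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# Step 2: everything else is handed to WordPress.
RewriteRule . /index.php [L]
</IfModule>
# END WordPress
```

Note that nothing in this block can tell a pretty permalink apart from a missing image: any URL that does not match a file on disk ends up in index.php.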
It’s a really elegant system and works very well. However, it also means that 404 requests – HTTP requests that result in a “file not found” message – have a much higher impact on the server than a traditional 404 request does. For every 404, the server instantiates WordPress, does some database work to try to see if it can figure out what to serve, etc.
When we didn’t set up proper redirects for the content in “uploads”, we basically increased the server load by a factor of 20. Instead of a single request going to WordPress and 20 requests for static content being served directly by Apache, all 21 requests were being sent to WordPress.
The increased load on the server had some… adverse effects on performance. Yeah.
Why We Missed It
So this should be a pretty easy thing to catch, right? If the URLs to the images are broken, we’d all be seeing a bunch of broken images all over the place in our testing – right?
Not exactly, and for two different reasons:
- When we tested on the production server using the beta.example.com hostname, the content was still pointing to example.com and the live production server was dutifully serving the images. It wasn’t until we pushed the changes live that the images were no longer at the previous URLs.
- Even when the images were no longer in the proper place, the browsers we were testing in were showing the images properly. This was due to the longer Expires headers we had just implemented for media on the server in order to reduce overall server load. In casual testing, everything looked to us to be working properly.
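For context, long-lived Expires headers of this kind are commonly set with mod_expires. This is only a sketch: the MIME types and the one-month lifetime are assumptions, not the values used on the server in question.

```apache
<IfModule mod_expires.c>
ExpiresActive On
# Hypothetical lifetimes; the original server's settings are not stated.
ExpiresByType image/jpeg "access plus 1 month"
ExpiresByType image/png "access plus 1 month"
ExpiresByType image/gif "access plus 1 month"
</IfModule>
```

With headers like these, a browser that has already fetched an image can keep showing its cached copy without contacting the server, which is why a broken URL can go unnoticed.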
Once we tested in browsers that didn’t have the images cached, we immediately saw the problem.
Easy Fix
Luckily, this is a really easy problem to fix. A simple mod_rewrite rule (placed before the standard WordPress rewrite block) fixed everything right up:
RewriteRule ^wp-content/uploads/(.*)$ /wordpress/wp-content/uploads/$1 [L,R=301]
(alternate code, improvements to syntax welcome)
With this in place, the old URLs were redirected to the new location and were no longer spinning up WordPress on every image request.
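To show the placement, here is a sketch of how the rule might sit in the web root's .htaccess. It assumes the common WordPress rewrite block with index.php at the web root; adjust it to whatever your install actually contains.

```apache
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /

# Old upload URLs: permanently redirect the client to the new location.
# This must come first; otherwise the catch-all rule below matches the
# missing file and hands the request to WordPress.
RewriteRule ^wp-content/uploads/(.*)$ /wordpress/wp-content/uploads/$1 [L,R=301]

# Assumed standard WordPress block: anything that is not a real file
# or directory goes to index.php.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
```

After the redirect, the client requests the file under /wordpress/wp-content/uploads/, where it exists on disk, so Apache serves it without involving WordPress.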
Hopefully this will be useful to people who might be doing something similar, or who have made the same mistake and are looking to recover from it. 🙂
[1] Corey has a nice writeup on the file structure he uses.
Interesting observation about the 404 and load relationship.
On a side note, wouldn’t it have been easier if you just modified the virtual host configuration of the webserver to make public_html/wordpress the new document root, rather than publishing the blog in a subdirectory of the main domain and having to remap everything?
If the subdirectory installation is just to ease the upgrade (I just use svn) maybe it makes sense, oh well.
Nice catch, and a very hard one to detect, too.
I habitually skim over stray (outdated) requests in the site stats and access logs, whenever I move a site or change a site’s structure – but as WP camouflages all 404s in this case you wouldn’t even have a chance by eagle-eyeing into Apache’s error_log.
Would you suggest any plugin to record 404ish requests inside of WP?
@Robert: I use the “Redirection” plugin, which can log 404s, as well as manage any redirections you might need.
Alex, I’m bookmarking this explanation. I did *not* realize that URLs were being passed to WordPress to resolve in such a manner. I figured it was one pass, then straight to 404. (and while being far from a programmer, I’m far from a newbie too.)
This is significant, because this may explain the performance disparity that some of the more competitive code-jockeys report when comparing WordPress to Movable Type. “My server goes haywire, WordPress is teh suck, PHP is a crappy language”, blah-blah-yada-yada.
It might simply be a case of a folder configuration that is causing some DBs to get hammered, while others glide right through. I never would have looked at such a thing from an optimization standpoint.
Thanks!
Alex, this is GREAT; believe it or not this is the biggest load problem I have on my server and I really have no idea how to solve it 🙂
Redirection sounds nice. I wrote a plugin called 404 Notifier that does 404 logging and alerting as well.
Typically when I move a site, I do a find + replace on the SQL database (new url for old url), the main reason for which being attachment urls and intra-site links. I had some experiences like yours early on, and that seemed like the more sustainable solution.
That’s a good way to do it as well, but has some issues too. For example, your testing on beta.example.com would result in broken images everywhere unless you duplicated the uploads directory into the “new” location.
I’ve run into that issue before, but I handled it brute force rather than having .htaccess help me out. Nice.
Alex, Adam raises a good point. Now that the site is no longer just a test, it may be a wise idea to run a quick DB query to update all the site.com/wp-content/ URLs to reflect the site.com/wordpress/wp-content/ URLs throughout the content rather than having to rely on .htaccess — it seems like the proper thing to do.
In any event, nice reminder. 🙂
Changing the URLs is definitely a good idea, but the rewrite rule is still necessary because the old image URLs still exist on the web in links, search results and archives, and need to be redirected accordingly.
Check out the new wp-content and config file path features in 2.6 that allow you to make it a clean checkout! It’s the new black.
This fix would be very useful for our prcboardexamresults.com website. If it reaches only 10K UVs or more, a dramatic and unexpected increase of server load suddenly occurs and this causes our website to crash. I should check and recheck the points that you’ve given up there 😉
Thanks for this very helpful article, Alex! 🙂
@Matt: sorry, but where are these file paths in the config file in 2.6? I can’t find them and I downloaded twice from wordpress.org…
this fix helps a lot for our site. Thank you so much!
Alex: would a standard Apache “Redirect” be better for load? That way no regex/mod_rewrite.
SL: if WordPress can’t find wp-config.php, then it looks up one directory. Set the path to wp-content via the WP_CONTENT_DIR constant, like so:
define( 'WP_CONTENT_DIR', ABSPATH . '../wp-content' );
define( 'WP_CONTENT_URL', 'http://yoursite.com/wp-content');
I don’t see how to avoid the regex unless you had a Redirect line for every file in the wp-content/uploads directory (which I think would be quite a bit slower with thousands of files).
This is one reason why you use something like APC or eAccelerator: so your server won’t go down while you work out problems like this. 🙂
The load on the server from this is primarily in SQL queries. APC or eAccelerator might help, but I don’t think they are a magic bullet here.
[…] was a plugin called redirection (I wanted to try cleaning up 404 errors slowing down the server.) that started erroring out after an update. since it gave no warning it took a bit before I […]
[…] all pages generated by WordPress are dynamic, including these 404 error pages. After reading this article, it is a good idea to add to your checklist of potential problems the errors […]
Alex
I took your advice and moved my entire site into a subdirectory called ‘blog’, but it’s broken various bits of my install.
My theme editor no longer works, it just says ‘File not found, please try again. Merci!’ and on the login page, there’s a whacking great pink box that just says object in it.
You’re the only person I’ve seen do this, so I figured I’d ask 😛
Matt, this is great, but unfortunately many existing plugins don’t work properly with the new settings.
hello Alex,
I had to do it the other way round, I wanted to move my blog from a subdirectory called wordpress to the root.
I copied everything, then with phpMyAdmin replaced all occurrences of h**p://mydomain.com/wordpress with h**p://mydomain.com, and did the same for the absolute paths, replacing /var/www/web6/web/wordpress with /var/www/web6/web. Now ALMOST everything works, except that I have problems displaying some posts from certain categories, AND whenever any page of this blog is accessed, even if I put it into maintenance mode and just browse through the backend, the CPU load spikes to 80% 🙁
I don’t think a redirect would do any good as this happens even when I surf the backend.
Any hints/clues, please?
I ran into the same problem just now, and with .htaccess it happens the same way you described.
Yes! Thank you, man. The .htaccess rule works; your tutorial resolved my problem.
[…] about 404 errors and how WordPress handles them, we should read the article by AlexKing (author of many well-known plugins, including Popularity, Twitter Tools, Mobile […]
Thanks for this post – it opened my eyes as to WHY my server load goes to 160 (yes, 160) and the server crashes. Sadly, a simple redirect won’t work because I converted a huge website to WordPress and it is too complex. Does anyone know how to stop this WordPress behaviour when permalinks are activated, so that people are sent to a 404 directly? I think this is something many people may have.