We have done a lot of work lately on the internals of our hosting platform. Sometimes I worry that our customers think we are standing still when they don’t see many outward, customer-facing changes. However, we have been feverishly working on improving the overall stability and speed of our platform.
We have nailed down CPU overages, memory overages, and disk I/O bottlenecks (I/O was the hardest one, btw). We have delved deep into the dark art of Linux process scheduling and have come away mostly unscathed.
All of this backend work has made a tremendous difference to our customers, but now we have something new to announce: MySQL process scheduling protection. I know what you are thinking… That doesn’t sound exciting at all, but to us computer nerds it’s a big deal!
Basically, a single rogue PHP/Perl/Ruby script utilizing MySQL can consume all the resources on a server and never even show up as using much CPU at all. I have written test scripts that consume less than 1% of the total “CPU time” yet leave a machine with 16 cores at 0% idle time on every core, with thousands of backed-up processes. I can replicate this at virtually every hosting company that I have test accounts with (and I have a LOT of test accounts on competitors’ servers). This isn’t a rare thing. It’s VERY common, and many popular plugins for WordPress and phpBB trigger it frequently.
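For the curious, here is a rough way to watch the "0% idle" side of this on a Linux box. It is only a sketch and it is not the fix discussed in this post: it compares two snapshots of the kernel's `/proc/stat` counters and reports how idle each core was in between. The function names `parse_proc_stat` and `idle_percent` are made up for illustration, and it assumes the standard `/proc/stat` layout where the fourth number on each `cpu` line is idle time.

```python
import time


def parse_proc_stat(text):
    """Return {cpu_name: (idle_ticks, total_ticks)} from /proc/stat text."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields or not fields[0].startswith("cpu"):
            continue
        # user, nice, system, idle, iowait, irq, softirq, steal.
        # Guest time is already counted inside user, so stop at 8 fields.
        ticks = [int(value) for value in fields[1:9]]
        counters[fields[0]] = (ticks[3], sum(ticks))
    return counters


def idle_percent(before, after):
    """Percent of time each CPU spent idle between two /proc/stat snapshots."""
    first = parse_proc_stat(before)
    second = parse_proc_stat(after)
    result = {}
    for name, (idle_now, total_now) in second.items():
        if name not in first:
            continue
        idle_then, total_then = first[name]
        elapsed = total_now - total_then
        # No elapsed ticks means the snapshots are identical; report 0.0.
        result[name] = 100.0 * (idle_now - idle_then) / elapsed if elapsed > 0 else 0.0
    return result


if __name__ == "__main__":
    # Assumes a Linux host where /proc/stat is readable.
    with open("/proc/stat") as handle:
        snapshot_a = handle.read()
    time.sleep(1)
    with open("/proc/stat") as handle:
        snapshot_b = handle.read()
    for cpu, idle in sorted(idle_percent(snapshot_a, snapshot_b).items()):
        print(f"{cpu}: {idle:.1f}% idle")
```

If every core reports close to 0% idle while no single process shows much CPU time in `top`, you may be looking at the kind of situation described above.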
Surely something like this wouldn’t be allowed to happen in a mature multiuser environment such as Linux! When I first discovered this bug I didn’t think it was widespread. After running MANY tests and spending a lot of nearly sleepless nights proving my theories, I discovered that it was happening hundreds of times per hour, for short periods of time, across all our servers. This causes short delays (usually 5-60 seconds) that are very, very difficult to track down after the fact.
The issue really is with MySQL itself, but it is also the result of a serious design flaw in the current Linux process scheduler. I am intentionally being vague on the specifics of the problem because I feel the fix we have developed will give us a substantial competitive advantage. Without question we will release this code to the community at some point in the future, but for now we will continue to use it in-house to the benefit of our loyal customer base.
This speed and stability fix will directly affect about 400 customers out of 547,000 (at the time of this blog post), and it will affect them in a negative way: coding problems will have to be fixed on the customer’s side before their sites become usable. Indirectly, everyone else will benefit greatly, because for many of our servers it literally makes the difference between being 0% idle (totally overworked) and 50% idle.
It is not yet live in our production environment (except in a few controlled cases), but it will be completely live on our system by approximately Wednesday of next week (March 31st, 2010).
Don’t expect a huge change, but certainly expect and demand stability from us as your hosting provider. Just remember that when you don’t have slowdowns and problems it’s NOT by chance. We kill ourselves trying to make Bluehost/Hostmonster the best shared hosting on the planet. We think we have succeeded, but if you don’t think so then the job we are doing isn’t good enough. Either way we will continue to do our best to solve the biggest issues as we see them. Thanks again for your business and your encouragement.
Matt Heaton / President Bluehost.com