Ok, I’ve done enough personal blog posts that it’s time to get technical again. Web hosting is a changing animal. Customer needs are becoming more and more demanding as businesses rely on the web to provide critical infrastructure for their success. To meet these needs, hosting companies require an EXACT method to track the resources needed and consumed by every customer, both in realtime and historically. Currently these tools are sorely lacking.
Most web hosting companies use Unix or Linux (in many varieties) for the servers that serve their customers. Linux is a rising star, and it is what we use for 100% of our hosting services. Linux is GREAT! It is fast, it is stable, and it is only getting better. However, until very recently Linux has not provided the kernel (the heart of the operating system) enhancements we need to track usage well enough to do two things: guarantee resources for sites that are starved for CPU, disk I/O, etc., and limit and block the CPU and I/O hogs that are causing problems for everyone else.
I talk with many hosting companies every week and try to assess what others are doing to meet these challenges. In many cases companies are just throwing more servers at a problem without dealing with the underlying cause. That was how it had to be, because Linux simply didn’t provide the necessary information. Let me be specific so you know what I am talking about. Below is a list of things we must have in order to provide guaranteed resources in a “shared” hosting environment. Some of these can be obtained but are difficult to get at. For others, the Linux kernel either doesn’t report them, doesn’t give access to them, or reports them so inaccurately that they are useless for performance tuning.
**DISK IO REQUIREMENTS**
- Byte counts per process and/or per user and group to an I/O device, in realtime and historically (Taskstats in the kernel does this – FINALLY!!).
- Byte counts to any physical device, both including and excluding cache hits.
- Disk accesses per process and per user, per physical I/O device, tracked in realtime and historically – this is EXTREMELY important in a shared hosting environment. When I talk with other hosting companies, I find this is almost never even looked at, and in my view it is by far the #1 performance bottleneck in hosting.
- Exact stats of journaled filesystem flushing mechanisms such as kjournald, pdflush, etc.
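Some of the raw counters behind these disk I/O requirements can already be read from procfs. Below is a minimal Python sketch. It assumes a kernel with per-task I/O accounting (CONFIG_TASK_IO_ACCOUNTING, exposed as /proc/<pid>/io on kernels from roughly 2.6.20 on) and the standard /proc/diskstats layout. In /proc/<pid>/io, rchar/wchar count every byte passed through read/write calls, including page-cache hits, while read_bytes/write_bytes count only what the task caused to reach the storage layer. That covers the cached-versus-physical distinction. /proc/diskstats gives completed read and write operations per block device. Neither file attributes accesses on a specific device to a specific process, which is exactly the gap described above. The helper names are hypothetical. The caller is assumed to supply each process’s uid, for example from the Uid: line of /proc/<pid>/status.

```python
from collections import defaultdict


def parse_proc_io(text):
    """Parse the contents of /proc/<pid>/io into {field: int}.

    rchar/wchar include page-cache hits; read_bytes/write_bytes count bytes
    the task caused to be fetched from or sent to the storage layer.
    """
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            stats[key.strip()] = int(value)  # int() tolerates surrounding spaces
    return stats


def aggregate_by_user(samples):
    """Sum per-process counters into per-user totals.

    samples: iterable of (uid, io_text) pairs; uid lookup is the caller's job.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for uid, text in samples:
        for key, value in parse_proc_io(text).items():
            totals[uid][key] += value
    return {uid: dict(counters) for uid, counters in totals.items()}


def parse_diskstats(text):
    """Return {device: (reads_completed, writes_completed)} from /proc/diskstats.

    Field layout: major minor name reads_completed reads_merged sectors_read
    ms_reading writes_completed ... Short lines (the abbreviated partition
    format of some older kernels) are skipped.
    """
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 11:
            continue
        result[fields[2]] = (int(fields[3]), int(fields[7]))
    return result


def io_ops_per_second(before, after, interval):
    """Completed read+write operations per second per device between two samples."""
    rates = {}
    for dev, (reads1, writes1) in after.items():
        if dev in before:
            reads0, writes0 = before[dev]
            rates[dev] = ((reads1 - reads0) + (writes1 - writes0)) / interval
    return rates
```

On a live system, one would read each /proc/<pid>/io and sample /proc/diskstats twice, a fixed interval apart. Storing those samples over time gives the historical view the list asks for.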
**CPU REQUIREMENTS**
- Better hooks in the Linux kernel for managing, tracking, and if necessary blocking processes. “nice”ing processes is like hitting an ant with a sledgehammer. It works, but it is not the elegant solution that companies need.
- The ability to throttle a process’s CPU and block-device I/O in tandem. The CFQ (Completely Fair Queuing) I/O scheduler, the default I/O scheduler in the Linux kernel since 2.6.18, supports I/O priority scheduling. “ionice” and other software can set these priorities, but they aren’t tied to CPU scheduling, so I/O and CPU are difficult to reduce in concert. CPU and I/O don’t rise or fall together in a fixed ratio, but in my opinion, trying to reduce an individual process’s CPU use without also controlling its I/O is a waste of time.
- Better user-space application threading is also something all hosting companies need. Multi-core CPUs are now the norm in virtually all servers, and 8-core CPUs are on the way (Intel has 4-core CPUs available now, and AMD’s 4-core Barcelona CPUs are expected in the second half of 2007), so more “web hosting” apps need to be multi-threaded to make better use of these cores. cPanel (our control panel) still uses many older single-threaded apps because they are difficult to upgrade. The company that created cPanel needs to take this to heart and start improving performance for the real world instead of just what works in their lab.
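One way to approximate CPU and I/O throttling in tandem with stock tools is to drive both from a single policy. The Python sketch below maps a hypothetical throttle tier to a nice value, applied with os.setpriority, and to an ionice class and level, applied by calling the util-linux ionice command. The tier table and function names are assumptions for illustration, not anything Bluehost or the kernel defines. I/O priorities are only honored by schedulers that support them, such as CFQ. Applying either setting to another user’s processes requires root.

```python
import os
import subprocess

# Hypothetical throttle tiers: tier -> (nice value, ionice class, ionice level).
# ionice class 2 is best-effort (levels 0-7, lower means higher priority);
# class 3 is idle (the process gets disk time only when no one else needs it).
THROTTLE_TIERS = {
    0: (0, 2, 4),      # unthrottled: default nice, default best-effort level
    1: (5, 2, 5),      # mild throttle
    2: (10, 2, 7),     # heavy throttle: lowest best-effort I/O priority
    3: (19, 3, None),  # last resort: lowest CPU priority, idle I/O class
}


def throttle_plan(pid, tier):
    """Return (nice_value, ionice_argv) for putting `pid` in `tier`."""
    nice, io_class, io_level = THROTTLE_TIERS[tier]
    argv = ["ionice", "-c", str(io_class)]
    if io_level is not None:  # the idle class takes no level
        argv += ["-n", str(io_level)]
    argv += ["-p", str(pid)]
    return nice, argv


def apply_throttle(pid, tier):
    """Apply the CPU and I/O halves of a tier together (Linux only)."""
    nice, argv = throttle_plan(pid, tier)
    os.setpriority(os.PRIO_PROCESS, pid, nice)  # CPU: absolute nice value
    subprocess.run(argv, check=True)            # I/O: util-linux ionice
```

For example, apply_throttle(pid, 2) would set a runaway process’s nice value to 10 and drop it to the lowest best-effort I/O level in one step. When a process has no explicit I/O class, CFQ derives its best-effort level from the nice value as (nice + 20) / 5 (per the ionice(1) manual), so the built-in coupling is coarse. Setting both explicitly keeps them under one policy.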
I am writing this blog entry for a couple of reasons. First, so you know that the CEO of Bluehost isn’t just a business/marketing figure. I understand the technical challenges that hosting poses. Second, I want you to know that we are actively working on and implementing changes that will shield users from other customers’ CPU and I/O demands. I know customers hate to see “CPU Quota Exceeded” errors. I know sites are sometimes slow or I/O is backed up. These issues are at the top of our list, and solutions are coming that will be exclusive to Bluehost. We have some of the most talented and brightest developers, and they have been challenged to solve these issues. Real solutions are coming and performance will continue to increase!
Thanks,
Matt Heaton / President Bluehost.com