Hosting is an interesting business. It has come a long way from the free website days of Geocities and the like. Speed, reliability, and features have grown far beyond what they were only a few short years ago. What you can now buy for under $10 a month is 10x as powerful as what you could buy 5 years ago for the same money. That is a good thing.
Hosting also has some shortcomings that still haven’t been addressed, and they hamper many individuals and companies on lower-priced shared hosting. I would like to discuss some of these shortcomings, how I believe they can be fixed, and how we think we have finally been able to achieve shared hosting nirvana.
Listed below are the well-known and almost impossible-to-solve problems with shared hosting –
1) Resource Allocation – Because many users are on a single system it is hard to allocate resources properly. You may have a very underutilized server (resources going to waste) or an overcrowded server that becomes bogged down with overuse. It is impossible to determine the usage of a particular user in advance. This makes a server unreliable, because its load can spike out of control at any time. Common problems are too much disk I/O consumed in bursts by particular users, extreme short- or long-term memory usage by a particular user, or spiking or prolonged CPU usage by a small group of users on a particular server.
** What we can do about it –
To us it all comes down to instantaneous real-time tracking (about every 15-30 milliseconds in our case). Advances in general Linux kernel tracking and our own proprietary tracking have finally allowed us to know which users need the most resources. We have then tried to automate as much as possible of the process of allocating free resources to these users in real time. We are constantly updating these automated tools to make the process as seamless as possible.
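To make the idea of handing free resources to the users who need them a little more concrete, here is a minimal sketch in Python. It is not our production code. It assumes that a tracker has already produced a per-user demand figure for the current sampling interval, and it uses a standard max-min fair share split as a stand-in allocation policy. The function name, the user names, and the numbers are all made up for illustration.

```python
def max_min_fair_share(demands, capacity):
    """Split `capacity` among users (hypothetical helper).

    No user receives more than they asked for, and whatever light users
    leave unused flows to the users who still want more.
    """
    allocation = {user: 0.0 for user in demands}
    remaining = float(capacity)
    unsatisfied = {user for user, demand in demands.items() if demand > 0}

    while unsatisfied and remaining > 1e-9:
        # Offer every still-hungry user an equal slice of what is left.
        share = remaining / len(unsatisfied)
        satisfied_now = set()
        for user in unsatisfied:
            need = demands[user] - allocation[user]
            grant = min(share, need)
            allocation[user] += grant
            remaining -= grant
            if demands[user] - allocation[user] <= 1e-9:
                satisfied_now.add(user)
        if not satisfied_now:
            break  # everyone took a full slice, so capacity is used up
        unsatisfied -= satisfied_now

    return allocation


# Illustrative numbers only: three users competing for 100 units of disk I/O.
sample = max_min_fair_share({"alice": 10, "bob": 40, "carol": 80}, capacity=100)
```

In this made-up sample the two lighter users get everything they asked for and the heaviest user gets whatever is left, rather than the heaviest user crowding the others out. In practice the function would be re-run on every sampling interval with fresh demand figures.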
2) Tracking User Consumed Resources – You can’t “blame” users or even monitor many user activities (CPU use or disk I/O usage) on shared servers, because tracking user resources means you have to know which user ran which process. The problem comes into play when you have major applications that don’t run as a particular user and instead run as their own process with no “real” user tracking. Let me give you an example of the two biggest problem applications in this area – MySQL and Apache. MySQL consumes an enormous amount of CPU and disk I/O resources, but it normally runs as the “mysql” user. This makes it very difficult to track resources used by a particular user. Built-in tools such as slow query logs are extremely inaccurate at measuring disk I/O and CPU usage by user. The other major culprit is Apache. Apache runs as a separate user as well (at least in our case, and most web hosts have a similar setup). Script processes spawned by Apache, such as PHP or Perl, can easily be made to run as a specific user for tracking purposes, but the Apache processes themselves and the corresponding CPU and I/O overhead are never attributed to a single user. There are many applications that fall into this category, and all of them make tracking inaccurate and problematic for hosting companies.
** What we can do about it –
I have spent thousands of hours over the last 3 years working personally on this problem and I am VERY happy to report that we have nearly solved it (MANY thanks to kernel developers all over the world that helped out, as well as the talented developers in house!!). We have spent considerable time and money modifying the Linux kernel and userspace applications (MySQL, Apache, etc.) to report exactly which user is responsible for CPU and I/O usage in real time. Let’s give an example – Let’s say we have user “matt” that does a MySQL query that takes 2 minutes to complete (clock time) and 90 seconds of real CPU time to complete (the actual number of CPU seconds required to complete the query). When MySQL passes the query to a specific thread to be serviced, we start tracking for that particular user the EXACT CPU time that was used, and the exact number of system reads and writes as well as device-specific reads and writes. We can use this to track and slow down the extremely heavy users in real time so that the server is calm and available for everyone to use. We are in live testing right now on several boxes and hope to have the CPU portion of this rolled out everywhere in the next 3-4 weeks. The disk I/O portion of our code is already live on 90% of our systems and will be live on 100% of our servers in the next week. The importance of this can’t be overstated. This is what shared hosting has needed forever, and what will allow it to compete with, and in many cases top, VPS in performance while maintaining stability in the system.
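The difference between clock time and CPU time in the “matt” example, and the idea of charging a shared service’s work to the user it was done for, can be sketched in a few lines of Python. This is only a userspace illustration, not the kernel and MySQL changes described above. It assumes a platform where the standard library’s `time.thread_time()` is available (Linux is one), and the `UsageLedger` class and its `charge` method are hypothetical names invented for this sketch.

```python
import threading
import time
from collections import defaultdict
from contextlib import contextmanager


class UsageLedger:
    """Hypothetical per-user ledger of CPU seconds and clock seconds."""

    def __init__(self):
        self._lock = threading.Lock()
        self.cpu_seconds = defaultdict(float)
        self.wall_seconds = defaultdict(float)

    @contextmanager
    def charge(self, user):
        # thread_time() counts CPU time used by the calling thread only,
        # so a worker thread servicing one user's request measures just
        # the work it did for that user.
        cpu_start = time.thread_time()
        wall_start = time.monotonic()
        try:
            yield
        finally:
            cpu_used = time.thread_time() - cpu_start
            wall_used = time.monotonic() - wall_start
            with self._lock:
                self.cpu_seconds[user] += cpu_used
                self.wall_seconds[user] += wall_used


# Usage: the shared service wraps each unit of work in the owner's name.
ledger = UsageLedger()
with ledger.charge("matt"):
    sum(i * i for i in range(200_000))  # burns CPU
    time.sleep(0.05)                    # waiting adds clock time, not CPU time
```

After the block runs, the clock time recorded for “matt” includes the sleep while the CPU time does not, which is the same gap as the 2 minutes versus 90 seconds in the query example.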
3) Immediate action on policy enforcement – This is a BIG deal with shared hosting! When a specific user “violates” a policy, such as excessive CPU usage or disproportionate disk I/O or memory usage, 99.9% of all shared hosting companies will try to alleviate the problem by killing processes or banning the user AFTER the damage has already been done. It does no good to ban someone after they have consumed so much CPU that the server becomes sluggish. The sluggishness or downtime has already happened at that point. Most hosting companies have a very difficult time even determining where these CPU and I/O abuses are coming from, let alone mitigating the problem before it happens. Virtually NO SHARED HOSTING COMPANIES have good options to actively slow down CPU usage – they usually just stop the offending processes (not a good option), or kill processes consuming too much disk I/O (not a good option either).
** What we can do about it –
As mentioned above in item #2, we can now track and monitor CPU and disk I/O usage in real time for all our users. Based on this information we can now do what no other shared hosting company has ever been able to do effectively. We can limit disk I/O activity in real time and limit CPU activity in real time for all our users. This allows us to mitigate the effects of sudden spikes in usage that would normally affect all other users. Here is a good example to illustrate this point – Let’s say we have 100 users on a server and that 99% of the time everything runs smoothly, but one day one of the users makes it to the front page of digg.com. One of two things is going to happen. Either the user that is causing the excessive load is going to be shut down, or the server is going to be sluggish or possibly down until traffic to that site subsides. We have been in this position many times in the past and it’s never good for anyone. The disk I/O portion of that problem is now solved for us, and the CPU issue should be ready in the next few weeks. Instead of shutting the user down we can now “contain” them as if they were on a VPS. We can isolate them from other users so they don’t cause problems but still allow them to use any extra CPU cycles or I/O operations that are available.
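For readers who want a feel for what “contain instead of kill” looks like, here is a small Python sketch. It swaps in a plain token bucket as the limiting technique, which is an assumption made for illustration and not a description of our kernel code. The `IoThrottle` class, the `server_idle` flag, and the rates are all hypothetical. The point it shows is that a heavy user is told to wait rather than being shut down, and is let straight through whenever the server has capacity to spare.

```python
class IoThrottle:
    """Hypothetical per-user token bucket for disk operations.

    `rate` is the sustained operations per second the user is entitled to,
    and `burst` is how many operations they may do at once before being
    slowed down.
    """

    def __init__(self, rate, burst):
        self.rate = float(rate)
        self.burst = float(burst)
        self.tokens = float(burst)
        self.last = 0.0

    def delay_for(self, now, ops=1, server_idle=False):
        """Return how many seconds the caller should wait before doing `ops`."""
        # Refill tokens for the time that has passed since the last call.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now

        if server_idle:
            # Spare capacity is available: let the work through uncharged.
            return 0.0

        # Charge the operations. A negative balance is a debt that the
        # caller pays off by waiting, instead of having the process killed.
        self.tokens -= ops
        if self.tokens >= 0:
            return 0.0
        return -self.tokens / self.rate


# Usage with made-up numbers: 100 ops/second sustained, bursts of 50.
throttle = IoThrottle(rate=100, burst=50)
first_wait = throttle.delay_for(now=0.0, ops=50)    # within the burst allowance
second_wait = throttle.delay_for(now=0.0, ops=100)  # over the limit, so a wait is returned
```

The design choice worth noting is that the throttle never refuses work. It only stretches a heavy user’s work out over time, so a site that lands on a busy front page gets slower for that one user instead of going dark or dragging down its neighbors.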
** What does this mean?
What does this really mean? It means we will very soon be able to offer the VPS experience for less money and less hassle than every other VPS product out there, and that our shared hosting product is about to become a LOT more stable than everybody else’s. My opinion is that most VPS users really don’t need or want root access, which costs them their own time for security updates, cPanel updates, etc. They simply want a contained environment and guaranteed resource allocation. We will be able to offer guaranteed resources just like a VPS or dedicated server solution without requiring any changes from our users. I am very excited to see all this materialize, as this has been my pet project for several years. If you have made it this far in the blog entry, I congratulate you for your tenacity in plowing through my technical ramblings!
Matt Heaton / President Bluehost.com – Hostmonster.com