In the last week we had a major DDoS (distributed denial-of-service) attack. It hurt our network and caused two outages, one lasting about 20 minutes and the other 45 minutes. That amount of downtime is absolutely unacceptable from a customer's point of view. I totally agree!
I sent out a post to all our users letting them know what happened and reiterating our commitment to solving the problems we are having. We received many positive comments from customers who appreciated our telling them what happened and who expressed their understanding of our situation. We also received a few emails stating that we had lost our focus and were growing at the expense of current customer satisfaction. While these emails were a very small percentage of the whole, I take them very seriously. Let me address some of the issues we have and what we are going to do about them.
1 – DDoS attacks – Without going into too much detail, we are giving this issue our FULL attention. While nothing can stop all DDoS attacks, there are many things we can do to beef up our network beyond what we currently have. Don’t get me wrong, we have one of the most robust networks out there, but there is always more that can be done. We have already ordered some pretty cool new hardware that will help us get there. We are taking this seriously and will do what we can to prevent further attacks.
2 – Individual customer overages – We run a shared environment. This means that you are on a server with many hundreds of other users. You have to be a good neighbor, or everyone has a poor experience. We have been struggling with the balancing act of giving our users what they want while keeping their usage under control, so that the environment we run is fast for everyone. In the future we will limit CPU usage for the 1% of users who eat up the resources of the other 99%. We have been doing this all along, but have given that 1% more than we probably should have. This will make some customers angry, but for the vast majority it will mean improved uptime and greater speed for their sites.
3 – Proactive administration vs. reactive administration – Sometimes we find ourselves in reactive mode: fix it when there is a problem. This isn’t right. We have put concrete plans in place so that each time a server goes down, we have the information to know exactly what caused it and how to fix it, so that the same thing doesn’t happen again. Many large hosts get caught up in the “reset the server” mode and don’t really fix the problem for the long term. We will strive to fix it for the long term.
We want Bluehost to be the best host not only with regard to plans and service, but from a reliability standpoint as well. I would be interested to get your comments over the next month or so as we implement these new ideas.
Matt Heaton / President Bluehost.com