When things are good, they are great, and when they are bad, they are REALLY bad. For the last 2 months we have been REALLY bad. I know many potential clients read this blog, but I have to be honest with our existing clients: we have been less than great the last 2 months, and the problems lie squarely on my shoulders.
Today we had downtime off and on from around 10am to 3pm. It was the garbage heap we call a router that did it to us again. A whole list of things has gone wrong while trying to get our new router up and running. We have the new router in our datacenter, but the fan tray is not the high-speed fan tray we needed, so it is holding us back. We won’t be able to put our new router into production until around midnight Friday night. It was supposed to be up and running this Monday, but evidently "next day air" means 4 days later to some people.
Here is the short list of major problems we have had to deal with in the last 2 months and what we are doing about them.
Problem – Major Linux kernel problems with Red Hat Enterprise Linux 4. This affected about 40 servers. Solution – We finally built our own custom kernel, which solved our many issues.
Problem – Two separate power outages that affected about half of our users. Solution – So far we have moved about 80% of our users to a facility with enough UPS and generator capacity to alleviate the problem. Next week we are signing a lease on a 5,000 sq. ft. data center that will allow us to grow as well as provide colocation services for many businesses. The whole move will take around 45 days to complete.
Problem – Multiple outages affecting all our users, along with slow connection speeds and dropped packets. Solution – A new Cisco 6509 router with a Sup720-3BXL card to solve our routing issues. The router accounts for 95% of the problems Bluehost has been experiencing. We have been trying to solve the router problem for a couple of months. I think we finally have a handle on this hairy problem!
Problem – The APF firewall system running on all users’ boxes. This firewall software has been randomly blocking data ports for users across all our systems. There was no rhyme or reason to why this was happening, so it was very difficult to resolve. Solution – We finally gave up on APF. It is about the worst software I have ever used. We have installed a new software solution on a dozen servers and have been testing it. It seems to have completely fixed our issues. It is being rolled out to all servers on Monday.
All of these issues have made Bluehost’s reliability over the last 2 months completely unacceptable. As you can see from the above, we know about the issues and have been working nonstop for the last several weeks to get things back to how they should be. Thanks to those who have been patient. For those who haven’t been so patient, I don’t blame you one bit! We will have things back to normal in a few days.
Matt Heaton / President Bluehost.com