Web hosting requires massive amounts of storage to satisfy customers’ needs. This ever-increasing demand for storage is met by many different storage paradigms, including NAS, SAN, and local SAS or SATA. This storage is normally carved up into different types of RAID arrays. I don’t wish to discuss RAID theory here. Instead, I feel compelled to express my extreme frustration with certain RAID controller manufacturers and to point out what I feel are deficiencies in their offerings and technical transparency.
3ware is the primary supplier of the RAID controllers we use. While I have had several complaints about them over the last several years, their products have been usable, albeit frustrating and painful to manage in large quantities. I don’t wish to single them out; however, over the past 2 weeks their products and the misinformation about them have caused us and our users inexcusable and painful downtime. I wish to personally apologize to all Bluehost users on servers box500-503 and Hostmonster users on servers host300-host303. The huge downtime can be directly attributed to 3ware and their lack of transparency with regard to their controllers and their limitations.
Below are several complaints I have against virtually every RAID controller manufacturer.
1) Almost every benchmark distributed by RAID controller manufacturers shows only RAID 5 and RAID 6 sequential read and write I/O. This is not representative of almost any workload in today’s computing environment. It is like saying a Chevy Suburban gets 100 MPG. It’s possible if it’s in neutral going down a hill, but you will never get those results EVER in real-life use. In my opinion 3ware is scared to show real-life performance benchmarks because they struggle to beat even the most basic storage alternatives.
2) Support is not knowledgeable at most of these companies. I understand this. As someone who employs hundreds of support engineers, I know firsthand the challenges of training support representatives. However, when you sell a product as technical as a RAID controller, you had better not have someone in India typing questions into a knowledge base. It’s extremely frustrating when you know far more about a product than the people you are calling for help.
3) Lack of published technical information – I understand the difference between marketing materials and technical materials, but SOMEWHERE you need to be able to find the beef! In so many cases there isn’t ANY information on the technical underpinnings that make these devices work. I have been “escalated” up the chain (ahem…) at 3ware several times only to confirm over and over that those I talk to have almost no idea what I am talking about. Please, just let me talk to the driver developer!!! I will pay! Just give me someone who REALLY knows. Short of me going through the driver code (And don’t think I haven’t done that!) the information I need simply isn’t there.
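To put rough numbers on the first complaint above, here is a back-of-the-envelope sketch of why a sequential benchmark says so little about a seek-heavy workload. The drive figures (8.5 ms average seek, 7200 RPM, 100 MB/s sustained transfer, 4 KB requests) are assumptions picked for illustration, not measurements of any 3ware controller or of our servers, and the function name is made up for this example.

```python
# Back-of-the-envelope model of a single spinning disk.
# Every drive figure used here is an illustrative assumption, not a measurement.

def random_io_mb_per_s(avg_seek_ms, rpm, seq_mb_per_s, request_kb):
    """Estimate throughput when every request needs its own seek."""
    # On average the platter must turn half a revolution before the data arrives.
    rotational_latency_ms = 0.5 * 60_000 / rpm
    # Time spent actually transferring one small request.
    transfer_ms = (request_kb / 1024) / seq_mb_per_s * 1000
    service_ms = avg_seek_ms + rotational_latency_ms + transfer_ms
    requests_per_s = 1000 / service_ms
    return requests_per_s * request_kb / 1024


SEQ_MB_PER_S = 100.0  # assumed sustained sequential rate
random_mb = random_io_mb_per_s(
    avg_seek_ms=8.5, rpm=7200, seq_mb_per_s=SEQ_MB_PER_S, request_kb=4
)
slowdown = SEQ_MB_PER_S / random_mb

print(f"sequential:  {SEQ_MB_PER_S:.1f} MB/s")
print(f"random 4 KB: {random_mb:.2f} MB/s (about {slowdown:.0f}x slower)")
```

Under these assumed figures the same drive moves well under 1 MB/s once every request needs its own seek, which is why a sequential-only benchmark is the Suburban rolling downhill in neutral.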
Here are a few observations about 3ware that make me want to jab an ice pick in one of my eyes.
1) No support for RAID 1 “split seeks” (spreading reads across both halves of the mirror so that each drive handles part of the seek load), at least nothing that I can test and show. I still can’t find a single person at 3ware who even knows what split seeks are, let alone whether they support it and to what degree. Because we use primarily RAID 1 and some RAID 1+0 arrays, this is extremely important to us. Please don’t email me saying we should use RAID 5 or 6. I know our workload perfectly, and RAID 5 or 6 is a nightmare that many other hosts don’t understand. Minimizing disk seeks is infinitely more important than maximizing space with RAID 5 or 6, unless you really are doing primarily sequential I/O.
2) No support for using the onboard 256 MB or 512 MB cache (depending on controller model) for anything other than RAID 5 or RAID 6, except for write-through journaling. This means that RAID 1 or RAID 0 is actually SLOWER on 3ware cards than on most onboard motherboard controllers. Again, no one at 3ware knows anything about this. If you use write-back journaling, then you lose data if you have to do a hard reset on the server. In the controller’s “performance” mode you can’t separate FUA (Force Unit Access) requests, where a process is asking for confirmation that a real write to disk has occurred, from ordinary journaling writes, so the array reports a write as complete when it’s not. This means you choose fast and lose data occasionally, or slow and skip the onboard cache completely. Get with the program and use the cache for something other than boosting sequential I/O benchmarks! Please support RAID 1 with your cache. Maybe even give us the option to choose read-ahead for the cache or dedicate it exclusively to write caching. What a novel idea!
3) Driver updates that don’t work, and software updates that cause us to endlessly “verify” and “initialize” our arrays for no reason except that 3ware’s software is too inept to know the state of its arrays. Anyone who has used 3ware for any amount of time knows exactly what I am talking about here!
4) Rapid restore – We personally don’t use this option, but I tested it to see how it works. It is basically an option that lets you restore an array faster by tracking only the parts of the array that are out of sync, so you don’t have to go through the whole array to verify its integrity. The only problem is that it CRUSHES the array with writes to accomplish this feat. To illustrate, imagine writing down every footstep you have taken in your house for a week so that when it comes time to vacuum you only have to clean the spots where you walked. It takes 10 hours to write down all your steps so that you only have to vacuum for 10 minutes, or you could just skip writing it all down and vacuum for 30 minutes instead. 3ware is CLEARLY testing this on a mostly idle disk array, or they would never have released this beast to the public. When I spoke with 3ware they claimed it put very minimal load on the array. When I asked for the technical details of exactly how it worked, they were of course without any concrete answers. I had to test it myself to see the impact.
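Since 3ware would not tell me how rapid restore works internally, the following is only a minimal sketch of the general technique the footstep analogy describes: a dirty-region (write-intent) bitmap. The class name, the 64 MiB region size, and the 1 GiB array size are all hypothetical choices for illustration, and nothing here is taken from 3ware’s firmware or driver.

```python
# Generic write-intent bitmap sketch. This is NOT 3ware's implementation;
# names and sizes are hypothetical and chosen only to illustrate the idea.

MIB = 1024 * 1024


class WriteIntentBitmap:
    def __init__(self, array_bytes, region_bytes):
        self.region_bytes = region_bytes
        # Round up so a partial final region is still tracked.
        self.num_regions = -(-array_bytes // region_bytes)
        self.dirty = set()
        self.metadata_writes = 0  # extra writes spent maintaining the bitmap

    def record_write(self, offset, length):
        """Mark every region touched by a data write as possibly out of sync."""
        first = offset // self.region_bytes
        last = (offset + length - 1) // self.region_bytes
        for region in range(first, last + 1):
            if region not in self.dirty:
                # The bitmap has to be persisted before the data write proceeds,
                # so the first write into a clean region costs an extra write.
                self.dirty.add(region)
                self.metadata_writes += 1

    def regions_to_resync(self):
        """After a crash, only these regions need to be copied or verified."""
        return sorted(self.dirty)


bitmap = WriteIntentBitmap(array_bytes=1024 * MIB, region_bytes=64 * MIB)
bitmap.record_write(offset=0, length=4096)          # dirties region 0
bitmap.record_write(offset=4096, length=4096)       # region 0 is already dirty
bitmap.record_write(offset=200 * MIB, length=4096)  # dirties region 3

resync_mib = len(bitmap.regions_to_resync()) * bitmap.region_bytes // MIB
print("regions to resync:", bitmap.regions_to_resync())
print("extra metadata writes:", bitmap.metadata_writes)
print("resync size:", resync_mib, "MiB out of", bitmap.num_regions * 64, "MiB")
```

The restore really does shrink to the dirty regions, but every first write into a clean region pays for an extra bitmap update. On a busy array with writes scattered all over the disk, and with regions being marked clean again in between, that bookkeeping would land on a large share of writes, which would be consistent with the load I saw in testing.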
Wow. I feel much better. I’m sorry for this long post, but doing quality hosting is sometimes an impossible task when you have to rely on so many outside vendors and service providers to come together to make your product work. Just as we rely on vendors to provide our service to you, you rely on us to power your websites and businesses. In the end it is 100% our responsibility to make it all work for you. I am so sorry that many of you have experienced unacceptable downtime from us because of these controllers. We have solved almost all of these issues now, but we know we have lost the confidence of many of you. I just wanted you to know the real reason behind our recent problems and that many sleepless nights were spent in pursuit of a solid, long-lasting solution.
Matt Heaton / President Bluehost.com / Hostmonster.com