R1Soft – A Backup Nightmare

R1Soft is a backup “solution” aimed at companies that want to back up only the changes made since the last backup. It’s a good idea, but it went horribly wrong. I am writing about it today because working with them was such a poor experience, and we had so many server problems and so much REPEATED DATA LOSS, that I felt compelled to warn other hosting companies and customers about this extremely poor product.

R1soft caused enormous I/O load that literally killed our servers, and on multiple occasions it simply would not restore our data, causing 100% data loss on the servers we had “backed up”.

The rest of this blog entry details technical reasons why R1soft is broken and inept. If you aren’t interested in those details you can skip the rest of this entry.

MAJOR TECHNICAL DEFICIENCIES AND OPERATIONAL ISSUES WITH R1SOFT -

1) Buagent (their software that records what has changed) is a HUGE I/O pig. It constantly blocks on I/O when there are changes, causing every other drive or array in our system to lag while it catches up. R1soft told me it “works great” and to send them details and they would “look into it”.
2) Buserver (their software that writes what has changed) is so INCREDIBLY slow in every way that it is virtually unusable when you have a lot of data to back up, with tens of thousands of block changes every day per server. Clearly it hasn’t been tested well under sustained, real heavy load, at least in my opinion. There are multiple kernel and userspace methods to speed this up, but again it falls flat on its face when trying to keep up with our load.
3) Restores are buggy and VERY unreliable. We attempted more than 20 restores of whole partitions, and more than 30% were a complete failure, meaning TOTAL data loss. User-level restores were the same story: incomplete backups and unreliable, SLOW restores.
4) R1soft support is lackluster to say the least. When I complained they said, “You only opened 17 tickets in the last 60 days.” 17 tickets? Often when we had issues doing a restore or other problems we would call and no one would answer the phone. It’s comforting to know that when we can’t do a restore for 500 clients, we can possibly talk to someone on Monday who will blow us off and tell us that the newest version of the software MIGHT fix our issue. In the meantime they apologized for losing all our data over and over again. Wonderful.
5) The software is based on filesystem block changes. The problem is that the kernel, ext3, and reiser are changing all the time. As we made updates, it constantly broke R1soft. When they would finally fix the problem, it usually required us to reseed the entire backup, losing everything that we had previously backed up and putting enormous load on the server again while it started over from scratch.
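
To make point 5 concrete, here is a minimal Python sketch of block-level incremental backup in general. It is not R1Soft's implementation, which is closed source; the 4096-byte block size and the function names block_digests and changed_blocks are assumptions chosen for illustration. It compares per-block hashes after the fact, whereas Buagent is described above as recording changes as they happen.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def block_digests(data, block_size=BLOCK_SIZE):
    """Return one SHA-256 hex digest per fixed-size block of `data` (bytes)."""
    return [
        hashlib.sha256(data[offset:offset + block_size]).hexdigest()
        for offset in range(0, len(data), block_size)
    ]


def changed_blocks(old_digests, new_digests):
    """Return indices of blocks that differ from, or are missing in, the old state."""
    return [
        index
        for index, digest in enumerate(new_digests)
        if index >= len(old_digests) or old_digests[index] != digest
    ]


if __name__ == "__main__":
    before = bytes(BLOCK_SIZE * 4)      # four zero-filled blocks
    after = bytearray(before)
    after[BLOCK_SIZE * 2] = 1           # modify one byte inside the third block
    # Only the block containing the modified byte needs to be sent.
    print(changed_blocks(block_digests(before), block_digests(bytes(after))))
    # With no usable previous state, every block counts as changed.
    print(changed_blocks([], block_digests(bytes(after))))
```

Note what happens when the previous state is unavailable: with an empty baseline, changed_blocks reports every block as changed, which is the full reseed described in point 5.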

In all my years of hosting I have never had such a bad experience with a “partner” company as R1soft. I apologize to my customers for subjecting them to this software abomination and to my staff for making them deal with it for 2 months before we finally extracted the final traces of this software from our system.

Matt Heaton / President Bluehost.com / Hostmonster.com

29 Responses to “R1Soft – A Backup Nightmare”

  1. Darren says:

    So what are you using to replace it?

  2. R1soft user says:

    I have also been using R1soft for about the last year to backup some of our servers. Initially I was really impressed with the software but once I started working with it more and dealing with R1soft the company, problems unfolded. Mainly:

    – Over the last year there have been lots of bugs and issues requiring frequent updates. It’s cumbersome to constantly update agents on all of the servers being backed up and keep track of what update needs to be applied to fix what bug.

    – Lack of support. Since all of the code is closed-source, customers HAVE to rely on the company’s support department to troubleshoot and fix problems. I think R1soft generally has good people working there but their support department must be heavily overworked as it takes days to get a response usually and weeks to get any sort of resolution.

    – Crippled software. I replaced some hardware in our backup server on a Friday night. Lo and behold, buserver now refuses to run because my license was tied to the previous hardware combination. I was stuck waiting until Monday to get a license reset. Imagine – ALL WEEKEND with no ability to MAKE or RESTORE backups after a hardware component failure! It scares me that if R1soft were to close its doors, get shut down temporarily by a natural disaster, etc., all of their customers could really be screwed with unusable software.

    – Broken promises and expectations. Frequently they will say a big new version is coming to offer XYZ new features and fix ABC problems. Everyone waits for it. It doesn’t come. Until long after expected.

    example: on June 2 in the r1soft forums, David Wartell announced they would get CDP Agent 2.0 for Windows released next week as he was: “giving my dev team until the end of this week and I’m not going to hold up the windows 2.0 launch for it any longer.” You can still see this on the r1soft forum. As of today, August 1 2008, CDP Agent 2.0 for Windows is still not available.

    Also, they made a big fuss about launching the long-awaited CDP Server 2.0. Once it was released, they marked it as ‘pre-release’ status instead of stable on their website, while encouraging everyone to use it.

    Overall I think R1soft has some great ideas and technology but they have poorly implemented them so far. It is disappointing to me because there aren’t any other good backup solutions for Linux servers that 1) are easy to set up, 2) require minimal extra load on the servers being backed up and 3) support bare-metal restore out of the box.

  3. Kok Hai Tan says:

    God will bless your honesty and integrity in business. Especially since you have the guts to admit your mistakes!

    Go BlueHost!

  4. Hello,

    Firstly, let me congratulate you on your great success. We are very satisfied Bluehost customers (so much so that I wish I could white label the service in Spanish for our customers in Spain).

    After reading this post in your blog, I couldn’t help but offer you our services. We have been working for some time now on backup services. In fact, our online backup service (www.zendalbackup.com) will be coming up very soon. I would be interested in knowing a little more about your requirements (e.g. how much data we should expect to move daily) in order to be sure that we can handle your data volume.

    We are a startup, but we pride ourselves on our customer service, and would value the possibility of working with you.

    Thanks and congratulations once again,

    Pritesh Hiralal

  5. Wes says:

    Good you have warned others. The last thing you want is a buggy backup software.

  6. Bob Barr says:

    Wes: “The last thing you want is a buggy backup software.”

    Taken a step farther, the last thing a company may ever have is buggy backup software.

  7. Lane says:

    Interesting read, thank you for sharing your story with us.

    On a completely unrelated note, I think your company should take notice that it’s starting to build a rather poor reputation because of how your customers’ sites go down when they get popular on social networking sites.

    This url directly links to one such comment -

    http://digg.com/comedy/SohHow_many_guys_have_you_slept_with?t=17587417#c17587417

  8. josh says:

    just wanted to let you know i had an excellent customer service experience with Roger Brown. couldn’t find a corporate number to call and let anyone know. but he was working the live chat help center and he helped me resolve my issue quickly and friendly. you’re the only corporate contact i could find with a few minutes of google searching. just wanted to say thanks considering it’s not always the norm to get quick, excellent service when it comes to hosting.

  9. Patrick Donnelly says:

    Hello Matt,
    Thanks so much for your candor. A rare thing these days.
    Sorry you are fighting stuff that end users sometimes get.
    Not that this will be applicable to you but the folks at Secondlife.com have massive amounts of data moving all the time. There is a perfect example of systems being wrung out to the max. They might have interesting thoughts on server backup strategies.
    Just a thought…

    In the meantime, thanks for stellar service!

    Patrick

  10. Anthony says:

    This is why I host with BlueHost.

  11. Kapi says:

    Hello,

    I have been very happy with Bluehost and have two sites hosted there. I have been emphatically recommending it to my peers and clients. But i wonder if I jinxed it. I set up a new site a few days ago and purchased 3 yrs worth of hosting (such is/was my confidence). While building the site, I had terrible timeouts and freezes on cPanel and my website just would not load for my clients.

    I chatted with a tech “Jake” who seemed a little irritated that I was whining about QoS and said (after a 7 minute wait) “Our system is under heavy load right now, our admins are working on it”….followed by the line about looking at the knowledge base for common issues.

    I hope these things get ironed out soon.

  12. Alex says:

    Was this software ever fully tested? Did your staff not test the restore functions? Were any of these problems present when the testing was done?

  13. Sassan says:

    Long time, no new post!!! we’re waiting …

  14. Matt Place says:

    I was in the middle of researching hosting options and came across your blog (nice to see a CEO blog, by the way) and I couldn’t help but sympathize with your backup service provider experiences.

    Please do not take this as a shameless and opportunistic plug, but I (as a customer/end-user) would like to offer a suggestion for a potential replacement for your current backup service provider. Up until leaving the company I used to work for (6 months ago) I had established a great relationship with a backup service provider called Netmass.

    At the time I worked for a large law firm and we were in desperate need of replacing our aging (and failure-prone) tape backup system. After some exhaustive research I recommended that we pursue an online backup provider, namely Netmass.

    For sake of time I’ll end this shortly. Simply put, Netmass ROCKS! Their service was fast and reliable, and their customer service was uncommonly great.
    From a technical standpoint I wholeheartedly believe in their service (it’s built upon the Asigra Televaulting product). I worked closely with Netmass for over 2 years, up until I left the law firm.

    Good luck with your backup situation. I hope this helps in some way.

    Side note: Give them a call and ask for Steve Perkins, and tell him Matt Place from HGS referred you…he’s the man!

  15. Bikas says:

    Dear Matt,

    Congratulations from a proud and satisfied bluehost customer from Mumbai (Bombay) India.

    I thought I’d use this space to draw your attention to a small but extremely important issue.

    Despite our best efforts, a few times we have faced the CPU usage quota exceeded issue…

    And the current default message is like a school headmaster’s rebuke to a student for poor grades. However, the only issue is that this message is first read by my readers not me….

    I checked with support, it can’t be edited.

    Instead of “account suspended” “CPU quota exceeded” can’t there be a user (reader) friendly message:

    something like–

    “Sorry, we just ran out of resources. Check back after 10 minutes and you’ll find us here! See you.”

    And current rebuke could be sent to webmasters in email, so that we could start working on our “poor scripts” to make them efficient.

    I still think the best way would be to give us an option to create custom error pages for such scenarios.

    Hope you find my suggestion useful.

    best regards,
    bikas

  16. Zach says:

    What this blog post says to me is that you don’t fully regression test your solutions before putting them into use. We looked at the same solution when we were evaluating the software and came to the same conclusions long before it ever touched a production system. I have experienced nothing but piss-poor uptime since moving to Bluehost. I log on to your chat (which now seems to be one click off the main page) at least once a day to report a server outage. I always get some BS excuse why it’s down. Looking at this, I have the odd feeling my outages can be chalked up to lack of experience and improper testing of systems. There are many good papers out there on n-tier environments and how to properly use them to test.

  17. I recently ran across your complaints regarding R1Soft’s offering. We’ve been using it without issue but perhaps our usage environment is significantly different.

    What do you mean that the buserver is slow? How large are the incremental backups and what type of disk subsystem are you writing them to? I am very curious about the details of the environment in which these problems surfaced.

    Slow is rather subjective. What metric were you using to determine that the backups were slow? I’ve not measured our systems since they work fine, but I may get someone here to do some testing.

    We have some smaller R1Soft data vaults (2-3TB in size) and do not have disk IO issues on them, but we are not pushing the hardware either. Just running a few backups whose incrementals are small in size.

    So far, restores have worked well for us, including some bare-metal restores. We can also restore a physical host into a vmware system which is pretty nice as well.

    If you could publish some specifics, that would be very helpful. Would certainly provide more concrete guidance to those looking to scale this solution.

  18. Nathan S./'Heed'(Pants! Now!, etc) says:

    Hrm, this probably doesn’t apply to a high availability environment, but I implement my own backup solution using hard links and rsync. Possibly the performance is too fugly for your needs, but it’s something to consider.

    Hi Matt :)
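
The hard-link approach mentioned in this comment can be sketched roughly as follows. This is a Python illustration of the idea, not the commenter's actual setup, which is not shown; rsync itself offers this behaviour through its --link-dest option. The function name snapshot and the directory layout are hypothetical, and unlike rsync's default size-and-mtime check this sketch compares full file contents. Symlinks and permissions on directories are ignored for brevity.

```python
import filecmp
import os
import shutil
import tempfile


def snapshot(source, previous, target):
    """Copy `source` into `target`, hard-linking files unchanged since `previous`.

    `previous` is the path of the last snapshot, or None for the first run.
    """
    for root, _dirs, files in os.walk(source):
        rel = os.path.relpath(root, source)
        dst_dir = os.path.normpath(os.path.join(target, rel))
        os.makedirs(dst_dir, exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            dst = os.path.join(dst_dir, name)
            old = os.path.join(previous, rel, name) if previous else None
            if old and os.path.isfile(old) and filecmp.cmp(src, old, shallow=False):
                os.link(old, dst)       # unchanged: share the existing copy
            else:
                shutil.copy2(src, dst)  # new or changed: store a fresh copy


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        src = os.path.join(base, "src")
        os.makedirs(src)
        with open(os.path.join(src, "a.txt"), "w") as handle:
            handle.write("unchanged")
        snap1 = os.path.join(base, "snap1")
        snap2 = os.path.join(base, "snap2")
        snapshot(src, None, snap1)
        snapshot(src, snap1, snap2)
        # Both snapshots refer to the same on-disk file for a.txt.
        print(os.path.samefile(os.path.join(snap1, "a.txt"),
                               os.path.join(snap2, "a.txt")))
```

Each snapshot directory looks like a full copy, but unchanged files cost no extra disk space. The trade-off is that every run still reads the whole source tree, which is where the performance concern raised in the comment comes from.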

  19. G Sicard says:

    What a fantastic idea. I personally hate that message. I cannot even back up my database using cPanel now; I get that message.

  20. Andrew says:

    Matt,

    I second or third the comments on the CPU exceeded message. Between all the clients I manage I probably have about 15 accounts with bluehost.

    One of my bigger sites got that message all the time. If it had been friendlier, or if we could have customized it, I probably would have kept that site with you and not moved it to Media Temple. Otherwise, all my other clients love your hosting; we have almost no downtime and the one-click installers are great.

    I wish there was a way to manage all my clients from one interface, but still have you bill them directly, thanks,

    Andrew

  21. John says:

    We were one of the first companies to patent synthetic backup consolidation technology and have state of the art solutions that work for national labs, a super computing center, and many of the largest universities. We would be happy to work with you on a solution if you are interested. Please see our site for contact information.

  22. sense says:

    I’m not sure why you were having problems from slowness.. I mean, r1soft does have its problems..

    But, I’m doing a restore now (actually, a move to a new server).. I’m getting about 4-8 MB/second of writes to the new drive on a 100 Mbit port.. I’m restoring from Chicago to a server in Dallas..

    What sort of machine was hosting your r1soft cdp server? I would suggest at least a dual quad core with enough drives in raid 10 to provide the server i/o bandwidth you’ll need. Plus, if you’re compressing and encrypting them.. Of course it will be slow you have to decompress all that crap on the fly..
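
For a sense of scale, the figures in this comment can be put into a short Python calculation. The helper names and the 500 GB data size are made up for illustration; only the 4-8 MB/s rate and the 100 Mbit port come from the comment, and protocol overhead is ignored.

```python
def link_ceiling_mbytes_per_s(link_mbit_per_s):
    """Convert a link speed in megabits/s to its ceiling in megabytes/s."""
    return link_mbit_per_s / 8


def restore_hours(data_gbytes, rate_mbytes_per_s):
    """Hours needed to move `data_gbytes` at a sustained rate (1 GB = 1000 MB)."""
    return data_gbytes * 1000 / rate_mbytes_per_s / 3600


if __name__ == "__main__":
    ceiling = link_ceiling_mbytes_per_s(100)
    print(f"100 Mbit/s ceiling: {ceiling} MB/s")
    for rate in (4, 8):
        share = rate / ceiling * 100
        hours = restore_hours(500, rate)  # 500 GB is a made-up example size
        print(f"{rate} MB/s = {share:.0f}% of the link, {hours:.1f} h for 500 GB")
```

On those assumptions, 4-8 MB/s is roughly a third to two thirds of what a 100 Mbit port can carry, so the network is a plausible limit on this restore alongside the disk and compression factors the comment mentions.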

  23. Glenn Kelley says:

    We are so pleased with the R1Soft solution.
    It works as advertised – each and every time.

    When we ran the server on the same host as it was backing up – it was horrible…
    Their staff however was a significant help in finding the issues and supporting us in our time of trouble.

    I guess the question I would have would be how did your credit card company react to the situation (if you used a C/C ?)

    I have had some issues with other software applications and AMEX has been a great solution in those cases…

    Honestly – I found R1Soft to be an excellent company that focuses on its clients’ issues… With the acquisition from BB1 Technologies I have found their support to be better than ever…

    BB1 is the same group behind the ever so popular Solar Winds application- just wondering.

    Have you had the chance to test them out since?

  24. KR says:

    Our company has also used R1Soft for over a year now on over 75 servers and has not had 1 complaint yet.

    Not even sure why this blog is up here, seems a bit destructive. My suggestion is first look at the source of the problem by looking within your company first. Second is worry about defusing all of the bad blogs and posts about Bluehost before you publicly blast other companies. Seems like you’d want to relieve some of the negative attention that Bluehost already gets.

    But hey, what do I know?

  25. Dan says:

    I do not use Bluehost atm but I found their customer support to be top-notch. It is nice to see the post and comments here – maybe I should switch back ..hmmm

  26. Janet says:

    R1 seems to be offering services that are not unique in that you can get back-up services that they offer from your main provider -(using different servers)

  27. Kevin says:

    Yup my experiences with r1soft so far I am sorry for not be leaving you!

  28. Lachlan Mulcahy says:

    I have to agree here.

    R1Soft might work great for systems that are not particularly IO hungry in the first place and also data sets that do not experience a high volume of changes (generally informed by being not particularly IO hungry anyway :) ), but for systems that require any significant degree of IO, you should just look for a different backup solution.

    The company I work at has R1Soft deployed currently for backing up MySQL DB servers. Frequently buagent starves the system for IO and in some cases so badly that the kernel complains of timeouts attempting to perform fsync’s for syslog and other core tasks.

    Frequently buagent is the cause of performance issues and even has taken down databases (usually causing the system to slow to such a crawl that InnoDB encounters a semaphore wait that is longer than 600 seconds, in which case it decides to assert and bounce itself under the belief it managed to get itself “stuck”).

    Aside from buagent being a complete IO pig as you say, the buserver software seems horribly suboptimal.

    Incremental backups can take DAYS to complete — this is on systems with the same storage arrays that we’d use for production database performance – 22x10K SAS drives. If we actually ever wanted to restore one of the backup snapshots it takes DAYS, and we have to cancel all the other backup tasks just to free up enough resources for that.

    Their support is mediocre; usually you have to talk to somebody who is just a documentation regurgitation drone. If you stomp your feet enough, though, you can get a developer.

    Overall if you dare to consider using this software, make sure you load test it as much as possible for the kind of uses you have planned for it.

    My personal suggestion is to look for something else.

    We’re now working on moving away from this horrible software.

  29. VW says:

    Ah Matt, you are so lucky, you only got to suffer with this CDP 2. Thank God you did not test CDP 3.

    This post is about CDP 2. If you had happened to test, try and use CDP 3 you would commit suicide. CDP 2 (the one you are talking about here) was a blessing compared to CDP 3. If you think CDP 2 was bad, then please take the chance to destroy your servers with CDP 3. Not only is this new version inferior, it can only handle at most 1 backup at a time on the server. You said CDP 2 was a pig on I/O; at least with CDP 2 you could do 15-minute incrementals.

    Forget that with CDP 3. The same backups which took 10 minutes in CDP 2 now take twice as long with CDP 3. This means short backup intervals like every hour are impossible now, unless you have one single server to back up into one single backup server. This new version cannot handle data merging of 2 backups at the same time without dropping to speeds like 100 Kilobits for the drives. The fragmentation is 99% in 8 hours or running 8 backups on a new drive.

    R1Soft did everything possible to completely destroy their product. CDP 3 is more expensive, has fewer features (like no cPanel integration) and is worse in performance. No wonder they are losing customers to better and cheaper competitors.
