The pain of backups, the sweetness of speed!

I have written over and over that great things are born from trials and failure.  I have also written that very often when you need something done right you just do it yourself!  Although web hosting backups don’t count as one of life’s trials, they have been a pretty big thorn in my side.

As most of you know, we use cPanel as the backend system for our control panel.  It works reasonably well for most functions, but one area where it falls flat on its face is backups.  We have been using our own system for a long time and the performance is adequate, but far from what we thought was ideal.  We tried several open source tools as well as a paid option from R1Soft (the WORST software I have ever had the great misfortune of trying).  Nothing was fast enough, and here’s why.

Our average /home partition on one of our servers has anywhere between 3-6 million files on it.  Let’s assume that only 10,000 files on that partition were modified in a 24 hour period.  You still have to scan the directories that contain those 3-6 million files just to find the 10,000 that were modified.  Even without copying anything, just the stat (the scan) of all those files takes hours to complete.  What is FAR worse, however, are the seeks that the block devices (hard drives) incur while they are doing the scan.  It slows the system way down just to find the files to back up.  Then, once it finds the changed files, it still has to copy them.  This is “just the way it is” on every system I know, including Solaris, Windows, and Linux.
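
To make this concrete, here is a rough sketch of what every scanning backup tool effectively has to do (illustrative Python – the path and the 24 hour threshold are just examples, not our actual tooling):

    import os
    import time

    CUTOFF = time.time() - 24 * 60 * 60  # "modified in the last 24 hours"

    def changed_files(root):
        """Stat EVERY file under root just to find the few that changed."""
        for dirpath, dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    # One stat() -- and its disk seeks -- per file, even
                    # though only ~10,000 of the 3-6 million files changed.
                    if os.stat(path).st_mtime >= CUTOFF:
                        yield path
                except OSError:
                    pass  # deleted or unreadable mid-walk

    for path in changed_files("/home"):
        print(path)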

One day I was thinking (in the shower of course – since 90% of all good ideas come to you in the shower): why not just have the Linux kernel dump the name of any file that was created or modified (any bytes written) at the time the file was modified, and use that as the list of files to back up?  The kernel already has this information when a file is updated and just throws it away.  There is virtually no overhead to do this, and it saves literally 90% of the time we would normally have spent on the system doing backups.  There are already similar hooks in the kernel to get this data, such as inotify, but inotify is capable of a lot more than what we wanted and is consequently MUCH slower.
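
For contrast, here is roughly what the userspace inotify approach looks like (a sketch using the third-party pyinotify library – the log path is made up).  Note the recursive watch: inotify has to register a watch on every directory in the tree before it can report anything, which on millions of directories is exactly the overhead we wanted to avoid:

    import pyinotify

    LOG = "/var/log/changed-files.list"  # illustrative path

    class ChangeLogger(pyinotify.ProcessEvent):
        def process_default(self, event):
            # Append the full path of every created/written file to the log.
            with open(LOG, "a") as log:
                log.write(event.pathname + "\n")

    wm = pyinotify.WatchManager()
    mask = pyinotify.IN_CLOSE_WRITE | pyinotify.IN_CREATE
    # rec=True walks the whole tree up front, adding one watch per directory;
    # auto_add=True also watches directories created later.
    wm.add_watch("/home", mask, rec=True, auto_add=True)

    pyinotify.Notifier(wm, ChangeLogger()).loop()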

The problem was that I couldn’t find a single program that implemented this idea, or even a mention of this technique anywhere on the web.  In situations like this I turn to our favorite in-house kernel hacker and demand magic.  In this case it took him one day to write a kernel patch that implemented this and stripped out files that didn’t matter for backup purposes, such as anything under /proc or /dev and so forth.  So does it work?

YES!  What’s really neat is that it isn’t specific to cPanel.  It works with any Linux filesystem, such as XFS, ext3, ext4, ReiserFS, JFS, etc.  I think that most admins don’t fully realize the amount of time that is wasted, and the I/O that is consumed, just in determining which files need to be copied.  This new backup method is literally 10x faster than what we had before and puts far less load on the server in the process.

So what to do with it?  Well, after we clean up the kernel code a bit and make sure it is 100% rock solid, I will post the patch free of charge here on my site.  The patch simply dumps a list of files to be backed up to any file you specify.  You can then do whatever you want from that point.  We will also offer a fully implemented cPanel backup that is completely compatible with cPanel’s restore feature.  I have no pricing for it yet, but I will tell you this: I will charge you only 25% of the lowest price that R1Soft quotes you for their horrible software.  Meaning, if you are paying $50 a client license, I will charge you $12.50.  Of course, you are more than welcome to use the patch free of charge and implement your own system.  It is the fastest solution of any system I have ever tested (including, of course, R1Soft).
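
To give you an idea of the consumer side, here is the sort of thing you will be able to do with that list (a sketch only – the log location, its one-path-per-line format, and the destination are placeholders, not the patch’s final details):

    import subprocess

    LOG = "/var/run/changed-files.list"         # placeholder: wherever you
                                                # told the kernel to dump paths
    DEST = "backup-host:/backups/incremental/"  # placeholder destination

    # Deduplicate: a file written many times shows up in the log many times.
    with open(LOG) as fh:
        paths = sorted({line.strip().lstrip("/") for line in fh if line.strip()})

    with open("/tmp/files-to-copy.list", "w") as out:
        out.write("\n".join(paths) + "\n")

    # --files-from copies only the listed files, relative to the source
    # root ("/"), preserving directory structure at the destination.
    subprocess.run(
        ["rsync", "-a", "--files-from=/tmp/files-to-copy.list", "/", DEST],
        check=True,
    )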

If you have any ideas that you think could make the product even faster, I am open to hearing them.

Thanks,
Matt Heaton / President Bluehost.com

32 Responses to “The pain of backups, the sweetness of speed!”

  1. Sounds very nice especially for filesystems with lots of files and minimal change. Looking forward to the patch.

    PS. Agree with the shower comment too!

  2. There is actually one OS I know of that uses this technique. OS X :)

    It’s not quite as granular – it provides per-folder change notifications instead of the complete file list – but this is the technique that Time Machine uses to make backups fast enough that you never notice it is backing up every hour.

    http://en.wikipedia.org/wiki/FSEvents

  3. Ken Dreyer says:

    Thanks Matt for giving back to the community! It sounds like a stripped-down inotify… what are the differences?

  4. Is there a release date for this solution? We use R1soft across a few servers, and if this would help speed up the backup, we would surely invest in it.

  5. Mr. Heaton,

    Have you considered the following alternative to your “thousands” of hours in kernel hackery and IO confusion…

    Instead of selling unlimited storage for the cost of a cheeseburger, selling an actual product with legitimate value and using an actual commercial grade and tested disk to disk backup solution from Symantec or IBM?

    I suppose, being the business owner that you are, that at some point you put a value on your time vs. the cost of backup software which has been tested in an enterprise environment. Keep in mind that having several thousand servers that you built yourself doesn’t qualify as enterprise when you are using desktop components and desktop-grade hard drives.

    Just a thought…

  6. Sassan says:

    Good news for R1Soft, they’re waiting! LOL

  7. Dave says:

    I agree completely! 90% of all great ideas happen in the shower!!! I think it’s something to do with the isolation from electromagnetic radiation. :-) Go Matt!

  8. Bill Shirley says:

    As Andrew mentioned, OS X solved this problem in the same way. It provides the data for several other services they implement, most notably Time Machine.

    The Mac OS (Journaled) file system basically keeps a queue of all the disk changes. The queue can be registered for, and a process can access it incrementally. I believe they even provide higher-level APIs so that non-system-level access to this information can be provided.

    Software problems are often solved in the same manner by different people because once you frame the issue correctly, the solution is obvious. (I wish the US Patent Office understood this.) It’s just a matter of finding the right shower to frame the problem for you.

  9. Jay R. Wren says:

    The kernel already has something called inotify, an API that lets programs detect file changes. This is a cool use of inotify. I may write my own “files changed” inotify backup program which works similarly, for desktop users like myself.

    Thanks for the great work Matt.

  10. […] president Matt Heaton posted to his blog on the 20th of January about the pain of backups. Included in this post is information regarding the solution they have developed and plan to […]

  11. steve says:

    Sounds a lot like FAM
    http://savannah.nongnu.org/projects/fam/

    Re: Jacob Forsyth
    I’ve run large prod networks using white-label and name brand hardware.
    You’ll typically get a 5% increase in availability and maybe a 10% increase in performance with the name-brand hardware, for 5x-10x the price. With the use of clustered solutions and fail-over mechanisms, name-brand hardware rarely if ever makes sense in the real world. The long and the short of it is that it sounds like you have an axe to grind. Don’t knock Matt because he doesn’t subscribe to your outdated POV on celebrity hardware.

  12. R L Graham says:

    So if I understand your post correctly, you don’t care for R1Soft. :)

  13. Jayesh Ashar says:

    Matt,

    We have been suffering email inactivity for the last 3 days due to your server migration. 20 employees are sitting idle for hours together, and business is suffering; in these crucial days every penny counts.

    It is thoughtful of Bluehost to do the migration during off-peak business hours in the USA. But what about people like us based in India?

    I am sure you are capable of thinking of something else so that we do not suffer in this way.

    Awaiting your reply.
    Jayesh Ashar

  14. Dimitri says:

    I have an issue where I am installing a PHP script that needs phpSHIELD, which (I later found out) Bluehost does not support. I think this is a simple matter for you to install on the server. I am trying to install http://www.phpmotion.com

    Phpshield.com is where the loaders are available for free. Can you please help and install these phpSHIELD loaders?

    In other words, a Bluehost site cannot run protected PHP software.

    Please help and let me know. Thanks

  15. B.B says:

    Hi Matt!

    I’m new with Bluehost and so far I am happy with the upload speed for files, the speed the sites load, and the many cPanel features.
    However, one question came to mind when I was uploading a WordPress theme: is Bluehost planning any full zip upload of WP files that easily extracts into the right folder? As it is now, each file has to be manually uploaded, one by one, and all folders and sub-folders manually created. It takes a lot of time. And, if you’re like me and want multiple themes to choose from, I’d be stuck uploading these files 24/7/365…
    Would be great if zip files could be uploaded and auto-integrated.

    Greetings
    B.B

  16. Bill says:

    Can I make a suggestion? In the account info in cPanel, there is only one field for email. In my current state, domain and email are down (on Bluehost). So when I contact support, they can only reply to an email address that won’t work, because there is no secondary address. I have to keep remembering to remind them in my ticket to use a secondary (Gmail) address. It strikes me that when there is an email issue, there can be no communication without a secondary address, such as Gmail. A field in account info for a secondary address would make sense.

  17. […] It seems the CEO of Bluehost, Matt Heaton, found a method to make the backups faster (and thus more often) by “dumping the name of any file that was created/modified (any bytes […]

  18. Ad says:

    I think I am abusing your blog for information I would like to have about your service. I have had my blog (and other sites) running on IX WebHosting for years now. Like your company, they guarantee very good uptime. But their average response time for an HTTP request has gone up to something like 15 seconds now. It is quite frustrating. Why don’t you guarantee a maximum response time, so that I know it is safe for me to migrate?

  19. Jones says:

    Last post in January? Why aren’t you talking about your cooling outage which affected so many? How are you dealing with that?

  20. Alex says:

    That’s really ingenious. I’ll definitely be coming back to get that patch!

    Alex

  21. Adam Cox says:

    If you are performing backups across the wire (network), then something else to consider is network topology: an alternate NIC in each server being backed up, configured on a private/maintenance subnet that is isolated from the operational network and used specifically for backups.

    I do like the way you’re thinking on these issues, Matt.

    ~ Adam

  22. Paul in Georgia USA says:

    Jayesh Ashar,

    Three days sounds excessive.
    However, I have no idea who accesses your site as customers (as opposed to your employees in India), but would you want it unavailable to U.S. clients and customers during US prime time?
    No matter what, servers have to be taken down at times – unless yours was dedicated and not shared, you are making a selfish call.

  23. Ashwin says:

    Jayesh,

    I think that in most cases mail is just a by-product of hosting. It’s just the fact that our DNS/MX points to Bluehost that they serve our email.

    But in my opinion, we should leave it to the experts to do what they do best. Let Bluehost do your hosting, and let Google Apps handle your domain email!

    I don’t see any hosting provider offering an excellent mail solution (I don’t blame them; it’s just too difficult to provide enterprise-class email when you’re just buying them 1 lunch a month!!), so the next best way is to go to Google, who’ll do it for free!

  24. Tane says:

    Matt,

    can we expect this patch any time soon then? :-)

    cheers

  25. First, I definitely agree with the shower comment! I solve 99.9% of my problems that way!

    I maintain my own server and run into the same problem with personal backups. As a system admin, I would never have thought to look at the kernel to speed up the process. I currently just overwrite the entire backup and take all files, because it is usually quicker than a seek-then-update.

    I look forward to seeing this patch released :D

  26. […] I discovered this blog on Matt Heaton’s blog (http://www.mattheaton.com/?p=179) about CPanel […]

  27. InMage says:

    The problem with this approach is that on large systems the log itself is a huge performance impact, and in some cases the backup side can never catch up.

    Check out DR-Scout (www.inmage.com): it backs up at the volume level, so a million files or one large file does not matter, and the load on the protected server is more or less eliminated… not the R1Soft nightmare that you write about.

  28. Tim Jones says:

    I won’t revisit Apple’s fseventsd since others have already mentioned it – a really cool solution – but kernel modifications are not the only thing that can be done.

    A backup tool that performs a proper treewalk does not stat or read EVERY file on a filesystem. Performed properly, the treewalk won’t even descend into branches (directories) whose mtime and/or ctime is older than the comparison date. Therefore, the system is not pounded, because the walk never descends past a directory with unmodified contents – inodes are good that way. (A rough Python sketch of this follows at the end of this comment.)

    In fact, I just ran a test on our primary internal RAID array (stock Linux 2.6.29 and ext3). The filesystem is a 2.6TB RAID 5, has a filecount of over 2.4 million files, and had 5,234 files changed since last Wednesday. The scan for the modified files starting from the /home mountpoint took 18% of 1 CPU and a total run time of 1m 18s. For a second test, I changed the path to the root, which includes the system and 3 other mounted volumes that should have contained only modified log files (100 or so). I forced a cache flush, and the result in that case was 1m 24s with the same 18% load average. The system hosting the volumes is a 2-core AMD 64 X2 4800+ (1GHz) with 4GB RAM – not a stellar performer by any stretch of the imagination.

    It seems to me that if your filesystem scans are taking that much of a hit on your system, you’re using the wrong backup product.
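
    Here is a rough sketch of that pruned treewalk in Python (illustrative only – note the caveat in the comments):

        import os

        def pruned_walk(root, since):
            """Yield files changed after `since`, descending into a
            subdirectory only when its own mtime/ctime is newer."""
            # Caveat: POSIX only updates a directory's mtime when entries are
            # created, deleted, or renamed -- rewriting an existing file in
            # place does NOT touch the parent directory's times, so a pure
            # prune like this can miss in-place modifications.
            for entry in os.scandir(root):
                st = entry.stat(follow_symlinks=False)
                changed = max(st.st_mtime, st.st_ctime) >= since
                if entry.is_dir(follow_symlinks=False):
                    if changed:  # skip whole branches that look untouched
                        yield from pruned_walk(entry.path, since)
                elif entry.is_file(follow_symlinks=False) and changed:
                    yield entry.path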

  29. It is also easy to back up and transfer all your websites from one server to another if you have cPanel installed.

  30. I have seen the Parallels control panel doing such magic and I think it’s wonderful.

  31. cPanel is also doing a great job as regards transfers, and Parallels is a good utility for cross-platform transfers and data backups.
