I have written over and over that great things are born from trials and failure. I have also written that very often when you need something done right you just do it yourself! Although web hosting backups don’t count as one of life’s trials, they have been a pretty big thorn in my side.
As most of you know, we use Cpanel as the backend system for our control panel. It works reasonably well for most functions, but one area where it falls flat on its face is backups. We have been using our own system for a long time, and the performance is adequate but far from what we thought was ideal. We tried several open source tools as well as a paid option from R1Soft (the WORST software I have ever had the great misfortune of trying). Nothing was fast enough, and here’s why.
Our average /home partition on one of our servers has anywhere between 3 and 6 million files on it. Let’s assume that only 10,000 files on that partition were modified in a 24 hour period. You still have to scan the directories that contain those 3-6 million files just to find the 10,000 that were modified. Even without copying anything, just doing the stat (the scan) of all those files takes hours to complete. However, what is FAR worse are the seeks that the block device (hard drives) incurs while doing the scan. It slows the system way down just to find the files to back up. Then, once it has found the changed files, it still has to copy them. This is “just the way it is” on every system I know, including Solaris, Windows, and Linux.
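To make the cost concrete, here is a minimal sketch of the traditional scan in Python. It is not the code we run in production; the function name find_modified and the 24 hour cutoff are only illustrative assumptions. The point is that every file gets a stat call, whether it changed or not.

```python
import os
import stat


def find_modified(root, since_epoch):
    """Walk `root` and return (changed_paths, files_scanned).

    Every regular file under `root` is stat'ed, even though only a tiny
    fraction will turn out to be newer than `since_epoch`.
    """
    changed = []
    scanned = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # one metadata lookup per file
            except OSError:
                continue  # file vanished between listing and stat
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, sockets, devices
            scanned += 1
            if st.st_mtime > since_epoch:
                changed.append(path)
    return changed, scanned
```

Calling find_modified("/home", time.time() - 86400) on a partition like ours would return roughly 10,000 paths while reporting millions of files scanned, and that ratio is the whole problem.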
One day I was thinking (in the shower of course, since 90% of all good ideas come to you in the shower): why not just have the Linux kernel dump the name of any file that was created or modified (any bytes written) at the time the file was modified, and use that as the list of files to back up? The kernel already has this information when a file is updated and just throws it away. There is virtually no overhead to do this and it saves literally 90% of the time we would normally have spent on the system doing backups. There are already similar hooks in the kernel to get this data through innovative techniques like inotify, but inotify is capable of a lot more than what we wanted and consequently MUCH slower.
The problem was that I couldn’t find a single program that implemented this idea or even mentioned this technique anywhere on the web. In situations like this I turn to our favorite in-house kernel hacker and demand magic. In this case it took him one day to write a kernel patch that implemented this and stripped out files that didn’t matter for backup purposes, such as those under /proc or /dev and so forth. So does it work?
YES! What’s really neat is that it isn’t anything that is specific to Cpanel. It works for any Linux filesystem such as XFS, EXT3, EXT4, Reiser, JFS, etc. I think that most admins don’t fully realize the amount of time that is wasted and I/O that is consumed just in determining which files need to be copied. This new backup method is literally 10x faster than what we had before and puts far less load on the server in the process.
So what to do with it? Well, after we clean up the kernel code a bit and make sure it is 100% rock solid I will post the patch free of charge here on my site. The patch simply dumps a list of files to be backed up to any file you specify. You can then do whatever you want from that point. We will also have a fully implemented backup that works with Cpanel and is completely compatible with their restore feature. I have no pricing for it yet, but I will tell you this: I will charge you only 25% of the lowest price that you are quoted by R1Soft for their horrible software. Meaning if you are paying $50 per client license, I will charge you $12.50. Of course you are more than welcome to use the patch free of charge and implement your own system. It is the fastest solution of any system I have ever tested (including, of course, R1Soft).
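If you do want to roll your own system on top of the list, it could look something like the Python sketch below. Treat it as an illustration under assumptions, not the real thing: the patch is not published yet, so the list format (one absolute path per line) is a guess, and the names load_changed_paths and backup_changed are made up for this example.

```python
import os
import shutil


def load_changed_paths(list_file):
    """Read the change list and drop duplicates, keeping first-seen order.

    Assumption: the list holds one absolute path per line. A file written
    several times in a day may well appear several times.
    """
    seen = set()
    ordered = []
    with open(list_file, "r", encoding="utf-8") as fh:
        for line in fh:
            path = line.rstrip("\n")
            if path and path not in seen:
                seen.add(path)
                ordered.append(path)
    return ordered


def backup_changed(list_file, source_root, dest_root):
    """Copy every listed file that lives under source_root into dest_root."""
    source_root = os.path.abspath(source_root)
    copied = []
    for path in load_changed_paths(list_file):
        path = os.path.abspath(path)
        if os.path.commonpath([source_root, path]) != source_root:
            continue  # outside the tree we are backing up
        if not os.path.isfile(path):
            continue  # deleted (or replaced by a directory) since it was logged
        target = os.path.join(dest_root, os.path.relpath(path, source_root))
        os.makedirs(os.path.dirname(target), exist_ok=True)
        shutil.copy2(path, target)  # copies data plus timestamps and mode
        copied.append(target)
    return copied
```

Notice there is no directory walk anywhere. The only files touched are the ones on the list, which is where the time savings come from.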
If you have any ideas that you think could make the product even faster, I am open to them.
Matt Heaton / President Bluehost.com