Archive for January, 2010

Bad Apple or Great Kid?

Sunday, January 31st, 2010

When I was young I was extremely hyperactive. It got so bad at one point that in the 3rd grade I was allowed to just “leave” class whenever I wanted to have my own personal recess. The school did this because my poor teacher was so distraught with my behavior that she literally couldn’t handle me and so I was allowed to roam the playground until my “energy ran out” – which of course never happened.

Looking back, I feel really bad for what I put all my teachers through. I really was a wild kid :)

I remember in the first grade working through all the first grade and second grade math books by the end of September. They wouldn’t let me do the 3rd grade math books because they didn’t want me to get ahead (I always thought that was ridiculous, by the way). After that I started getting “S”s on most of my report cards. S = satisfactory. My Mom wanted “O”s for ‘outstanding’. Later, I started getting “N”s on my report cards. N = needs improvement. At this point my Mom started getting worried. She thought that because I was misbehaving so much I wasn’t learning the material, but that wasn’t the case.

The problem wasn’t that I didn’t know the material; the problem was that once I learned something (or thought I did) I HAD to move on to something else. When I say that I “HAD” to move on, it’s the truth. I literally couldn’t bring myself to do “busy work” for a concept that I already understood just to satisfy the teacher. Oftentimes homework didn’t get done because I KNEW that I understood the concept. It was a complete and utter waste of time in my mind, and I had new, exciting things that I was busy working on. I always craved doing something new.

High school was the same. I remember getting a D+ in chemistry one semester (my worst grade in high school), but when it came time to take the ACT for college entrance I scored a 35 (a near perfect score) on the science portion, which happened to be chemistry that year. Things just moved a little too slowly in school for me, and I am grateful for it now because it gave me a lot of free time to learn about computer hardware and software development.

One of the things I love so much about Bluehost and Hostmonster is that I get to pick and choose new things that interest me, that are challenging, and that will benefit our customer base. In other words, I have an environment where I can succeed.

I could just as easily have been written off as one of those goof-off kids with poor grades, or presented with serious challenges and given the freedom to experiment and learn and do things that others hadn’t yet tried. I’m so happy that I was given a chance to show what I could do later in life.

Everyone in this world has something to offer. The sooner you find out what that is, the sooner you will find happiness. Don’t let other people tell you what will make you happy. Instead, look within and see what it is that drives you and what you need, and then go in that direction.

Your happiness doesn’t require the understanding of those around you; it only requires your own understanding. Find out what that is and then happiness will be yours.

Matt Heaton / Bluehost.com

Bluehost’s “Secret Numbers”

Wednesday, January 27th, 2010

January 2010 has seen some good growth for our hosting platform. I am usually pretty secretive about our company “numbers”, but have decided to spill the beans tonight on my blog. Below are some interesting stats from our various hosting brands.

Total Domains Hosted : 1.9+ million domains
Total Paying Hosting Customers: More than 525,000
Total Servers: 850+ (ALWAYS rotating out older servers)
Total Sales/Billing/Support Requests Per Day: Approximately 5,000
Number of new customers (not domains) added each day (Mon-Fri): 800+
Number of new customers (not domains) added each day (Sat, Sun): 500+
Number of new domains added each month: 50,000 – 70,000
Total Bandwidth Capacity: 20 Gigabits/Second (100% ours, not shared in ANY way)
Average Hold Time For Support: 19 seconds
Number of Employees: 240+
Registrar For Domains: Fastdomain Inc (Sister company that “sells” domains to Bluehost/Hostmonster)
Outsourced services: NONE!!!!!!!
Revenue: _____ (Some things really do need to be kept private)
Profit: _____ (Some things really do need to be kept private)

Bluehost/Hostmonster/Fastdomain have been wildly successful. I’m so grateful to have been part of this incredible venture. There was and is an ENORMOUS amount of effort put into making our products the best that we know how to make them. Add to that a lot of luck and you get Bluehost and Hostmonster.

Thank you so much to all our loyal customers who tell their friends to sign up! The vast majority of our sales come from non-affiliate, word-of-mouth recommendations. That doesn’t happen unless our customers think we are doing a pretty good job. We promise to try our hardest to improve the things that are “good” that should be “great”, and to add the features that you need that no other company will bother to add. That is our promise to you!

Thanks again.

Matt Heaton / Bluehost.com

Linux CPU Scheduler (The biggest problem you never knew you had!)

Saturday, January 16th, 2010

This is perhaps the least sexy topic I’ve ever written about :) The Linux CPU scheduler is an extremely important part of how Linux works. The CFS scheduler (Completely Fair Scheduler) has been part of Linux for a couple of years. The purpose of the scheduler is to look at tasks (processes and threads), assign them a processor or CPU core to run on, and make sure that all the processes that need run time get an equal and fair share of processing time. It is also responsible for context switching (migrating tasks from one CPU/core to another, or switching out processes that don’t need any more run time). This helps to balance processes and make better use of CPU cache by being “smart” about where to put queued and running processes.
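
To make the “which core does my task land on” part concrete, here is a minimal userspace sketch (my own illustration, not anything taken from the kernel source) that asks the scheduler which policy the current process runs under and which core it is on between a few bursts of work. On a busy multi-core box you will often see the core number change as CFS migrates the task around:

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    /* SCHED_OTHER is the default policy - the one CFS services. */
    int policy = sched_getscheduler(0);   /* 0 = the calling process */
    printf("policy: %s\n",
           policy == SCHED_OTHER ? "SCHED_OTHER (handled by CFS)"
                                 : "a realtime/other policy");

    for (int i = 0; i < 5; i++) {
        /* Burn a little CPU so the scheduler has a reason to move us around. */
        volatile unsigned long x = 0;
        for (unsigned long j = 0; j < 50000000UL; j++) x += j;
        printf("iteration %d: running on CPU %d\n", i, sched_getcpu());
        usleep(100000);
    }
    return 0;
}
```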

It all sounds simple enough, but there are HUGE problems with the design of CFS in my opinion. I’m getting into dangerous territory here because I’m about to tear apart something that was designed by people who are much smarter than I am. However, I have something that most kernel developers don’t have access to – a huge and unbelievably busy network. Our network receives more than a trillion (yes, with a T) hits every quarter. We receive more than 100 million emails every day. We send out more than 25 million emails each day. We now have more than 5 petabytes of storage. In short, I have one of the best testbeds on the planet for finding deficiencies in an operating system.

Enough background, let’s get to why I think CFS is “broken”. As the number of processes increases, CFS gets disproportionately slower and slower until almost no work (CPU processing) gets done. There are many tunables to modify how CFS behaves, but the premise is the same. CFS is built on the (in my opinion) incorrect premise that all processes are always “equal”. I can easily create enough processes on a production server that CFS will consume almost all the CPU cycles just trying to schedule the processes to run, while giving the processes themselves almost no time to actually run.
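
I can’t publish our production workload, but a rough way to see the effect for yourself (this is my own toy reproduction, not our internal benchmark, and the exact numbers will depend on your kernel and core count) is to fork a pile of CPU-bound processes and time a fixed piece of work while they spin. With a perfectly fair scheduler that had zero overhead, the fixed workload should slow down roughly in proportion to the number of competitors; growth noticeably worse than that is the scheduling overhead I’m describing:

```c
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Spin on the CPU for a fixed number of iterations. */
static void burn(unsigned long iters) {
    volatile unsigned long x = 0;
    for (unsigned long i = 0; i < iters; i++) x += i;
}

int main(int argc, char **argv) {
    int n = (argc > 1) ? atoi(argv[1]) : 100;   /* number of background tasks */

    /* Fork n children that do nothing but spin, competing for the CPU. */
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == 0) { for (;;) burn(1000000UL); }
        if (pid < 0) { perror("fork"); break; }
    }

    /* Time a fixed chunk of work in the parent while the children spin. */
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    burn(500000000UL);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%d background tasks: fixed workload took %.2f seconds\n", n, secs);

    signal(SIGTERM, SIG_IGN);   /* let the parent survive the group signal */
    kill(0, SIGTERM);           /* terminate all the spinning children */
    return 0;
}
```

Run it as ./loadgen 100, then ./loadgen 800, then ./loadgen 1500 (raising your process ulimit as needed) and compare how the timings scale.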

Think of it like this – let’s assume that for every process it takes 0.1% of the CPU to “schedule” the process to run, and then it takes X% of the CPU to actually run the program. But what if you have 900 processes running and each one takes 0.1% of the CPU for scheduling? Now you only have 10% of the CPU remaining in which to run your software. In reality I think it’s much worse than this example. After about 1,500 concurrent processes CFS completely starts to fall apart on our servers.

The worst part about this is that the only way you can really tell this is happening is to measure the process quantum (the time slice that a userspace program gets on a CPU/core). How many of you know how to measure the average process quantum of the scheduler? That’s what I thought :) If you add up all the “quantum times” during a one second period and look at the difference, you will see how much CPU the kernel is taking to service those requests. On a desktop system I get about 95% of a CPU for running my software. On our busiest servers I get about 70% of our available CPU time for actually running our software. The rest is eaten up by the inefficient scheduler. If you feel compelled to evaluate the process quantum time you can enable sched_debug in the kernel and check out its output. It’s actually pretty good data for those nerdy enough to read it.
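
For those who want a quick look without wading through the sched_debug output, a rough alternative (this is just an illustration, not the method I used for the numbers above, and it requires scheduler statistics to be enabled in your kernel) is /proc/&lt;pid&gt;/schedstat, which exposes the nanoseconds a task has spent on a CPU, the nanoseconds it has waited on a runqueue, and how many timeslices it has been given. On-CPU time divided by timeslice count gives a crude average quantum:

```c
#include <stdio.h>
#include <stdlib.h>

/* Usage: ./quantum <pid>
 * Reads /proc/<pid>/schedstat, which (with scheduler stats enabled) holds:
 *   <ns on cpu> <ns waiting on a runqueue> <number of timeslices>   */
int main(int argc, char **argv) {
    if (argc < 2) { fprintf(stderr, "usage: %s <pid>\n", argv[0]); return 1; }

    char path[64];
    snprintf(path, sizeof(path), "/proc/%s/schedstat", argv[1]);

    FILE *f = fopen(path, "r");
    if (!f) { perror(path); return 1; }

    unsigned long long on_cpu_ns, wait_ns, slices;
    if (fscanf(f, "%llu %llu %llu", &on_cpu_ns, &wait_ns, &slices) != 3) {
        fprintf(stderr, "unexpected format in %s\n", path);
        fclose(f);
        return 1;
    }
    fclose(f);

    printf("on-cpu: %.3f s, runqueue wait: %.3f s, timeslices: %llu\n",
           on_cpu_ns / 1e9, wait_ns / 1e9, slices);
    if (slices)
        printf("average quantum: %.3f ms\n", on_cpu_ns / 1e6 / (double)slices);
    return 0;
}
```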

It’s been nearly impossible to prove my calculations over the last several months, but after many long nights I now feel very comfortable in saying that CFS truly is a broken design. It may be a good design for a desktop, and admittedly the kernel guys have made low latency desktops a priority, but still… You do have to have some upper bound on how many processes can be running and how many new processes can be started over a given period of time, but this limit should be MUCH higher than 1,500-2,000. I would say it needs to be somewhere in the 10,000 range to really be effective with hardware that will be coming out in the next 6-18 months. If Linux wants to scale efficiently to 16, 32, or 64 cores then the scheduler needs some serious work.

How do we fix it? Well, we actually have a “process start throttler” kernel patch that evens out the start times of processes and gives predictable behavior to the scheduler, but it doesn’t solve the issue of the scheduler simply not scaling. It actually gives us a pretty substantial gain in speed, and more importantly it stops a single user who launches a ton of processes at once from impacting the speed and stability of everyone else on the system. This is pretty complex to explain, and it actually started being tested on live servers today, but that is a blog entry for another day.
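
The patch itself lives in the kernel and isn’t something I can paste here, but the general idea is easy to illustrate in userspace with a token bucket (all of the names, rates, and limits below are illustrative assumptions on my part, not what our patch actually does): each process start costs a token, tokens refill at a fixed rate, and a burst of start requests gets spread out over time instead of landing on the scheduler all at once.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

#define TOKENS_PER_SEC 10.0   /* assumed refill rate: 10 process starts per second */
#define BUCKET_MAX     10.0   /* assumed burst allowance */

static double now_sec(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(void) {
    double tokens = BUCKET_MAX;
    double last = now_sec();

    for (int i = 0; i < 100; i++) {            /* 100 start requests arrive at once */
        /* Refill the bucket based on elapsed time, capped at BUCKET_MAX. */
        double t = now_sec();
        tokens += (t - last) * TOKENS_PER_SEC;
        if (tokens > BUCKET_MAX) tokens = BUCKET_MAX;
        last = t;

        /* No token available yet - this wait is the "throttle". */
        while (tokens < 1.0) {
            usleep(10000);
            t = now_sec();
            tokens += (t - last) * TOKENS_PER_SEC;
            last = t;
        }
        tokens -= 1.0;

        pid_t pid = fork();
        if (pid < 0) { perror("fork"); break; }
        if (pid == 0) {
            /* Child: stand-in for whatever the real worker would do. */
            execlp("true", "true", (char *)NULL);
            _exit(127);
        }
        printf("started worker %d (pid %d)\n", i, (int)pid);
    }
    return 0;
}
```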

Thanks,
Matt Heaton