With all the news about Facebook lately, it's interesting to look at what they've become. Some pundits have labeled them the bellwether for growth in the IT space. With the release of their "Open Compute Project" data center, there's talk of them ushering in a "new era" of data center design.

Don't get me wrong, you can't get to the size that Facebook is without being a difference-maker. The bigger you are, the more water you displace, and Facebook definitely gets everyone's attention when they start moving around in the pool. That being said, I don't see them as particularly visionary outside of the confines of their application design, and I'm not sure that I'd call them a bellwether for anything.

Look, marketing angle aside (and I personally think the "Open Compute Project" is brilliant), Facebook isn't doing anything that Google and others haven't already done. When you have a massive scale-out application (and almost nothing else), it becomes both wise and easy to build thousands of custom servers to support it. When you can also deliver that application on a completely custom code-base, there are other ways you can leverage the "cheapest-of-breed" model. And when you can plan for the loss of servers on a massive scale (think ~50% at one time), you really don't care about the quality or reliability of those servers anymore, right? In that respect Facebook and Google are doing almost the identical thing: racks and racks of throw-away hardware in a data center that provides the absolute minimum protection at the lowest possible cost per node. Of course it's efficient. It's also a model that doesn't matter to the other 99% of the industry.
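
To make the economics concrete, here's a back-of-the-envelope sketch (all figures are invented for illustration, not anyone's actual numbers): once the application tolerates massive node loss, the metric that matters is cost per node you can actually count on, not cost per node.

```python
# Back-of-the-envelope: cost per *usable* node when failure is planned for.
# All figures below are invented for illustration only.

def cost_per_usable_node(unit_cost, survival_rate):
    """Effective cost of one node you can count on at any given moment."""
    return unit_cost / survival_rate

# A premium enterprise box: expensive, rarely down.
premium = cost_per_usable_node(unit_cost=6000, survival_rate=0.99)

# A stripped-down, custom-built box: cheap, in a design that assumes
# up to ~50% of nodes can be dark at any one time.
throwaway = cost_per_usable_node(unit_cost=2000, survival_rate=0.50)

print(f"premium:   ${premium:,.2f} per usable node")   # ~$6,060.61
print(f"throwaway: ${throwaway:,.2f} per usable node") # $4,000.00
```

Even planning for half the fleet to be dark, the cheap node wins, which is exactly why the model works for Facebook and Google, and why it falls apart the moment your workloads can't tolerate that kind of loss.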

Ask the largest owners/providers of data center space (large enterprise, large co-lo, telco and SP companies) and all of them will shrug their shoulders. Outside air? Higher ambient temps? Physically isolated cold/hot aisles? Forced plenum? Efficient server power supplies? Modular design? Come on, all of these have been in general use for years. I worked for a regional co-lo company for six years, and almost every one of these principles had been in use there since 2004 or so. None of this is revolutionary, and I don't think it really matters other than as the final proof point for these technologies: if Facebook is doing it, it must be OK…

On the server side, the idea of packing a data center with 10,000 identical servers is a pipe dream for "normal" data center operators. In multi-tenant facilities, the customers bring their own servers. Even in enterprise data centers there are multiple workload profiles that have to be accommodated, and even the magic of virtualization doesn't round off all the rough edges. Different processor vendors and steppings have a material impact on a customer's ability to use all of the features of a standard virtualization stack, so unless you apply CPU masking all over the place, or wholesale replace every server at once to keep the CPU models similar enough to be useful, you are going to have an issue. It's only when you run a single code-base that you can purposefully abstract away from the processor speed/type (think something like SETI@Home, where the "work product" is just a completed job and the time to complete is tied to the hardware doing the work) that those constraints stop mattering; see the sketch below. Mark Thiele has a good discussion of this on his blog here, and he estimates that just 25% of the Facebook data center design will be relevant to the greater market. I think that number might be high.
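
For what it's worth, here's a minimal sketch of that SETI@Home-style abstraction (node names and speeds are invented): the work product is a completed job, so a deliberately mismatched fleet of fast and slow nodes produces identical results, and the hardware only changes how long each job takes.

```python
# Minimal sketch of a hardware-agnostic work queue: the work product is a
# completed job, so node speed affects timing but never the result.
# Node names and relative speeds are invented for illustration.
import queue
import threading
import time

jobs = queue.Queue()
results = []
lock = threading.Lock()

def worker(name, relative_speed):
    """Pull jobs until the queue is empty."""
    while True:
        try:
            job = jobs.get_nowait()
        except queue.Empty:
            return
        time.sleep(0.001 / relative_speed)  # slower hardware just takes longer
        result = job * job                  # stand-in for the real computation
        with lock:
            results.append((name, job, result))

for j in range(100):
    jobs.put(j)

# A deliberately mixed fleet: different vendors, steppings, and speeds.
threads = [threading.Thread(target=worker, args=(f"node-{i}", speed))
           for i, speed in enumerate([2.0, 1.0, 0.5, 0.25])]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"{len(results)} jobs completed; the answers don't depend on the node")
```

None of the CPU-masking or stepping concerns above apply here, because nothing in the job cares which processor ran it. That's the luxury a single abstracted code-base buys you.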

So it's not IT. At least not the IT that we've known traditionally. And it's not a general-purpose data center. Well then, what is it? Is it "The Cloud" we've all heard so much about? Well…maybe.

In my opinion, it's cloud in the most basic sense, born out of the legacy "software-as-a-service" model. In this case you have a single application that is scaled out to an unbelievable degree. It's so big it's almost hard to wrap your mind around, but Facebook isn't the only one inhabiting that universe. Google, Amazon, eBay, Twitter, ShutterFly and others are in that same space, where they need to run a small number of applications at incredible scale, at variable/seasonal loads, for millions of end-users who could be anywhere on the planet. In the purest sense, they are an "application cloud", and their entire business model supports this to some degree. The challenge is that while this is normal and understandable for these kinds of businesses, it's not something that the majority of the world's IT can (or should) emulate. As much as server/storage/virtualization vendors would like us to believe otherwise, the world does not revolve around the infrastructure; the users, and their applications, always come first. If you ever doubt this, just look at Oracle. When you control all the applications that the end-users need, you can treat them like crap and they'll come back for more. In the case of Facebook and Google, the data centers, every server they hold and even the geographic locations of the facilities are all focused on supporting the one application stack they provide to their end-users. That isn't so different from how an enterprise uses specific types of servers and environmental design to cover the requirements of each application type it supports; the difference is in the number of applications required, and the scale at which they are used.

Does that make it "cloud" in the sense that we are talking about today, with our "private cloud" and "hybrid cloud" nomenclature? I don't think so. Calling Facebook's "Open Compute Project" a private cloud is miscounting the trees because of the sheer size of the forest. There are only a couple of trees, and even though they are huge, a private cloud is typically designed to handle a very different workload profile. I don't want to take away from what Facebook has done; as a guy who lived and breathed data centers for a long time, I appreciate the effort they and Google have put into their environments. I'm just not sold that it's revolutionary, a bellwether for where general IT is heading, or a good example of a private cloud.

Thoughts, comments and (polite) objections are always welcome in the comments. Please disclose any vendor affiliation to help keep the conversation on track.

(BTW: for those who said it was impossible, there's an 1,100+ word blog post that doesn't mention my employer or our flagship product even once. :-) )


2 Responses to Facebook is not IT, But is it Cloud?

  1. Jh says:

    It’s interesting that you point fingers at companies with massive scale-out applications as being willing to leverage the “cheapest-of-the-breed,” and insinuate that these systems are neither new nor reliable. The Facebook engineering team managed some ingenious inventions [plenty of info on this at http://www.opencompute.org], while simultaneously reducing cost, improving efficiency and improving reliability. Cost and efficiency are direct measurements. Long-term reliability remains uncertain, given that we’ve only operated this datacenter design for a few months. However, if you read through the materials we released, you’ll note an extra 9 in the datacenter reliability. We’ve also observed a far lower rate of infant mortality and hardware fallout from the Open Compute servers than from industry-standard devices. Once we have operated the datacenter for a longer period of time, we’ll share further reliability statistics.
    Enterprise IT should absolutely participate in the Open Compute Project. Who doesn’t want to use cheaper, more reliable and more efficient infrastructure? You’re absolutely correct that companies like Facebook build applications to scale horizontally and have deep expertise in managing reliability across a massive fleet of servers. As a former enterprise IT guy, I admit this doesn’t come as easily to a typical enterprise. However, thanks in large part to the ecosystem built up around hypervisors (cloud management tools, virtualization consoles, packaging systems, etc.), an enterprise can leverage a set of open-source or commercial tools to bridge application development methodologies. Why would enterprises have to rely on expensive, proprietary hardware when Internet companies build applications that scale 100–1000X using commodity hardware in conjunction with a robust software infrastructure layer?

  2. Jh,
    Thanks for the comment. You’re right that I don’t think the model of custom-building bare-bones servers specifically dedicated to a known workload is new, and whether it’s “reliable” depends on your viewpoint. That being said, I don’t think it’s a bad practice if you can get away with it: if you spend half as much per server, the servers only need to be more than half as reliable for you to come out ahead. Google does this as well; they’ve publicly stated that they plan on up to half of the available compute resources being unavailable at any point in time. The reliability of the servers is low by design, but the availability of the APPLICATION is fantastic because of the way it’s architected (a quick sketch of the math follows). This is really just the existing virtualized infrastructure model taken to the next extreme, right?
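
    A quick sketch of that availability math (the node figures are invented for illustration): if any of k replicas can serve a request, the application is down only when all k are down at once, so each replica of a deliberately cheap, unreliable server adds roughly another nine.

```python
# Cheap, unreliable servers -> highly available application, via replication.
# Assumes node failures are independent; the 0.90 figure is illustrative.

def app_availability(node_availability, replicas):
    """App is up unless every replica is down simultaneously."""
    return 1 - (1 - node_availability) ** replicas

cheap_node = 0.90  # a deliberately low-reliability, low-cost server
for k in (1, 2, 3, 4):
    print(f"{k} replica(s): {app_availability(cheap_node, k):.4%} available")
# 1 replica(s): 90.0000%   2: 99.0000%   3: 99.9000%   4: 99.9900%
```
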
    As to your second point, I’d argue that all enterprises are already building out their compute resources on commodity hardware; in my opinion, the standard-line servers from Dell, Cisco, HP and IBM all fall into that bucket. There’s still a general need, in most enterprises, for different kinds of hardware tailored to specific kinds of workloads, even when everything can run on an x86 processor.
    Please don’t mistake my lack of enthusiasm for the “revolutionary” and “game changing” adjectives being thrown around as a lack of understanding of the importance of the effort. These are the kinds of initiatives that move things forward, but none of it is brand new. Don’t get me wrong: the scope is impressive, and the desire to be somewhat open and transparent is admirable. The company I work for is a huge believer in virtualizing workloads and getting the infrastructure out of the way, letting companies focus on their apps and users, so I believe in the message overall. I’m not knocking the Open Compute Project at all; I’m just not willing to buy into the pundits’ hype that Facebook is the future of IT.