Open Server Summit: How to Install 5,000 Servers Per Day

19 Apr 2016 • 6 minute read

There are only a few end markets for semiconductors that really drive the technology. Mobile, obviously. But mobile also drives cloud datacenter deployment since our smartphones increasingly split their functionality with big datacenters (the iPhone's Siri, for example, runs mostly in the cloud). There is a move towards hyperscale datacenters and, increasingly, only the largest users can justify building their own datacenters rather than renting capacity. Netflix, for example, has run all their video pumps on Amazon for a long time but recently moved everything over, including internal systems like HR.

So to get an idea of what is going on in this space, I went to some of the Open Server Summit last week. I find the most impressive thing about datacenter deployment, when I hear the details, is the sheer scale of operations.

One of the keynotes was by Dolly Wu of Inspur on "Implementing Rack Scale Architecture Using Open Hardware Designs". Inspur is a Chinese company that until recently was mostly installing datacenters in China. In fact, in just the last two years they have installed 285 datacenters in China. They install about 1M servers per year. Last year they opened a manufacturing site in Fremont that builds 10,000 servers per month.
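To put those figures on a per-day footing, here is a back-of-the-envelope sketch in Python. It assumes a 365-day year and twelve equal months, and the variable names are mine. Whether the Fremont output counts toward the 1M total was not stated, so the share is only there to give a sense of scale.

```python
# Figures quoted in the keynote
servers_per_year = 1_000_000      # "about 1M servers per year"
fremont_per_month = 10_000        # Fremont manufacturing site

# Assumption: 365-day year, 12 equal months
servers_per_day = servers_per_year / 365
fremont_per_year = fremont_per_month * 12
fremont_share = fremont_per_year / servers_per_year

print(f"Average installs per day: {servers_per_day:.0f}")
print(f"Fremont output per year:  {fremont_per_year}")
print(f"Relative to 1M per year:  {fremont_share:.0%}")
```

So the company-wide average works out to a little under 3,000 servers per day.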

Open servers are server designs that have been put in the public domain. About five years ago Facebook found it was too slow to install traditional servers so they created their own hyperscale platform and put it in the public domain as the Open Compute Project. Their reasoning for doing this was apparently that it is to Facebook's advantage if lots of other people are using the same components since the volume helps drive down prices for Facebook. Microsoft joined in 2014, Apple in 2015 and just last month Google contributed their 48V rack design.

The level of cooperation between these companies is interesting: they compete fiercely in business but have a lot of common issues in how best to build datacenters. It reminds me a little of the pre-competitive research cooperation that goes on in the semiconductor industry at places like imec. As Urs Hölzle, senior VP of technical infrastructure at Google, said:

Google is currently working with Facebook to create a rack standard that suppliers could use, in hopes of pushing it into mass production. If Google’s contribution is accepted by OCP and the standard is developed, Google will deploy racks based on that standard in its datacenters, and there are indications that Facebook would do the same.

Deployment of OCP servers started at the obvious big names like Facebook, but increasingly Goldman Sachs and other financial institutions are deploying OCP-based datacenters.

So what are these servers like? OCP open rack 3.0 is what Dolly called "vanity-free" hardware. There are no covers, and the servers are stripped of all components not needed in a datacenter, such as VGA. Everything is front serviceable and all the I/O comes out of the front of the rack. Historically, racks have been 19" wide, but the new standard is 21" wide with a rack-unit height of 1.89" to allow better cooling. Power is centralized in a power bank, with no separate power supplies in the servers. Everything can be serviced tool-free.

Facebook found that using OCP servers reduces power by 38% and cost of ownership by a similar amount (the power required to run a datacenter over its life costs more than the equipment). They used to require one technician for every 5,000 servers. Now it is one technician per 25,000 servers. By using the open platform they are saving $2B per year. Microsoft reported similar numbers, with a 50% reduction in maintenance and faster deployment.
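The staffing ratios translate directly into headcount. As a rough sketch in Python, assuming a hypothetical fleet of one million servers (the fleet size and the function name are illustrations, not figures from the talk):

```python
import math

def technicians_needed(servers: int, servers_per_technician: int) -> int:
    """Round up, since a partial technician still means one more hire."""
    return math.ceil(servers / servers_per_technician)

fleet = 1_000_000  # hypothetical fleet size, for illustration only

before = technicians_needed(fleet, 5_000)    # old ratio quoted in the talk
after = technicians_needed(fleet, 25_000)    # ratio with OCP servers

print(before, after, before / after)  # 200 40 5.0
```

Whatever the real fleet size, the ratio between the two headcounts stays at roughly five to one.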

In China there is an open server called Scorpio where Baidu, Alibaba and Tencent, who are colloquially known as BAT from their initials, are working together using the Inspur smartrack. It is not the same as OCP but has a lot of similar characteristics. It is a hybrid between the Facebook OCP and the Microsoft OCS with lots of choices for server and storage nodes. It is being adopted not just by BAT but also by China Telecom and other large institutions in China. It uses the standard 1.75" rack-unit height (so slightly higher density). It has a 21" equipment width like OCP, but the outer width of the rack remains at 24", like any conventional rack.

There is centralized power (so no power supplies on the server boards) and a bank of 30 fans built into the back of the rack, so there is no need for fans on the servers themselves. The fans can vary in speed, with high speed for compute and low speed for cold storage. The failure rate is also reduced by 50%. Standard power is 12.5 kW per rack, but they have some customers going up to 50 kW per rack.

The future is to move more and more towards software-defined datacenters with pooled resources that can be switched in depending on requirements.

To give an idea of the scale at which these companies operate, Alibaba is a major adopter of the ODC rack. November 11th is a sort of equivalent of Black Friday in the US. On that single day Alibaba did $14.5B of transactions. Obviously they need infrastructure that can handle this type of volume without fail. "Without fail" does not mean that no components fail (with so many, some fail every day) but that the system is architected to handle failure gracefully.

Baidu, which has over 80% share of search in China, has been working closely with Inspur. It used to be that a Baidu datacenter could install 300-500 nodes per day, but with ODC it can install 5,000 nodes per day, an increase of 10-15 times.
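The quoted increase is easy to check from the two install rates. A minimal sketch in Python, using only the numbers above (the function name is just for illustration):

```python
def speedup_range(old_low: float, old_high: float, new_rate: float) -> tuple[float, float]:
    """Return (smallest, largest) speedup of new_rate over an old range of rates."""
    return new_rate / old_high, new_rate / old_low

# 300-500 nodes per day before, 5,000 nodes per day with ODC
low, high = speedup_range(300, 500, 5_000)
print(f"{low:.1f}x to {high:.1f}x")  # 10.0x to 16.7x
```

Measured against the slowest of the old installs, the improvement comes out slightly above the 15 times quoted in the talk.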

Later in the day, Linley Gwennap of the Linley Group talked a bit about ARM®-based servers. He started by pointing out that hyperscale datacenters have different requirements and it is not all about compute power. For example, cold storage requires very high I/O bandwidth but little compute, while in-memory databases need huge memories but only moderate compute power.

His checklist for success with ARM in the datacenter was:

match the performance of Xeon E5, the most commonly deployed server processor (per core and per socket)
match or exceed E5 power efficiency
excellent cache performance and multicore scaling
proven reliability, availability, and serviceability (RAS) feature set
exceed E5 in one or more areas (e.g. memory, accelerators)

He was followed by Paramesh Gopi, the CEO of Applied Micro, who discussed their X-Gene family. He started off by pointing out that it is increasingly widely deployed in datacenters, especially in China, and also in a wide range of storage, networking and compute products. There is a very steep ramp on performance. The original X-Gene had a SPECint performance of about 80, but next year the X-Gene 3XL will be well over 1000. This places them well to satisfy Linley's checklist.

Cadence was at the exhibition at the summit. Inside all these servers are PCIe interfaces, DDRx and many of the other functions that Cadence has in its IP portfolio. I wrote about one of these, PCIe 4.0, yesterday. Cadence and Mellanox were demonstrating interoperability despite the fact that the standard will not be finalized until next year. But success in these markets doesn't allow for waiting until the standards are finalized. Parts will be shipping in volume by then.
