At the recent Linley Cloud Hardware Conference, the keynote was given by Andy Bechtolsheim. Even if you can't quite remember who he is, you've probably heard his name. He is currently founder and chairman of Arista Networks, but he was also a co-founder of Sun Microsystems. He is famous, too, for giving Larry Page and Sergey Brin a check for $100K payable to Google before the company had even been founded. The story that they named the company Google because that was the only way they could cash the check is apparently apocryphal. In between Sun and Arista, he founded Granite Systems and then stayed at Cisco for several years after the acquisition.
Andy's keynote was titled Datacenter Networking Market Transitions. He divided his talk into five sections.
Merchant silicon (as opposed to proprietary silicon) is on the rise and now accounts for 75% of ports shipped. Back in the dim and distant past of 2008, only the leaf switching chips were merchant; the rest of the stack (spine switching chips, edge routers, core routers, optical transport) was all proprietary. By 2016, the entire stack was available from merchant suppliers, and it is gradually displacing proprietary solutions.
The industry has seen many notable firsts.
There has been an astonishing reduction in size and power. 55Tbps used to require a huge double-wide rack; now it is a couple of normal rackmount boards at 1/36th of the power. The graph below shows the bandwidth improvement of merchant silicon.
The only reason to build proprietary chips, other than merchant chips not being available, was for differentiation. But with standard datacenters and especially the cloud, no one wants differentiation anymore. Proprietary silicon just doesn't have enough volume to make semiconductor economics work, and it gets worse with each process generation. So it's the golden age for merchant networking silicon.
Merchant networking chips are coming out from behind. Even today, most are 28nm, so there are at least three generations left to run (and that doesn't count 20nm or 10nm as generations---just 16nm, 7nm, and 5nm). Networking really can make use of all the transistors on a die and the silicon is such a small part of the overall system cost that even if the cost per transistor doesn't decline, it doesn't matter. A 5nm die has 30X as many transistors as the 28nm die shipping today.
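As a quick sanity check on that 30X figure (my arithmetic, not from the talk), transistor density under ideal scaling grows with the square of the feature-size ratio:

```python
def density_gain(old_nm: float, new_nm: float) -> float:
    """Ideal transistor-density multiplier when shrinking from old_nm to new_nm.

    Assumes ideal area scaling (density ~ 1/feature_size^2); modern node
    names no longer track feature size exactly, so this is only a rough check.
    """
    return (old_nm / new_nm) ** 2

# 28nm -> 5nm: (28/5)^2 = 31.36, in line with the "30X" quoted in the talk
print(round(density_gain(28, 5), 2))
```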
Going forward, the cost per transistor may go up, but the cost per bandwidth, which is what matters in networking, will go down.
Packaging technology is also an issue, since we will eventually reach 25Tbps on a single chip. Ball grid arrays (BGAs) need to get to 3600 balls on a 60mm-square package.
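Those two numbers imply a ball pitch we can check directly (my calculation, assuming a fully populated square grid):

```python
import math

# Assuming a fully populated square grid of balls (a simplification;
# real BGAs often depopulate regions), 3600 balls on a 60mm-square package:
balls = 3600
package_mm = 60

balls_per_side = math.isqrt(balls)       # 60 balls along each edge
pitch_mm = package_mm / balls_per_side   # 1.0 mm ball pitch

print(balls_per_side, pitch_mm)
```

A 1mm pitch is aggressive but plausible for a package that size, which is why Andy flagged packaging as a limiter.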
But SerDes is a growing part of overall chip power, which is another reason to fully integrate the optics...
Today, there is a shortage of 100G optics. There has been a rapid shift from multi-mode to single-mode optics, but the standards were wrong and the market predictions on the ramp were wrong, so nobody foresaw what actually happened. The transition to single-mode started in the cloud, which is where most of the consumption has shifted anyway. There has been a lack of cost-effective standards (the IEEE designed for 10km, but that is too expensive for shorter runs). The big mistake, though, was predicting how the market would switch. To show how wrong everyone was, Andy showed analysts' predictions for the 40G to 100G transition made in 2014, on the left. Essentially, the prediction was that nobody would use it. On the right are the 2017 predictions. I suspect that if 2018 were shown, it would show almost everything transitioned to 100G.
The actual volumes of 100G optics, which is why we now have a shortage, were 50,000 in 2015 and 1M in 2016, with a forecast of 5M this year and 10M next. That's quite a ramp.
The old technology market was different. There was a learning curve that lasted several years, so the go-to-market strategy was to price high for upfront profit and then gradually lower the price. That model doesn't work anymore, because a new technology won't be deployed at all if it is not cheaper; in the cloud, any deployment is massive. The moment it is cheaper, nobody wants the old technology. The transition takes literally six months.
2017 is the year that 100G will pass 40G in volume shipments. Next is 400G, where first silicon will be this year. Then it is one year to build products around the silicon and get them to market. So the first products will be late 2018, with volume ramp in 2019 and 2020. It will ramp fast (like 100G) if it is cheaper than 100G, and it won't ramp at all if it is not.
For 400G, Andy's prediction is 1M in 2019, 4M in 2020. And analysts predict zero. But be aware that no single technology solves all problems. There is a tradeoff between the cost of optics and cost of fiber that obviously depends on how long the fiber is; links between datacenters in the same metro area can be up to 100km.
800G will be the next speed, in Andy's opinion. But IEEE standards are coming too slowly relative to market demand: 400G took four years, and 800G designs start this year. 800G will be similar to 400G, but with twice the speed per lane. For interoperability, this requires multi-vendor agreements, and since the IEEE will not make it in time, an MSA (multi-source agreement) is being assembled with all the interested parties.
What about power? Another reason for 800G is that it's really hard to process packets at really high rates. SerDes uses 30-50% of the power, and the only way to eliminate that is to integrate the optics with the switch silicon on an MCM in the future. But that won't happen at 800G, since everyone knows how to do it without integration.
Silicon photonics? Maybe a vendor will do it at 400G. But 100G ramped by a factor of four this year and two last year, so it has been hard to get into volume production and stay on the ramp. The vendors that could ramp to 100G will be the same ones to ramp to 400G.
Aren't datacenters with their current topology pushing problems onto the chip industry? Server architectures haven't changed and are universal now. They are not about to change.
How much of all this is software? 95% of Arista is software. The hardware team is tiny. Delivering system level products is all software. It is 10M lines of code (not counting Linux underneath). Arista has 1000 engineers doing this.
Splitting? Each 400G port on the chip can be split into 4x100G, 2x200G, 1x400G, etc.
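A minimal sketch of that breakout logic, assuming the common case of a 400G port built from eight 50G SerDes lanes (the lane count is my assumption, not stated in the talk):

```python
LANE_GBPS = 50      # assumed per-lane rate for a 400G port (8 x 50G)
LANES_PER_PORT = 8

def breakout_modes(lanes: int = LANES_PER_PORT,
                   lane_gbps: int = LANE_GBPS) -> list[str]:
    """Enumerate the equal-split breakout modes of one port.

    Each mode divides the lanes into n equal groups, so the sub-port
    speed is (lanes // n) * lane_gbps.
    """
    modes = []
    n = 1
    while n <= lanes:
        modes.append(f"{n}x{(lanes // n) * lane_gbps}G")
        n *= 2
    return modes

print(breakout_modes())  # 1x400G, 2x200G, 4x100G, 8x50G
```

The 8x50G mode falls out of the same arithmetic even though the talk only mentioned splits down to 100G.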
Flex Ethernet? This is a separate initiative from the IEEE to enable programmable bit rates, especially across wide areas; you could dial down the port speed to max out the bandwidth of a link. It sounds like a good idea, but it turned out to address a very low-volume market and never made it into the mainstream. Lack of volume made it a very expensive chip for a very small market, and nobody put Flex Ethernet on the mainstream chips that everyone is embracing.
What about blade servers? Nobody in the cloud is doing blades, and blades are declining. They were an enterprise form factor, but enterprise is going away.
To wrap up, Andy reiterated his five main points: