Paul McLellan
Tags: Ethernet, silicon photonics, photonics, datacenter, networking

DesignCon: The Future of Fiber Optic Communications

24 Feb 2020 • 8 minute read

At the recent DesignCon, Chris Cole gave the keynote The Future of Fiber Optic Communications: Data Center and Mobile. He had been working at Finisar, which was acquired, and had just begun work at photonics startup Luminous Computing. Judging by their single-page website, they are somewhat in stealth mode, although they do say:

We are Luminous Computing. We have a blueprint to fit the computing power of the world’s largest supercomputer on a single chip. We’re specializing it for AI workloads, and we use photonics to solve all of the major bottlenecks traditional processors have to overcome. Our work is based on a decade of Princeton research, and we already have working silicon.

Chris pointed out that he was going to make "bold predictions...but don't take them too seriously." In fact, as you will see, he predicts that a lot of investment is way ahead of real demand and people will lose their shirts (or "there will be no ROI" to put it more tamely).

The networking industry is divided into two parts, datacom (or data center) and telecom. The big difference is the distance. Datacom is inside the data center, everything from server to top-of-rack, up to links between buildings. Telecom involves much longer distances, between cities or even from one side of an ocean to the other.

Let me give you a few summary bullets in case you don't have time to read all the details. Yes, there are a lot of numbers in this post.

  • In the data center, bandwidth is less important than being able to fan out the bandwidth more. It is all about cost; there is no premium for performance.
  • In telecom, installed fiber has been the network capacity constraint, but we are near the Shannon limit and can't get much more out of existing fiber. We're gonna need more fibers. It will all be about cost, like in datacom: no premium for performance.
  • In the data center, contrary to conventional wisdom, IMDD (intensity modulation direct detection) is more efficient than coherent. Coherent has no place in the data center.
  • Silicon photonics (SiPh) has no place in the data center because it is more expensive. SiPh only makes sense when you do a lot of photonics on the chip (like...Luminous Computing maybe?)

Datacom

Chris started by looking at datacom (Ethernet) data rates over time. During the 1990s up to 2006, data rates went from 0.1Gb/s to 1Gb/s to 10Gb/s, a factor of 10 each time. He challenged the audience as to what the next data rate would be. If it were a factor of 10, that would be 100Gb/s. But Andy Bechtolsheim, then still at Sun, said 40G, since 10Gb/s was mature but 40Gb/s would allow switch chips to be built that could process four times the data.

What were the pros and cons between the two rates?

  • 100Gb/s (4x25G) is the conventional rate step (a factor of 10), meaning fewer deployment steps for end-users. 25G NRZ technology will lead to lower long-term cost.
  • 40Gb/s (4x10G) is mature, low-risk technology available now. 40G has about 3X the radix of 100G for a 1.28T switch ASIC (it can attach about three times as many connections; see the sketch below).
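
The radix arithmetic is easy to check. Here is a quick back-of-the-envelope sketch (my numbers and function, not a slide from the talk; it assumes the switch ASIC's total bandwidth divides evenly among its ports):

```python
# A switch ASIC has a fixed total bandwidth; running the ports slower
# means more of them (higher radix), i.e. more attached connections.

def radix(switch_capacity_gbps: float, port_rate_gbps: float) -> int:
    """How many ports of a given rate the switch capacity supports."""
    return int(switch_capacity_gbps // port_rate_gbps)

# The 40G vs 100G decision on a 1.28T switch ASIC:
print(radix(1280, 40), "ports of 40GbE")    # 32 ports
print(radix(1280, 100), "ports of 100GbE")  # 12 ports, so ~3X fewer

# The later 200G vs 400G decision on a 12.8T switch ASIC:
print(radix(12800, 200), "ports of 200GbE")  # 64 ports
print(radix(12800, 400), "ports of 400GbE")  # 32 ports, so 2X fewer
```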

Instead of making a decision, the IEEE adopted both rates, since 40G was identified as critical to high-volume, near-term data center deployment (and an admission that 100G was nowhere near ready).

So now it is 2008: what is the next data rate? Does it go up by about two or by four?

The obvious answer is 400Gb/s. Pros and cons:

  • 400Gb/s (4x100G) is the conventional rate step, meaning fewer deployment steps, and 100G PAM4 technology results in lower long-term costs.
  • 200Gb/s (4x50G) leverages the by-then mature, low-risk, low-cost 25G technology available now. 200G has 2X the radix of 400G for a 12.8T switch ASIC, so it is the right step for servers after 100Gb/s.

"Tell me if you've heard this before," Chris said. The IEEE adopted both rates after 200G was identified as important for mobile applications in China.

The reality is that, except for some video, you don't need a fat pipe, you need your pipes to fan out more.

So what was going on at the big four in 2019?

  • AWS: 400G DR4 broken out to four 100GbE in the form of 2x200G modules, with 2X400G modules the next step
  • Google: Shifting from 100GbE to 200GbE
  • Facebook: New high-density 100GbE switch fabric for 4X capacity, with 200GbE the next step
  • Microsoft: Will deploy 400GbE inside data centers after 400XR availability
  • Same for the China cloud guys

So there are no clear plans to deploy 400GbE for some time! Why are there no near-term plans for 400GbE?

  • Still rather early in the 100GbE life-cycle
  • 12.8Tb switches provide only 32 ports of 400GbE, and you can't scale a switch fabric with only 32 ports (see the sketch after this list)
  • No practical means to build a 128-port 400GbE switch such as Facebook and Arista have announced
  • Concerns about availability of 400G optical modules
  • The expectation that switches and modules with 100Gb/s SerDes will result in efficient and economical 400GbE
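
To see why a 32-port radix is so limiting, here is a rough sketch of standard two-tier leaf-spine (Clos) fabric math. This is my illustration, not a calculation from the talk; it assumes each leaf splits its ports evenly between hosts and spines:

```python
# Two-tier leaf-spine (Clos) fabric built from identical k-port switches:
# each leaf uses k/2 ports down to hosts and k/2 up to spines, and each
# spine can reach at most k leaves, so the fabric tops out at k^2/2 hosts.

def max_hosts_two_tier(k: int) -> int:
    leaves = k               # limited by the ports on each spine switch
    hosts_per_leaf = k // 2  # the other half of the leaf ports go up
    return leaves * hosts_per_leaf

print(max_hosts_two_tier(32))   # 512 hosts with 32-port 400GbE switches
print(max_hosts_two_tier(128))  # 8192 hosts: why a 128-port switch matters
```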

Chris's opinion is that there is huge industry investment into first-gen 400GbE (with great expectations that this will be the next high-volume datacom rate) but it will have no ROI because first-gen 400GbE optics are small volume, primarily for telecom, and won't get deployed in volume. 200GbE is an "interim" step to 400GbE, just like 40GbE was an "interim" step to 100GbE. 

400GbE will only be high volume when 100Gb/s lane SerDes is mature, 7nm CMOS PHYs are mature, and TX and RX 56GBaud optics have excess bandwidth. The current predictions are that there will not be 1M units of 400GbE shipped until 2023.
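
The lane arithmetic behind that last point, as a rough sketch (mine, not Chris's; real 400GbE standards run slightly different baud rates to cover FEC and encoding overhead):

```python
# Per-lane rate = symbol rate x bits per symbol. PAM4 carries 2 bits
# per symbol, so 56GBaud optics give 112Gb/s raw per lane.

def lane_rate_gbps(baud_gbaud: float, bits_per_symbol: int) -> float:
    return baud_gbaud * bits_per_symbol

per_lane = lane_rate_gbps(56, 2)  # 112 Gb/s raw per lane
print(per_lane, "Gb/s raw per lane")
print(4 * per_lane, "Gb/s raw over 4 lanes")  # 448: headroom over a 400G payload
```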

So what's the next rate? Is it 2X or 4X? Pros and cons:

  • 1.6Tb/s: 4X is the conventional rate step, meaning fewer deployment steps; 800GbE would be an "interim" step to 1.6TbE
  • 800Gb/s: 100GBaud PAM4 technology will be mature, low-risk, and low-cost. 800GbE has 2X the radix of 1.6TbE

Chris says that this is the same obsession with bandwidth that drove 100GbE and 400GbE, with the same "fantasies" about shipment volumes as for 100GbE and 400GbE. For example, at the IEEE 802.3 Ad Hoc meeting in January (yes, a few weeks ago), the Dell'Oro Group, using "actual data", forecast the first million 400GbE units by 2020.

This is wrong by at least a factor of 100.

He predicts the IEEE will split the baby again and adopt both 800GbE and 1.6TbE. Suppliers who develop first-gen 1.6TbE transceivers will have no ROI (like with 100GbE and 400GbE). The first million units shipped for 800GbE will be 2028, and for 1.6TbE will be 2030. We are talking about looooong timescales.

Coherent in Telecom

Installed network fiber has been the network capacity constraint. As a result, increasing spectral efficiency (SE) has been the focus of R&D, but we are near the Shannon limit, so escalating costs mean meager SE gains. If bandwidth and signal-to-noise are limited, the only alternative is increasing the number of channels, meaning parallel fibers. So new fiber will need to be installed to increase network capacity. The cost of the fiber itself is minor; it is all in the cost of installation. So there is no penalty for installing massive amounts of fiber, most of which will initially be unused: it's the same cost as only installing what is needed. SE becomes a minor performance tweak when fiber is plentiful. This is the opposite of the current R&D focus and conventional thinking.
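
The Shannon-limit point is easy to illustrate: spectral efficiency grows only logarithmically with SNR, so each extra bit/s/Hz costs roughly a doubling of signal power. A small sketch of the standard formula (my illustration, not from the talk):

```python
import math

# Shannon capacity per unit of bandwidth: SE = log2(1 + SNR).
# Near the limit, each extra bit/s/Hz needs roughly 2x the signal power,
# which is why spectral-efficiency gains keep getting more expensive.

def spectral_efficiency(snr_linear: float) -> float:
    return math.log2(1 + snr_linear)

for snr_db in (10, 13, 16, 19, 22):
    snr = 10 ** (snr_db / 10)
    print(f"SNR {snr_db} dB -> {spectral_efficiency(snr):.2f} bit/s/Hz")
# Every +3 dB (a doubling of power) buys only ~1 more bit/s/Hz.
```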

Chris's telecom predictions:

  • Networks will be fiber-rich
  • 100G coherent will dominate long-haul
  • 400G coherent will dominate metro
  • There will be no premium for performance (such as increasing SE)
  • It will all be about low cost (so ugly)
  • The construction and fiber guys will make money for a while

Coherent in Datacom

I'm not going to attempt to summarize this section because, frankly, I didn't understand it. There were about 20 slides of equations presented in about five minutes. I put one of them here just to show you what we are talking about. This is actually the last slide of the group, which shows that IMDD is better than coherent over data center distances. The reason is that in the data center you only care about signal-to-noise ratio (SNR) and not things like link adaptation and fiber impairment compensation (important for really long links).

Chris's conclusion: Conventional thinking is that coherent will replace IMDD for datacom links in the data center just like it did for telecom links. The point of all those equations was to prove, to people who are experts in the area, that this is wrong. For most data center links, IMDD has better SNR than coherent. So coherent has no place in the data center.

Silicon Photonics

What's important for data center optics? Cheap lasers, cheap SNR (low-loss), cheap assembly and packaging.

What does SiPh deliver? A lousy light amplifier (silicon lasers are seriously lousy, and the "la" in laser stands for light amplification), high loss, and equivalent packaging costs. So two out of three are worse, and one is on a par.

So that is myth 1, that SiPh is low cost. It is expensive when all the costs are properly accounted for.

Myth 2: SiPh design is just like CMOS ASIC design. But the combined revenue of the two leading EDA companies is bigger than the entire revenue of the datacom optics industry. The true cost of just developing PDKs for advanced CMOS nodes is comparable to the entire R&D budget of an optical transceiver vendor. Counterpoint: it is true that, with proper investment and effort, SiPh tools (Cadence and Lumerical have them) do give good results.

Silicon photonics predictions:

  • Quad-channel SiPh transceivers fundamentally have no advantage over conventional quad transceivers
  • There are not that many optical components to integrate
  • The current stampede of 400G DR4 QSFP-DD SiPh transceiver designs from Intel, Cisco/Luxtera, Cisco/Acacia, Elenion, and other smaller companies will result in no ROI (Finisar, Chris's former employer, dropped out of this market when it came to the same conclusion)
  • "Me too" products don't bring success
  • SiPh has to deliver performance enabled by large-scale photonic integration not possible with conventional optics
  • SiPh has to be about value, not price

Summary

In the data center, conventional wisdom is that bandwidth is the driver, but it really is cost and fanout.

In telecom, conventional wisdom is to focus on increasing spectral efficiency, but more fiber is needed and once there is more fiber, nobody will pay for spectral efficiency.

Coherent optical has no place in the datacenter.

Silicon photonics is only economic when you do a lot of photonic stuff on the chip and do things that conventional optics cannot do. Otherwise, it is too expensive to design and manufacture.


Sign up for Sunday Brunch, the weekly Breakfast Bytes email.