The purpose of this
post is to clarify two "systems" terms that are usually confused and sometimes
used interchangeably: latency and throughput.
Definition of terms
Let us attempt to
define those two terms:
Latency is the time required to perform some action
or to produce some result. Latency is measured in units of time -- hours, minutes,
seconds, nanoseconds or clock periods.
Throughput is the number of such actions executed or
results produced per unit of time. This is measured in units of whatever is
being produced (cars, motorcycles, I/O samples, memory words, iterations) per
unit of time. The term "memory bandwidth" is sometimes used to specify the
throughput of memory systems.
A simple example
The following manufacturing
example should clarify these two concepts:
An assembly line is manufacturing cars. It takes eight
hours to manufacture a car, and the factory produces one hundred and twenty
cars per day.
The latency is: 8 hours.
The throughput is: 120 cars / day or 5 cars / hour.
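The arithmetic above can be checked with a few lines of Python. This is only a sketch: it assumes the factory runs 24 hours a day (which is what the 5 cars / hour figure implies), and the variable names are made up for illustration. It also shows that throughput is not simply the inverse of latency, because many cars are on the line at the same time.

```python
from fractions import Fraction

HOURS_PER_DAY = 24  # assumption: the line runs around the clock

latency_hours = 8          # time to manufacture one car
throughput_per_day = 120   # cars leaving the factory each day

# Convert the throughput from cars / day to cars / hour.
throughput_per_hour = Fraction(throughput_per_day, HOURS_PER_DAY)

# If only one car were on the line at a time, the throughput
# would be limited to 1 / latency.
serial_throughput_per_hour = Fraction(1, latency_hours)

print(f"latency: {latency_hours} hours")
print(f"throughput: {float(throughput_per_hour)} cars / hour")
print(f"one car at a time: {float(serial_throughput_per_hour)} cars / hour")
```

The two throughput figures differ by a factor of forty, so knowing the latency alone says nothing about the throughput.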
A design example
Now that these two
concepts are clear, let us apply these concepts to a problem "closer to home."
Clock frequency: 100 MHz
Time available to perform the computation: 1000 ns
Throughput of the device: 640 Mbits / second
Word width of each output: 64 bits
Let us translate
these requirements into latency and throughput measurements that are more
meaningful from the point of view of the hardware designer.
Latency = 1000 ns
= 1000 ns * (1 s / 10^9 ns) * (100 * 10^6 clock periods / 1 s)
= 10^11 / 10^9 = 100 clock periods.
Throughput = 640 Mbits / s
= (640 * 10^6 bits / s) * (1 word / 64 bits) * (1 s / (100 * 10^6 clock periods))
= (640 * 10^6) / (64 * 100 * 10^6)
= 10 / 100 = 0.1 words / clock period.
The throughput can be
read more conveniently as "one word every 10 clock periods."
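The same unit conversions can be written as a short Python sketch. The function names are hypothetical, and the 64-bit word width is taken from the conversion factor used in the throughput calculation. Exact fractions are used so that no rounding creeps in.

```python
from fractions import Fraction

NS_PER_SECOND = 10**9


def latency_in_clock_periods(latency_ns, clock_hz):
    """Convert a latency in nanoseconds to clock periods."""
    return Fraction(latency_ns * clock_hz, NS_PER_SECOND)


def throughput_in_words_per_clock(bits_per_second, word_bits, clock_hz):
    """Convert a throughput in bits / second to words / clock period."""
    return Fraction(bits_per_second, word_bits * clock_hz)


clock_hz = 100 * 10**6  # 100 MHz

latency = latency_in_clock_periods(1000, clock_hz)
throughput = throughput_in_words_per_clock(640 * 10**6, 64, clock_hz)

# Number of clock periods available between consecutive words.
interval = 1 / throughput

print(f"latency: {latency} clock periods")
print(f"throughput: {throughput} words / clock period")
print(f"interval: one word every {interval} clock periods")
```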
Latency expressed in
clock periods, and throughput expressed in number of available clock cycles
between words, are parameters that a designer can use to create the desired
hardware according to the performance specifications.
Some tools do not
express the throughput in units per unit of time but as the number of clock
periods between results. Strictly speaking this is incorrect, since that number
is an interval rather than a rate, but it is commonly used because it is
convenient. Such tools would therefore
report the throughput of our communication algorithm as 10.
This Team ESL
posting is provided by Dr. Sergio Ramirez, Sr Staff Product Engineer
for the C-to-Silicon Compiler high level synthesis product.
Thank you for discussing such a wonderful topic.
An extrapolation on the example in this article:
An assembly line is manufacturing cars. It takes eight hours to manufacture a car, and the factory produces one hundred and twenty cars per day.
The latency is: 8 hours.
The throughput is: 120 cars / day or 5 cars / hour.
What if I build another identical assembly line to manufacture more cars? It still takes eight hours to manufacture a car on this new assembly line, but now I have two lines with twice as many employees, so my throughput is doubled.
The throughput is: 120 cars / line / day or 10 cars / hour.
This example shows that doubling the assembly line facilities (hardware) and doubling the number of employees (software) results in doubling the throughput.
What if I slow both assembly lines down so that each car takes 50% longer to build, and use the employees from the first assembly line to work both lines?
It would now take 12 hours to manufacture each car and each line would produce 80 cars per day.
The latency is: 12 hours.
The throughput is: 80 cars / line / day or 6.67 cars / hour.
This example shows that doubling the assembly line facilities (hardware) but keeping the number of employees (software) constant results in increased throughput, even though the latency is higher.
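The three scenarios in this comment can be compared with a small Python sketch. It rests on two assumptions that match the numbers quoted above but are not stated outright: the factory runs 24 hours a day, and the output of a single line scales inversely with the time it takes to build one car. The function name is made up for illustration.

```python
from fractions import Fraction

HOURS_PER_DAY = 24                # assumption: lines run around the clock
BASE_LATENCY_HOURS = 8            # original time to build one car
BASE_CARS_PER_LINE_PER_DAY = 120  # original output of one line


def factory_throughput_per_hour(lines, latency_hours):
    """Cars per hour for `lines` identical lines.

    Assumes per-line output is inversely proportional to latency.
    """
    per_line_per_day = Fraction(
        BASE_CARS_PER_LINE_PER_DAY * BASE_LATENCY_HOURS, latency_hours
    )
    return lines * per_line_per_day / HOURS_PER_DAY


scenarios = {
    "one line, 8 hours per car": (1, 8),
    "two lines, 8 hours per car": (2, 8),
    "two lines, 12 hours per car": (2, 12),
}

for name, (lines, latency_hours) in scenarios.items():
    rate = factory_throughput_per_hour(lines, latency_hours)
    print(f"{name}: {float(rate):.2f} cars / hour")
```

In the last scenario the latency rises from 8 to 12 hours while the throughput still ends up above the single-line figure.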
Wow, that's kind of insulting. I've seen more systems "engineers" who fail to grasp the difference than h/w "designers".