
A Primer on Sizing Hardware FIFOs
FIFOs (First-In, First-Out buffers) are point-to-point elastic storage elements widely used in embedded systems to pass data efficiently across an asynchronous interface. In this context, asynchronous simply means that the two ends of the FIFO behave independently, save for some initial signaling to initiate a transfer.
A question that silicon and FPGA architects often face is how big to make a FIFO. In this article, we will cover the theory, the practice, and an easy-to-use open-source Python simulator you can use to experiment with scenarios and answer this question in a systematic way.
The Theory
FIFOs have two ports, a read port and a write port. The write port pushes data into the FIFO and the read port pops data from the FIFO. This is a simple consumer/producer pattern.

The FIFO has three main purposes:
- Provide sufficient storage to account for the production/consumption rate differences between the read and write domains over a finite transfer size
- Provide sufficient storage to absorb short term fluctuations (such as bursts/stalls) between the two domains
- Account for startup latencies between the start of write domain operations and the start of read domain operations
The first point is the primary factor in determining the size. The second point is a peak versus average concept and may end up being the factor which determines the size. The third point relates to the low-level implementation details of the FIFO controller and latencies associated with initiating the read port domain after the write port domain has signaled its readiness.
Let us focus on the primary aspect first. Consider the producer/consumer bandwidths of the read and write ports expressed as relative values.
- A write bandwidth BW(write)
- A read bandwidth BW(read)
The units themselves (e.g., MB/second) are not important. Each bandwidth is simply a rate in normalized units of transfers per unit time. All that matters is that the read and write bandwidths are in the same units.
Now, let us consider a finite transfer as a payload expressed in units of the minimum transfer size. We can refer to this as PL(size). The formulaic approach to determine the required FIFO size based on these three parameters is as follows:

FIFO size = PL(size) × max(1 − BW(read)/BW(write), 1 − BW(write)/BW(read))
This is a relatively straightforward rate-ratio calculation and follows the simplest queueing theory where both the item sequence (producer) and server (consumer) follow fixed rates. Here we perform the calculation from either the read or write port perspective and pick the worst case scenario.
- In the case where BW(write) > BW(read), we need a FIFO large enough to capture the accumulation of data that occurs over the PL(size) interval. This accumulation rate is BW(write)-BW(read)
- In the case where BW(read) > BW(write), we need the producer to buffer the full calculated FIFO size before the consumer side is initiated. The contents will drain at a rate defined by BW(read)-BW(write) over the PL(size) interval
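The two cases above can be folded into one small helper. The sketch below assumes ideal fixed rates: it evaluates the write-port result PL(size) × (1 − BW(read)/BW(write)) and the read-port result PL(size) × (1 − BW(write)/BW(read)) and keeps the larger. The function name fifo_depth, the use of exact fractions and the rounding up to a whole entry are illustrative choices, not taken from the fifo_tools simulator.

```python
import math
from fractions import Fraction


def fifo_depth(payload, bw_write, bw_read):
    """Fixed-rate FIFO depth estimate in payload units, rounded up."""
    if payload < 0 or bw_write <= 0 or bw_read <= 0:
        raise ValueError("payload must be >= 0 and bandwidths must be > 0")
    # Exact rational arithmetic avoids off-by-one results from float rounding.
    pl, w, r = Fraction(payload), Fraction(bw_write), Fraction(bw_read)
    # Write-port view: data accumulates while the producer outpaces the consumer.
    from_write = pl * (1 - r / w)
    # Read-port view: data that must be pre-buffered so a faster consumer
    # does not run dry before the producer has finished.
    from_read = pl * (1 - w / r)
    # At most one of the two is positive (both are zero for equal rates).
    return math.ceil(max(from_write, from_read))


if __name__ == "__main__":
    print(fifo_depth(1000, 10, 8))   # producer faster than consumer
    print(fifo_depth(1000, 8, 10))   # consumer faster than producer
```

Because only the ratio of the two bandwidths enters the calculation, any consistent unit can be used for bw_write and bw_read.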
So, we are all done, right? Sort of…
The Practice

How exactly do we choose the various parameters for the equation? For example, what is the payload size and what are the bandwidths?
In very simple systems, these may be completely obvious. Things get complicated when we have to consider the following system characteristics:
- The consumer and producer relative bandwidth ratios are not constant over time and may follow behaviors that look more like statistical distributions.
- The consumer and producer experience short term fluctuations causing bursts or stalls. Each side may be subject to variations in short term throughput due to access of a shared memory sub-system.
- The payload size is very large or infinite. This can occur in display pipelines for example, which for all intents and purposes can have infinitely sized payloads.
It turns out that the previous equation still holds true, with the caveat that we now need to analyze multiple solutions of this equation to comprehend the different scenarios. Besides the average bandwidths which are defined over the PL(size) interval, we also need to consider peak bandwidths over a subset of the interval.
To illustrate this distinction, consider a system where the consumer and producer bandwidths are equal. This would suggest a FIFO size of zero, but it is doubtful that this would work in practice. All it would take is a few stall cycles on a bus somewhere in the producer or consumer domain, to require a few entries. Here, the peak case (temporary stalls) determines the FIFO size.
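A toy model makes the peak case concrete. The sketch below assumes both sides nominally move one entry per cycle, with the consumer unable to pop on a given set of stall cycles. The function name peak_occupancy and the cycle-level abstraction are hypothetical and only meant to illustrate the point.

```python
def peak_occupancy(cycles, consumer_stalls):
    """Peak end-of-cycle FIFO level for equal one-entry-per-cycle rates.

    consumer_stalls is a set of cycle indices on which the consumer
    cannot pop (for example, because of a busy bus).
    """
    level = 0
    peak = 0
    for cycle in range(cycles):
        level += 1                      # producer pushes every cycle
        if cycle not in consumer_stalls:
            level -= 1                  # consumer pops unless stalled
        peak = max(peak, level)         # level left over at end of cycle
    return peak


if __name__ == "__main__":
    print(peak_occupancy(100, set()))             # no stalls
    print(peak_occupancy(100, {10, 11, 12, 13}))  # one 4-cycle stall
```

With no stalls the model never holds an entry at the end of a cycle, which matches the zero-size result. A single four-cycle stall leaves four entries behind, and because the rates are equal the consumer never recovers them, so later stalls add to the total.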
Lastly, there is some accounting required in relation to the specific implementation of a given FIFO controller, especially when dealing with FIFOs which cross hardware clocking domains (known as asynchronous FIFOs). It is beyond the scope of this article to get into these details, but typical implementations may require several clocks worth of uncertainty that need to be considered.
The Simulation

Using the formulaic approach is a necessary step, but engineers are pragmatic humans who like to see things working, and there is nothing more pragmatic than simulation. Also, where the bandwidths are not constant-rate, the rates must be modeled with a random variable drawn from a discrete statistical distribution.
Typically, simulations for hardware are done in an HDL simulator at the RTL abstraction level. In my mind, this is a downstream step, and initial architectural work requires a simpler approach where the design space can be explored quickly. This is typically done in a high-level modeling environment (such as SystemC), but for the purposes of a simpler demonstration I wrote a basic open-source FIFO simulator in Python which uses a random variable to generate consumer and producer rates.
The value of a simulation is that it can model short term behaviors with randomness.
Randomness provides a check and balance to the formulaic approach.
The simulator can be found here with full details on how to run it:
https://github.com/sebastian-ahmed/fifo_tools
The simulator also provides the formulaic calculation (which can be run independently of the simulation step).
The simulator is quite simple, with the following mechanics that are worth noting:
- The consumer and producer are run as independent threads operating on a shared thread-safe FIFO object via a simulation kernel thread which manages a simulation event queue with a configurable quantum. The quantum allows control of simulation speed vs short-term accuracy.
- The effective average bandwidth ratio is based on a desired statistical distribution configured from the input read and write bandwidths. The simulator uses a binomial distribution with a single trial (that is, a Bernoulli trial), where the probability of a push or pop operation is a function of the bandwidths.
Since the operations of the consumer and producer threads are based on sampling a random number generator source with a target statistical distribution, the resultant bandwidth ratio is never perfect, and this is a good thing.
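The single-trial sampling idea can be sketched in a few lines. The code below is not the fifo_tools implementation: it is a single-threaded, tick-based model in which each side makes one Bernoulli trial per tick, with probabilities normalized so that the faster side attempts a transfer on every tick. The name simulate_fifo, its parameters and the rule that the consumer starts once init_level entries have been written are all assumptions made for illustration.

```python
import random


def simulate_fifo(payload, bw_write, bw_read, init_level=1, seed=None):
    """Tick-based Bernoulli model of a FIFO; returns (peak_level, underruns)."""
    if payload < 0 or bw_write <= 0 or bw_read <= 0:
        raise ValueError("payload must be >= 0 and bandwidths must be > 0")
    rng = random.Random(seed)
    # Normalize so the faster side has probability 1.0 of acting per tick.
    fastest = max(bw_write, bw_read)
    p_push = bw_write / fastest
    p_pop = bw_read / fastest

    pushed = popped = 0
    level = peak = underruns = 0
    while popped < payload:
        # Producer: one Bernoulli trial per tick until the payload is written.
        if pushed < payload and rng.random() < p_push:
            pushed += 1
            level += 1
            peak = max(peak, level)
        # Consumer: held off until the initial fill level has been written.
        started = pushed >= min(init_level, payload)
        if started and rng.random() < p_pop:
            if level > 0:
                popped += 1
                level -= 1
            else:
                underruns += 1  # consumer wanted data that was not there
    return peak, underruns


if __name__ == "__main__":
    peak, underruns = simulate_fifo(1000, 10, 8, init_level=16, seed=1)
    print("peak level:", peak, "underruns:", underruns)
```

Running this with different seeds gives a spread of peak levels around the fixed-rate estimate, which is the kind of short-term variation the formula alone does not show.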
A few things to note when running the simulator:
- When BW(read) > BW(write), set the initlevel parameter to the target fill level of the FIFO
- When BW(write) > BW(read), always set an initlevel greater than the default (1). Over a short interval, the randomness of the producer/consumer threads usually looks like complete noise relative to the average bandwidth ratio, so underrun errors may otherwise occur during the simulation
- Running with the verbose option shows all FIFO operations in real-time
- The simulator is better suited to modeling larger payloads because the random distribution becomes more accurate over a larger sample set
Below is a screen capture of running the simulator in a shell:

Note how the formulaic approach yielded a depth of 91, while the simulation hit a peak level of 114. Part of this is because an initial level of 16 entries was configured, but it also reflects short term noise in the read/write bandwidth ratio which the formulaic approach cannot reflect.
Summary
- The initial sizing of hardware FIFOs can be done in a formulaic way as long as the consumer/producer systems can be described for the appropriate scenarios for both average and peak situations.
- Simulation with random variation within a statistical distribution will provide further insights to check the formulaic approach. Although a relatively simple simulator has been provided as part of this article, there are far more advanced high-level modeling techniques.
- Each FIFO controller design is different, and for smaller FIFOs, some of the low-level uncertainties and latencies must be accounted for in the sizing.
- The final proof is always in the pudding of a system-level RTL-based simulation, but it helps to have a high-confidence first target to avoid thrashing in the design space.