UberDDR3 has come a long way! Once capped at DDR3-800, it can now push DDR3-1333 and even DDR3-1600. But reaching these speeds isn’t as simple as increasing the clock rate—FPGA constraints, speed bins, and interface bottlenecks all come into play.
In this blog, we’ll break down DDR3 speed grades, FPGA speed limitations, and master interface logic to identify what might be keeping you from reaching peak speeds. Then, we’ll put UberDDR3 at DDR3-1333 and DDR3-1600 to the test with two demo projects: MicroBlaze Dhrystone Test and DDR3 Reliability Test.
Let’s see how fast UberDDR3 can go!
I. DDR3 Speed Grades Explained
DDR3 memory operates at different speed grades, defined by their data rates (MT/s), which determine the maximum theoretical bandwidth. These speeds follow the JEDEC standard:
Speed Grade | Data Rate | Memory Clock Speed | Bandwidth |
DDR3-800 | 800 MT/s | 400 MHz | 6.4 GB/s |
DDR3-1066 | 1066 MT/s | 533.33 MHz | 8.5 GB/s |
DDR3-1333 | 1333 MT/s | 666.67 MHz | 10.6 GB/s |
DDR3-1600 | 1600 MT/s | 800 MHz | 12.8 GB/s |
Key things to note:
DDR3 speed grades used in FPGA designs typically range from DDR3-800 to DDR3-1600 (JEDEC also defines DDR3-1866 and DDR3-2133, but they’re uncommon in FPGA designs).
The actual clock speed of DDR3 memory is half the data rate because DDR3 transfers data on both the rising and falling edges of the clock.
The bandwidth values above assume a 64-bit memory bus and represent the theoretical peak transfer rates.
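As a quick check of the table, here is a minimal Python sketch that derives the memory clock and peak bandwidth from the data rate. The function names are my own, and the 64-bit bus width is the same assumption the table makes.

```python
def memory_clock_mhz(data_rate_mt_s: float) -> float:
    """DDR transfers on both clock edges, so the clock is half the data rate."""
    return data_rate_mt_s / 2


def peak_bandwidth_gb_s(data_rate_mt_s: float, bus_width_bits: int = 64) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return data_rate_mt_s * 1e6 * bytes_per_transfer / 1e9


if __name__ == "__main__":
    # Nominal grade names round down: "1066" is really 1066.67 MT/s and
    # "1333" is really 1333.33 MT/s, so printed values differ slightly
    # from the rounded numbers in the table.
    for rate in (800, 1066, 1333, 1600):
        print(
            f"DDR3-{rate}: clock {memory_clock_mhz(rate):.2f} MHz, "
            f"peak {peak_bandwidth_gb_s(rate):.2f} GB/s on a 64-bit bus"
        )
```

Passing a narrower `bus_width_bits` scales the result down proportionally; for example, a 16-bit bus at 1600 MT/s peaks at 3.2 GB/s.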
When interfacing with DDR3, another critical factor to consider is the controller clock speed—the frequency at which the DDR3 memory controller operates. Unlike the memory itself, the controller doesn’t always run at the same speed as the DDR3 interface. Instead, DDR3 controllers are typically designed to run at:
4:1 memory controller → Controller runs at 1/4 the memory clock speed
2:1 memory controller → Controller runs at 1/2 the memory clock speed
So which one does UberDDR3 use? It uses a 4:1 memory controller.
From previous blog updates, UberDDR3 has always been running at 100 MHz, which means:
The actual memory clock runs 4x faster, at 400 MHz
Since DDR3 transfers data twice per clock cycle, the effective data rate is 800 MT/s, making this a DDR3-800 configuration
To illustrate this, let’s modify the previous table by replacing Memory Clock Speed with Controller Clock Speed (1/4 of the memory clock, i.e. 1/8 of the data rate):
Speed Grade | Data Rate | Controller Clock Speed | Bandwidth |
DDR3-800 | 800 MT/s | 100 MHz | 6.4 GB/s |
DDR3-1066 | 1066 MT/s | 133.33 MHz | 8.5 GB/s |
DDR3-1333 | 1333 MT/s | 166.67 MHz | 10.6 GB/s |
DDR3-1600 | 1600 MT/s | 200 MHz | 12.8 GB/s |
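The relationship behind this table can be sketched in a few lines of Python. The helper name is hypothetical; the `ratio` argument covers both the 4:1 case UberDDR3 uses and the 2:1 case mentioned earlier.

```python
def controller_clock_mhz(data_rate_mt_s: float, ratio: int = 4) -> float:
    """Controller clock for a ratio:1 memory controller.

    data rate -> memory clock (divide by 2) -> controller clock (divide by ratio).
    """
    memory_clock = data_rate_mt_s / 2
    return memory_clock / ratio


if __name__ == "__main__":
    for rate in (800, 1066, 1333, 1600):
        print(
            f"DDR3-{rate}: 4:1 -> {controller_clock_mhz(rate, 4):.2f} MHz, "
            f"2:1 -> {controller_clock_mhz(rate, 2):.2f} MHz"
        )
```

A 2:1 controller would have to close timing at twice the frequency for the same speed grade, which is why the 4:1 ratio is the friendlier choice at DDR3-1600.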
The controller clock speed directly affects how fast transactions occur between the DDR3 controller and the system bus (e.g., Wishbone or AXI4 in UberDDR3).
A higher controller clock speed means faster memory access, allowing the system to send/receive data more quickly.
However, a higher clock speed also makes timing closure more difficult, increasing design complexity.
For DDR3-1600, the Wishbone interface in UberDDR3 must meet timing at 200 MHz. This means any master sending traffic to UberDDR3 must also operate at this speed without violating timing constraints. If the master interface can only meet timing at 100 MHz, then regardless of UberDDR3’s ability to run at DDR3-1600, the entire system will still reliably operate only at DDR3-800 due to the bottleneck at the master interface.
II. What’s New: Running UberDDR3 at DDR3-1600 and DDR3-1333
After a long wait, UberDDR3 now supports higher speed grades! Previously, we were limited to DDR3-800, but now we can run at DDR3-1333 and DDR3-1600.
If you're wondering, "How do I get UberDDR3 to run at DDR3-1333 or DDR3-1600?"—good news! There's nothing special you need to do! Just set the top-level parameters as follows:
CONTROLLER_CLK_PERIOD
5_000 ps for DDR3-1600
6_000 ps for DDR3-1333
10_000 ps for DDR3-800
12_000 ps for DDR3-666 (the lowest setting covered here, below the lowest JEDEC grade of DDR3-800; an example is in Arty-S7)
DDR3_CLK_PERIOD
1_250 ps for DDR3-1600
1_500 ps for DDR3-1333
2_500 ps for DDR3-800
3_000 ps for DDR3-666
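To sanity-check a pair of period values before building, the arithmetic can be sketched in Python. The two parameter names come from the list above; the `describe` helper and the `SETTINGS` table are illustrative only and not part of UberDDR3.

```python
# (CONTROLLER_CLK_PERIOD, DDR3_CLK_PERIOD) in picoseconds, from the list above.
SETTINGS = {
    "DDR3-1600": (5_000, 1_250),
    "DDR3-1333": (6_000, 1_500),
    "DDR3-800": (10_000, 2_500),
    "DDR3-666": (12_000, 3_000),
}


def describe(controller_clk_period_ps: int, ddr3_clk_period_ps: int) -> dict:
    """Convert the two period parameters into frequencies and a data rate."""
    # A 4:1 controller means the controller period is 4x the DDR3 clock period.
    if controller_clk_period_ps != 4 * ddr3_clk_period_ps:
        raise ValueError("periods are not in a 4:1 ratio")
    ddr3_mhz = 1e6 / ddr3_clk_period_ps
    return {
        "controller_mhz": 1e6 / controller_clk_period_ps,
        "ddr3_mhz": ddr3_mhz,
        "data_rate_mt_s": 2 * ddr3_mhz,  # two transfers per DDR3 clock
    }


if __name__ == "__main__":
    for name, (ctrl_ps, ddr3_ps) in SETTINGS.items():
        info = describe(ctrl_ps, ddr3_ps)
        print(
            f"{name}: controller {info['controller_mhz']:.2f} MHz, "
            f"DDR3 clock {info['ddr3_mhz']:.2f} MHz, "
            f"{info['data_rate_mt_s']:.0f} MT/s"
        )
```

The 4:1 check catches the most likely mistake, which is changing one period parameter and forgetting the other.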
With that, UberDDR3 is ready to run at your desired speed grade! An instantiation template for running at DDR3-1600 is shown below:
III. Understanding the Speed Limits: What Determines the Maximum DDR3 Speed Grade?
UberDDR3 supports DDR3-1600, but will it work at that speed in your FPGA design? Several factors can limit the maximum DDR3 speed grade you can achieve. Here are the three key elements that dictate the final operating speed:
III.I FPGA Speed Grade: Not All FPGAs Are Built Equal
FPGAs have their own "speed grade," which determines their maximum operating frequency. You need to check the DC and AC Switching Characteristics of your specific FPGA.
For example, if you are using a Kintex-7 FPGA, you can refer to the Kintex-7 DC and AC Characteristic Data Sheet:

The PHY interface's maximum rate on your FPGA must be greater than or equal to the DDR3 speed grade you intend to run. For example, I have the Enclustra Mercury KX2-ST1 FPGA board, which is quite a beast of an FPGA board:

This board features the XC7K160TFFG676-2, meaning the speed grade is -2. Additionally, the VCCAUX_IO for the Mercury KX2 FPGA with FFG packages is 2.0 V. Referring to the table above, the maximum PHY interface rate is 1866 Mb/s (also highlighted in the table).
Now, let’s compare this with other Xilinx 7-series FPGAs:


Notice how, even for the highest speed grade (-3), the maximum PHY rate on the Artix-7 is only 1066 Mb/s. Meanwhile, for the Spartan-7, the maximum PHY rate for the -2C speed grade is just 800 Mb/s.
From this, we can conclude that, among these three families, achieving DDR3-1600 or DDR3-1333 is only feasible with a Kintex-7 FPGA. If you're using an Artix-7 or Spartan-7, you’ll likely be limited to DDR3-800 or, at worst, DDR3-666.
To further illustrate how to determine an FPGA's speed capabilities, let’s look at another board I own—one of my first FPGAs — the Arty-S7 FPGA board:

This board features the XC7S50-1CSGA324C, which has a speed grade of -1C. Referring back to the Spartan-7 table, its maximum PHY rate is 667 Mb/s, meaning it supports DDR3-666 (highlighted in yellow in the table above).
III.II DDR3 Speed Grade: Your Memory Must Keep Up with Your Speed
The next potential bottleneck when running at higher speed grades is, unsurprisingly, the DDR3 memory itself. The DDR3 module must support the target speed you intend to run.
For example, let’s revisit the Enclustra KX2-ST1 FPGA board. The highlighted component below is the DDR3 SDRAM used in my KX2-ST1 board:

After looking up the part number (MT41K128M16JT-125K), we can see that its maximum clock frequency is 800 MHz, meaning it supports DDR3-1600:

However, we previously determined that the maximum PHY interface rate for the Enclustra KX2-ST1 is 1866 Mb/s. So, what speed grade can the DDR3 memory actually run at? Simply put, it’s limited by the lower of the two rates: min(1866, 1600) = 1600, so the DDR3 on the Enclustra KX2-ST1 can run at a maximum of DDR3-1600. The good news is that the FPGA’s PHY limit of 1866 Mb/s leaves headroom above DDR3-1600, so the PHY itself should not be the limiting factor.
Now, let’s look at the Arty-S7 FPGA board. According to its reference manual, the DDR3 memory used is an MT41K128M16JT-125:

Checking the specs for this module (MT41K128M16JT-125), we see that its maximum clock frequency is also 800 MHz, meaning it supports DDR3-1600:

Great! That means we can run this DDR3 at DDR3-1600, right? Not so fast. We previously determined that the maximum PHY rate of the Arty-S7-50 FPGA is only 667 Mb/s. So, even though the DDR3 module is capable of DDR3-1600, the FPGA limits it to DDR3-666.
III.III Master Interface Speed: Keeping Up with UberDDR3
Another factor that can limit the speed grade UberDDR3 can run at is the master interface. As mentioned in Section I, if the master interface can only meet timing at 100 MHz, then regardless of UberDDR3's ability to run at 200 MHz (DDR3-1600), the entire system will still be limited to DDR3-800 due to the bottleneck at the master interface.
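Putting the three limits together, the highest usable speed grade is the fastest one that all three allow. The Python sketch below is only an illustration: the function name, the grade table, and the idea of summarizing the master as a single maximum frequency are my own simplifications, and the 170 MHz master in the last example is a made-up value.

```python
# Grade name -> DDR3 clock period in ps (DDR3-1066 at 1875 ps is standard
# arithmetic; the other periods are the ones listed for UberDDR3 above).
GRADES = [
    ("DDR3-666", 3_000),
    ("DDR3-800", 2_500),
    ("DDR3-1066", 1_875),
    ("DDR3-1333", 1_500),
    ("DDR3-1600", 1_250),
]


def max_speed_grade(phy_max_mbps, module_max_mt_s, master_fmax_mhz):
    """Return the fastest grade allowed by the FPGA PHY, the DDR3 part and
    the master interface, or None if even the slowest grade does not fit."""
    best = None
    for name, ddr3_ps in GRADES:  # slowest to fastest
        data_rate = 2e6 / ddr3_ps            # MT/s
        controller_mhz = 1e6 / (4 * ddr3_ps)  # 4:1 controller
        # Small tolerances because nominal figures round down
        # (e.g. "1333" stands for 1333.33 MT/s).
        fits = (
            data_rate <= phy_max_mbps + 1
            and data_rate <= module_max_mt_s + 1
            and controller_mhz <= master_fmax_mhz + 0.01
        )
        if fits:
            best = name
    return best


if __name__ == "__main__":
    # Enclustra KX2-ST1 with a master that closes timing at 200 MHz.
    print(max_speed_grade(1866, 1600, 200))
    # Same board, but the master only closes timing at 100 MHz.
    print(max_speed_grade(1866, 1600, 100))
    # Arty-S7: fast DDR3 part, but the PHY tops out at 667 Mb/s.
    print(max_speed_grade(667, 1600, 200))
    # Hypothetical master that closes timing at 170 MHz.
    print(max_speed_grade(1866, 1600, 170))
```

Real timing closure depends on the whole design rather than a single frequency, so treat the master limit here as a rough stand-in for what the timing report tells you.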
In the next section, we’ll create a Dhrystone application using MicroBlaze on the Enclustra FPGA board. You’ll see how timing fails at DDR3-1600, forcing us to downgrade to DDR3-1333 to meet timing properly. This demonstrates how the master interface (in this case, the MicroBlaze system) fails to meet timing at DDR3-1600, ultimately limiting the entire MicroBlaze + UberDDR3 system to a lower speed grade.
IV. Project Demo 1: MicroBlaze Dhrystone Test
Now, for the long-awaited project demo! For this, I’ll be using everyone’s favorite—Vivado’s drag-and-drop MicroBlaze project. To benchmark performance, we’ll run the Dhrystone test. If you’re looking for a step-by-step guide on setting up a MicroBlaze project with UberDDR3, check out these previous blog posts:
In this project, UberDDR3 will serve as the cacheable main memory for MicroBlaze, and we’ll run the Dhrystone test as the application. Below is the final block design:

A PDF export of the Vivado block design is also attached for reference:
IV.I Attempting DDR3-1600
First, let's try running the system at DDR3-1600. As shown below, the controller clock is set to 200 MHz, and the DDR3 clock is 800 MHz:

For UberDDR3, the controller clock period is configured as 5000 ps (200 MHz):

For the Enclustra KX2-ST1 pin and timing constraints, use this constraint file.
After running synthesis and generating the bitstream, the process completes successfully, but the following warning in red appears:

Oh no! The Worst Negative Slack (WNS) is -0.301 ns, meaning the design fails timing! WNS must be zero or positive for reliable hardware operation. But is UberDDR3 the issue?
Looking at the timing report summary below, we can see that the failing paths originate from design_1_i/microblaze_0* and terminate at design_1_i/microblaze_0/*. This indicates that timing is failing inside the MicroBlaze system, not UberDDR3 itself:

This is a perfect example of how, even when UberDDR3 meets timing at 200 MHz, the master interface (MicroBlaze in this case) becomes the bottleneck.
IV.II Adjusting to DDR3-1333
To resolve this, the easiest solution is to lower the speed grade for the entire system. Let’s now try running at DDR3-1333 instead.
The Clock Wizard needs to be updated:
Controller clock: 166.67 MHz
DDR3 clock: 666.67 MHz

Similarly, UberDDR3 needs to be reconfigured:
Controller clock period: 6000 ps (166.67 MHz)

With these adjustments, we re-run synthesis and generate the bitstream. Below is the final timing summary:

We are now meeting timing! The WNS is now +0.075 ns, confirming that the design is stable.
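The two WNS figures also give a rough idea of how fast each implementation run could have been clocked. The sketch below uses the common first-order estimate Fmax ≈ 1 / (clock period − WNS); the function name is my own, and the result only describes one particular place-and-route run.

```python
def estimated_fmax_mhz(clock_period_ns: float, wns_ns: float) -> float:
    """First-order Fmax estimate from the constrained period and the WNS.

    A negative WNS lengthens the achievable period; a positive WNS shortens it.
    """
    achievable_period_ns = clock_period_ns - wns_ns
    return 1e3 / achievable_period_ns


if __name__ == "__main__":
    # DDR3-1600 attempt: 5 ns controller clock, WNS = -0.301 ns.
    print(f"{estimated_fmax_mhz(5.0, -0.301):.2f} MHz")
    # DDR3-1333 run: 6 ns controller clock, WNS = +0.075 ns.
    print(f"{estimated_fmax_mhz(6.0, 0.075):.2f} MHz")
```

Both estimates land between 166.67 MHz and 200 MHz, which is consistent with the MicroBlaze system passing at DDR3-1333 and failing at DDR3-1600. The two numbers differ from each other because the tools only optimize until the requested constraint is met.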
With timing met, we can now export hardware and create a Vitis project. In Vitis, choose Dhrystone as the application project, build it, and launch hardware. If you're unsure about this process, refer to the blog post UberDDR3 + MicroBlaze (Part 2) - Post #8, where the same Dhrystone test was run—though on a different FPGA board (Arty S7-50).
As shown below, the linker script assigns all memory regions to UberDDR3. The serial terminal output confirms a successful run, showing:
Dhrystone MIPS (DMIPS/sec): 56.2606

Previously, in UberDDR3 + MicroBlaze (Part 2) - Post #8, the ArtyS7-50 was used, running at DDR3-666:

Comparing these two:
FPGA Board | DDR3 Speed | DMIPS/Sec |
Enclustra KX2-ST1 | DDR3-1333 | 56.2606 |
Arty-S7 | DDR3-666 | 30.6146 |
The Enclustra board achieves ~84% higher DMIPS/Sec (56.2606 / 30.6146 ≈ 1.84). That is in line with, though slightly below, the 2x increase in data rate from DDR3-666 to DDR3-1333.
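For reference, here is the comparison worked out in Python. The DMIPS figures are the two results from the table; the clock ratio is computed from the controller clock periods (6_000 ps and 12_000 ps), under the assumption that MicroBlaze runs on the controller clock in both designs.

```python
kx2_dmips = 56.2606    # Enclustra KX2-ST1 at DDR3-1333
arty_dmips = 30.6146   # Arty-S7 at DDR3-666

speedup = kx2_dmips / arty_dmips
percent_higher = (speedup - 1) * 100

# Controller clock ratio from the two CONTROLLER_CLK_PERIOD settings (ps).
clock_ratio = 12_000 / 6_000

if __name__ == "__main__":
    print(f"DMIPS ratio: {speedup:.2f}x ({percent_higher:.0f}% higher)")
    print(f"Clock ratio: {clock_ratio:.1f}x")
```

The gap between the two ratios suggests that not everything scales with the clock, but with only two data points from two different FPGA families it would be a stretch to read more into it.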
Below is a short video demonstration of the Dhrystone test running on the Enclustra FPGA board:
V. Project Demo 2: DDR3 Reliability Test
Now, let’s push DDR3-1600 to its limits with a continuous read-write test. This test will loop indefinitely, writing and then reading back data to ensure correctness. The goal? Zero mismatches between written and read data. A summary report will be sent to the user—perhaps via UART—showing the number of mismatched reads.
The plan is to leave the FPGA running this test sequence for several days, then check back to see if any errors occurred. Given the sheer number of read-write cycles over time, even a single mismatch would be concerning. Ideally, at the end of the test, we should see zero errors!
Below is the planned architecture of the test system:

Here are the main components:
DDR3 Test Sequence FSM = This is the core logic controlling the test, interfacing with UberDDR3 via a Wishbone interface. It runs three types of memory access patterns:
Burst Write / Burst Read
Sequentially writes from address 0 to the last address, then reads in the same order.
Random Write / Random Read
Writes to random addresses by swapping address halves (forcing accesses to jump across rows instead of columns). Reads follow the same pattern.
Alternate Write-Read
Writes to an address, then immediately reads it back before moving to the next address. After completing this cycle, the FSM restarts with Burst Write.
The FSM runs indefinitely, cycling through all three test modes.
Button Debouncer = Connected to an external button. If pressed, it injects a single-bit error into the write data, incrementing the mismatch counter by 1.
Read Data Counter = Monitors read data and compares it to expected values.
Increments either the matched reads counter or the mismatched reads counter accordingly.
64-bit Timer Counter = Tracks the total runtime since the last reset.
Essential for long-duration testing, allowing us to monitor how long the test has been running.
MicroBlaze System = Handles UART communication for easy monitoring.
You might be wondering: if MicroBlaze failed timing at 200 MHz in the previous Dhrystone test, won't it fail here too? The answer is no, because in this setup, MicroBlaze runs at only 100 MHz. It only handles UART communication and does not affect the main DDR3 Test FSM, which is continuously running at 200 MHz.
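To make the three access patterns and the counters concrete, here is a small software model in Python. It is not the RTL: the data pattern, the tiny address width, and the `fault_on_write` knob (standing in for the button press) are all assumptions made for illustration, and the half-swap is my reading of "swapping address halves".

```python
def swap_halves(addr: int, addr_bits: int) -> int:
    """Swap the upper and lower halves of an address (addr_bits must be even),
    so consecutive addresses land far apart."""
    if addr_bits % 2:
        raise ValueError("addr_bits must be even")
    half = addr_bits // 2
    mask = (1 << half) - 1
    return ((addr & mask) << half) | (addr >> half)


def expected_data(addr: int) -> int:
    """Hypothetical address-derived data pattern (not the one used in the RTL)."""
    return (addr * 0x9E3779B1) & 0xFFFFFFFF


def run_pass(addr_bits: int, fault_on_write=None):
    """Run burst, random and alternate phases once; return (matched, mismatched).

    fault_on_write: index of the single write (counted across all phases)
    that gets one bit flipped, mimicking one button press.
    """
    memory = {}
    counts = {"matched": 0, "mismatched": 0, "writes": 0}
    n = 1 << addr_bits

    def write(addr):
        data = expected_data(addr)
        if counts["writes"] == fault_on_write:
            data ^= 1  # single-bit error injected into the write data
        counts["writes"] += 1
        memory[addr] = data

    def check(addr):
        key = "matched" if memory[addr] == expected_data(addr) else "mismatched"
        counts[key] += 1

    # 1) Burst write, then burst read, in address order.
    for a in range(n):
        write(a)
    for a in range(n):
        check(a)
    # 2) "Random" write/read: same loop, but with swapped address halves.
    for a in range(n):
        write(swap_halves(a, addr_bits))
    for a in range(n):
        check(swap_halves(a, addr_bits))
    # 3) Alternate: write one address and read it back immediately.
    for a in range(n):
        write(a)
        check(a)
    return counts["matched"], counts["mismatched"]


if __name__ == "__main__":
    print(run_pass(4))                    # no fault injected
    print(run_pass(4, fault_on_write=5))  # one injected fault
```

In the model, one injected fault produces exactly one mismatched read, which is the same bookkeeping the hardware counters rely on: any mismatch beyond the injected count would point to a real problem.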
The top-level module for this test can be accessed here. If you’d like to try this out yourself, simply dump all files inside the example_demo/enclustra_kx2_st1 directory and set enclustra_ddr3_test as the top-level module:

For the constraint, we will be using this constraint file.
The MicroBlaze block design primarily contains the UARTLITE IP:

If you want to inspect the design in detail, here’s the exported block design:
Once that’s set up, just run synthesis-to-bitstream generation. There should be no errors, and timing should be passing:

Next, export the hardware and create a Vitis project. You can choose Hello World as the template—since the MicroBlaze’s role is simple:
Display the matched read data count
Display the mismatched read data count
Display the number of injected faults
Convert and display the 64-bit timer into a human-readable elapsed time (days-hours-minutes)
The MicroBlaze system will update and print this status every second. Here’s the main C code for this project:
You can also find the full C code here. Since I’m not much into software coding, I had ChatGPT help me out—especially in beautifying the status output and converting the 64-bit timer into a clean days-hours-minutes format. Hooray, ChatGPT!
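The timer conversion itself is simple integer arithmetic. As an illustration only, here it is in Python rather than C; the function name is hypothetical and the tick rate is an assumption (I use 200 MHz as a placeholder), so substitute whatever clock actually drives the 64-bit counter.

```python
def format_elapsed(ticks: int, clk_hz: int = 200_000_000) -> str:
    """Convert a free-running tick counter into a days-hours-minutes string.

    clk_hz is an assumed tick rate, not a value taken from the design.
    """
    total_seconds = ticks // clk_hz
    days, remainder = divmod(total_seconds, 86_400)
    hours, remainder = divmod(remainder, 3_600)
    minutes = remainder // 60
    return f"{days}d {hours:02d}h {minutes:02d}m"


if __name__ == "__main__":
    # 2 days, 3 hours, 4 minutes and 5 seconds worth of ticks at 200 MHz.
    seconds = 2 * 86_400 + 3 * 3_600 + 4 * 60 + 5
    print(format_elapsed(seconds * 200_000_000))
```

A 64-bit counter is comfortable here: even at 200 MHz it would take thousands of years to wrap around, whereas a 32-bit counter would overflow in about 21 seconds.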
Now, we can build the project, run it on hardware, and open the serial terminal. Below is a short demo video showing the system dumping the status report every second. Near the end of the video, I also inject a fault by pushing the button:
Here’s an example where I pressed Button 0 four times over the course of almost two days:

As expected, the Injected Faults counter matches the Mismatched Reads counter, meaning every mismatch was one I injected myself: there are still zero genuine mismatches after two days of continuous DDR3 access! Also, notice how the system has already handled ~2.2 trillion writes in that time.
🤔 What if I leave this running for even longer?
Here’s how it looked after 4 days—with a whopping 4.3 trillion writes and still zero mismatches:

By day 5, I decided to inject one more error via the push button, bringing the total injected faults to 5. At this point, the system had already handled ~6.7 trillion writes, yet the only mismatches are the injected ones:

5 Days, 6.7 Trillion Writes, Still Zero Mismatches!
And there you have it—a comprehensive DDR3 test running at DDR3-1600 peak speed on the Enclustra FPGA board. Even after 5 days of continuous operation and trillions of write-read cycles, there are still zero mismatches.
For now, I’m very satisfied with this result! 🥳🥳🥳
VI. Conclusion
Pushing UberDDR3 beyond DDR3-800 to DDR3-1333 and DDR3-1600 is more than just changing clock speeds—it’s a balancing act between FPGA capabilities, DDR3 module limitations, and master interface constraints. While some FPGAs can handle DDR3-1600, others might cap out at DDR3-800 due to their speed grades. Similarly, even if your FPGA is fast enough, your DDR3 memory module and system interfaces must keep up.
With the right hardware, tuning, and constraints met, UberDDR3 can now reach higher speeds, unlocking more performance for FPGA-based designs. Whether you're working on high-speed video processing, real-time data streaming, or other demanding applications, these updates push UberDDR3 to its limits—and beyond.
How fast can you get UberDDR3 running on your FPGA? Let me know in the comments! 🚀
That wraps up this post. Catch you in the next blog post!