
UberDDR3 on Lattice ECP5: Expanding FPGA Support - Post #15

  • Writer: Angelo Jacobo
  • 23 hours ago
  • 9 min read

UberDDR3 now running on Lattice ECP5!

In the last blog post, we successfully ran UberDDR3 demo projects on OpenXC7 — the open-source FPGA toolchain for AMD 7-series FPGAs. Today, the open-source journey continues! This time, we’re bringing UberDDR3 to the Lattice ECP5 family.


For this update, we’ll be using the small but mighty OrangeCrab, an open-source hardware board with the Lattice ECP5 FPGA that's become well known in the FPGA community for packing serious power into a tiny form factor.


I. Progress Update: Closing In on the Final UberDDR3 Goals

Over the past year, we’ve added a ton of features and improvements to UberDDR3 as part of the NLNet project, which we first discussed in Blog Post 1.


To recap, here are the main goals for UberDDR3:

  1. ✅ Launch a website for UberDDR3 updates and knowledge sharing (this was when OpenIPHub officially started! 😊)

  2. ✅ Integrate an AXI bus interface on top of the existing Wishbone interface.

  3. ✅ Implement an in-line ECC feature for error correction capability.

  4. ✅ Expand support for data bus widths from 8 bits to 72 bits.

  5. ✅ Implement self-refresh mode for lower power consumption.

  6. ✅ Add dual-rank interleaving support for dual-rank DDR3 DIMMs.

  7. ✅ Enable on-the-fly timing parameter configuration based on the inserted DDR3 DIMM type.

  8. ✅ Optimize the design to support clock speeds exceeding 400MHz.

  9. ✅ Port the controller to an open-source FPGA toolchain.

  10. 🔄 (Ongoing) Port the controller to other FPGA architectures like Lattice, Gowin, and more.

  11. 🕒 (To be done) Deploy on a CI system and provide ready-to-use DDR3 controller bitstreams for different boards.

  12. 🕒 (To be done) Implement a benchmarking SoC to measure read and write performance.


We are now working on the 10th goal — the third-to-last milestone.


This project has been such a fun and rewarding journey, and while it’s almost time to wrap it up, I’ll save all my final thoughts and reflections for when everything is truly complete. 😊


Until now, UberDDR3 has mainly focused on AMD 7-series FPGAs — largely because of their strong market presence and large community support.


But it’s time to branch out! For this next step, I’m focusing on porting UberDDR3 to a different vendor: Lattice ECP5.


II. Challenges in Porting UberDDR3 to Other FPGA Vendors

When it comes to the controller portion of UberDDR3, things are relatively simple. Since the controller is written in Verilog, porting it to another FPGA vendor isn't too difficult.


However, the real challenge lies in the PHY component of UberDDR3. Let’s revisit the diagram below for clarity:


The PHY component (ddr3_phy.v) uses Xilinx-specific primitives such as IOSERDES, IODELAY, and IOBUF. These primitives are key to enabling proper serialization and deserialization of incoming/outgoing data as DDR (using IOSERDES), delaying data transmission (with IODELAY), and controlling the tristate line of the data bus (using IOBUF).


Since our goal is to run UberDDR3 on the Lattice ECP5 using the open-source toolchain (yosys + nextpnr-ecp5), we need to ensure that the primitives we’re using are supported by this toolchain. Here’s a list of primitives supported by nextpnr-ecp5.


Additionally, the ECP5 High-Speed I/O Interface Technical Note from Lattice provides more details on these primitives.


II.I Issue: No 8:1 IOSERDES Available for Lattice ECP5

After diving into the details, I realized that Lattice ECP5 doesn’t offer an IOSERDES with an 8:1 configuration (the equivalent of ISERDESE2/OSERDESE2 in Xilinx 7-series FPGAs). The 8:1 setup is crucial for a 4:1 DDR memory controller: the DDR3 clock runs at four times the controller clock, and DDR transfers data on both clock edges, so each pin moves 8 bits per controller clock cycle. The closest alternative is the Lattice primitive ODDRX2F, which supports a 4:1 configuration, but this is only suitable for a 2:1 memory controller.
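The ratio arithmetic can be spelled out in a few lines of Python. This is only an illustrative sketch; the function name `serdes_ratio` is made up for this example and is not part of UberDDR3.

```python
def serdes_ratio(controller_ratio: int, transfers_per_clock: int = 2) -> int:
    """Bits each I/O pin must move per controller clock cycle.

    controller_ratio: DDR3 clock cycles per controller clock cycle
                      (4 for a 4:1 controller, 2 for a 2:1 controller).
    transfers_per_clock: 2 for double data rate (both clock edges).
    """
    return controller_ratio * transfers_per_clock


# A 4:1 controller needs an 8:1 SERDES on every data pin.
print(serdes_ratio(4))
# A 4:1 primitive such as ODDRX2F only covers a 2:1 controller.
print(serdes_ratio(2))
```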


This presents a challenge because the UberDDR3 controller is hardcoded to run at 4:1. Honestly, I never imagined I’d need to adjust it to work at 2:1 when I first created it! 😅


So, what’s the solution? One idea I considered is creating a "soft" IOSERDES in Verilog that could run in 8:1 mode. While this approach is theoretically possible, there’s a catch: it would limit the timing slack of the PHY, meaning it would only be able to pass timing at lower frequencies.


To make it work, we’ll have to reduce the DDR3 clock frequency to allow the "soft" IOSERDES to meet timing constraints.


II.II Solution: Leveraging DLL Off to Run at Lower DDR3 Frequencies

According to the DDR3 specification, DLL Off mode allows the memory to run at frequencies below the standard DDR3 clock range.


The JEDEC spec typically sets the minimum clock frequency at 333 MHz when DLL is on. However, in DLL Off mode, there is no defined minimum clock frequency. The only restriction is that the maximum clock frequency is 125 MHz, corresponding to a period of at least 8 ns:


So, by taking advantage of the DLL Off feature in DDR3, we can lower the clock frequency, enabling the "soft" IOSERDES to meet timing requirements for the PHY at a lower speed.
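As a quick sanity check, the DLL Off limit quoted above (a clock period of at least 8 ns) can be turned into a small helper. This is a sketch with hypothetical names, and it only encodes the 8 ns figure mentioned in this post, not the full JEDEC timing tables.

```python
DLL_OFF_MIN_PERIOD_NS = 8.0  # minimum clock period in DLL Off mode, as quoted above


def clock_period_ns(freq_mhz: float) -> float:
    """Convert a clock frequency in MHz to a period in nanoseconds."""
    return 1000.0 / freq_mhz


def within_dll_off_limit(freq_mhz: float) -> bool:
    """True if the clock period is at least the DLL Off minimum period."""
    return clock_period_ns(freq_mhz) >= DLL_OFF_MIN_PERIOD_NS


for freq in (100.0, 125.0, 160.0):
    print(f"{freq} MHz: period {clock_period_ns(freq):.2f} ns, "
          f"within limit: {within_dll_off_limit(freq)}")
```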


III. Architecture of the Lattice ECP5 PHY

Let’s now dive into the architecture of the Lattice ECP5 PHY for UberDDR3:


On the left, you can see the 8:1 ISERDES and OSERDES, which I’ve labeled as “soft” to clarify that this is a Verilog-coded solution. Here’s how it works:

  • OSERDES (Output Serializer): The OSERDES takes in 8 bits (D1 to D8) from the controller clock domain and serializes them 2 bits at a time. This serialization is controlled by a MUX, with the operation driven by a mod-4 counter running on the DDR3 clock (which is at 4x the controller clock frequency). For example, when the counter is zero, D1 and D2 are sent to ODDRX1F. The ODDRX1F then further serializes these bits across both the rising and falling edges of the clock, effectively enabling double data rate (DDR). In essence, the 8 bits are spread across 4 clock cycles, resulting in an 8:1 ratio.

  • ISERDES (Input Deserializer): The IDDRX1F receives the DDR data at the DDR3 clock speed and converts it into a 2-bit SDR (single data rate) output, which is then processed by the MUX. The MUX deserializes the 2-bit pairs and shifts the data into the controller clock domain. This also includes bitslip logic that allows us to shift the alignment of the data sampled by the controller clock. This component effectively takes a single incoming serial line and converts it into 8 parallel bits over 4 DDR3 clock cycles, once again achieving 8:1 deserialization.
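The data flow through the two soft blocks can be mimicked with a small behavioral model in Python. This is not the Verilog in the repository, just a sketch under a few assumptions: the first bit of each pair goes out on the rising edge, and bitslip is modeled as skipping whole DDR3 clock cycles before the first word boundary. The function names are illustrative.

```python
from typing import List, Tuple

Pair = Tuple[int, int]  # (bit on rising edge, bit on falling edge)


def oserdes_soft(word: List[int]) -> List[Pair]:
    """Split an 8-bit word [D1..D8] into four pairs, one per DDR3 clock cycle."""
    if len(word) != 8:
        raise ValueError("expected 8 bits (D1..D8)")
    pairs = []
    for count in range(4):  # mod-4 counter on the DDR3 clock
        # count 0 selects D1/D2, count 1 selects D3/D4, and so on.
        pairs.append((word[2 * count], word[2 * count + 1]))
    return pairs


def iserdes_soft(pairs: List[Pair], bitslip: int = 0) -> List[List[int]]:
    """Regroup captured pairs into 8-bit words, 4 DDR3 clock cycles per word.

    bitslip (assumed behavior): number of DDR3 cycles skipped before the
    first word boundary, used to realign the words.
    """
    usable = pairs[bitslip:]
    words = []
    for start in range(0, len(usable) - 3, 4):
        window = usable[start:start + 4]
        words.append([bit for pair in window for bit in pair])
    return words


word = [1, 0, 1, 1, 0, 0, 1, 0]
pairs = oserdes_soft(word)
print(pairs)
print(iserdes_soft(pairs))                         # aligned: original word is recovered
print(iserdes_soft([(0, 0)] + pairs))              # one stray cycle: word is misaligned
print(iserdes_soft([(0, 0)] + pairs, bitslip=1))   # bitslip restores alignment
```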

With these "soft" ISERDES and OSERDES, we can now assemble the basic components of the ECP5 PHY. The diagram on the right illustrates the setup:

  • OSERDES for Command Serialization: The PHY serializes 4 command sets from the controller, sending them to the DDR3 memory in SDR format. For instance, D1-D2 is tied to cmd[0], D3-D4 to cmd[1], and so on. These serialized commands are then fed directly into the DDR3 command pins (cs_n, cke, ras_n, etc.).

  • OSERDES for Data Serialization: The PHY serializes 8 data sets from the controller and sends them to DDR3 in DDR format (double data rate). This data is then directed to BB (Bidirectional Buffer).

  • BB (Bidirectional Buffer): The BB receives DDR data from the OSERDES and connects to the DDR3 data pins (io_ddr3_dq[15:0]) during write operations. During read operations, the incoming data is passed to DELAYG.

  • DELAYG: The DELAYG component delays the incoming read data to ensure that the transitions occur away from the clock edges, making it easier for the ISERDES to sample the data correctly. The delayed data is then deserialized by the ISERDES.

  • ISERDES (Input Deserializer): The ISERDES takes the delayed read data and deserializes it, sending it back to the controller.
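The difference between the SDR command path and the DDR data path comes down to how the eight OSERDES inputs are driven. The Python sketch below illustrates that mapping; the function names are hypothetical and the example bit patterns are arbitrary.

```python
from typing import List


def command_to_oserdes_inputs(cmd: List[int]) -> List[int]:
    """Map 4 command bits (one per DDR3 clock cycle) onto inputs D1..D8.

    Each command bit drives both inputs of a pair (D1-D2 = cmd[0],
    D3-D4 = cmd[1], ...), so the pin holds the same value on both clock
    edges, which gives SDR behavior.
    """
    if len(cmd) != 4:
        raise ValueError("expected 4 command bits")
    inputs = []
    for bit in cmd:
        inputs.extend([bit, bit])
    return inputs


def data_to_oserdes_inputs(data: List[int]) -> List[int]:
    """Map 8 data bits straight onto D1..D8, one bit per clock edge (DDR)."""
    if len(data) != 8:
        raise ValueError("expected 8 data bits")
    return list(data)


# Example: a signal such as cs_n asserted low only in the second command slot.
print(command_to_oserdes_inputs([1, 0, 1, 1]))
# Example: eight independent data bits for one DQ pin.
print(data_to_oserdes_inputs([1, 0, 0, 1, 1, 1, 0, 0]))
```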


Additionally, the DDR3 clock is fed directly to the clock input of the DDR3 module, and i_ddr3_clk_90 (the DDR3 clock shifted by 90 degrees) is used as the DQS output during write operations.


With these components, we've now outlined the basic architecture for the ECP5 PHY.


IV. Modifications to the DDR3 Controller

Let’s explore the necessary modifications to the DDR3 Controller in light of the DLL Off mode. Here's a snapshot from the JEDEC DDR3 Standard, which highlights three important points about DLL Off mode:

  1. Entering DLL Off Mode: DLL Off mode is activated by setting MR1 A0 to high.

  2. Support for Specific CL and CWL Settings: In DLL Off mode, the memory is only required to support CL=6 (CAS Latency) and CWL=6 (CAS Write Latency), so the controller must use these settings.

  3. Reduced Read Latency: With DLL Off mode, read latency is reduced by one cycle compared to DLL On mode. For example, with CL=6, the actual read latency will be 5 cycles instead of 6.
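The third point is easy to get wrong by one cycle, so here is the arithmetic as a tiny Python sketch. The function name is illustrative, and the model deliberately ignores any extra latency added by the PHY.

```python
def read_latency_cycles(cas_latency: int, dll_off: bool) -> int:
    """DDR3 clock cycles from the read command to the first read data.

    In DLL Off mode the data arrives one cycle earlier than the programmed CL.
    """
    return cas_latency - 1 if dll_off else cas_latency


print(read_latency_cycles(6, dll_off=False))  # DLL On
print(read_latency_cycles(6, dll_off=True))   # DLL Off
```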


Additionally, since we’re now using a different PHY, the latency in IOSERDES has also changed. This differs from the latency observed when using the Xilinx PHY, and must be adjusted accordingly in the controller to maintain proper timing.


V. How to Use the ECP5 PHY in UberDDR3?

To use the ECP5 PHY with UberDDR3, refer to the Verilog files located in the rtl/ecp5_phy/ directory:

  • ddr3_phy_ecp5.v: The main Verilog module for the ECP5 PHY.

  • iserdes_soft.v: The Verilog module for the soft ISERDES instantiated in the ECP5 PHY.

  • oserdes_soft.v: The Verilog module for the soft OSERDES instantiated in the ECP5 PHY.


The main controller module is still ddr3_controller.v.


The top-level module that instantiates both the controller and PHY is ddr3_top.v.


In the ddr3_top module, both the Xilinx 7-Series PHY (ddr3_phy) and the Lattice ECP5 PHY (ddr3_phy_ecp5) are instantiated. To enable the ECP5 PHY, define LATTICE_ECP5_PHY.


You can define LATTICE_ECP5_PHY either directly in the ddr3_top module or as part of the synthesis command. The latter approach is the method I used. For example, in the Makefile for the OrangeCrab ECP5 example demo:

Notice how the "-D LATTICE_ECP5_PHY" flag is added to the read_verilog command for synthesis.


VI. Getting UberDDR3 Running on the OrangeCrab FPGA

Now, let’s dive into running UberDDR3 on hardware! For this demo, we’ll be using the OrangeCrab ECP5 FPGA board:


Below is my OrangeCrab board with a coin for size comparison:


If you’re not familiar with it, the OrangeCrab is a compact, pocket-sized development board in the Adafruit Feather format, equipped with a Lattice ECP5 FPGA and DDR3L memory. It’s the first time I’ve worked with this board, and honestly, it's the smallest FPGA board I’ve ever used!


For the demo project, we will implement the usual UberDDR3 example demo, which has been explained in previous blog posts.


The mechanism is quite simple: the green LED will turn on once UberDDR3 completes calibration. This process involves passing the Built-In Self Test (BIST), which includes:

  • Burst write-read

  • Random write-read

  • Alternating write-read across the entire DDR3 memory address space.


The clock frequencies are crucial to get right. For this demo, I’ve set up a clk_wiz_pll as the PLL module to generate the clocks. This PLL module is created using the ecppll command:


The DDR3 clock is 160 MHz, while the controller clock is 40 MHz.


These are the highest frequencies I was able to successfully run for the demo on the OrangeCrab. While the JEDEC DDR3 specification limits the maximum DDR3 clock for DLL off to 125 MHz, the setup still works fine at this higher frequency, so I decided to keep it as the clock setting for the example demo.
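To put these clock settings in perspective, the sketch below works out the controller ratio and the theoretical peak bandwidth. It assumes the 16-bit data bus (io_ddr3_dq[15:0]) described earlier and ignores refresh, activation, and read/write turnaround overhead, so real throughput will be lower.

```python
DDR3_CLK_MHZ = 160
CONTROLLER_CLK_MHZ = 40
DQ_WIDTH_BITS = 16  # io_ddr3_dq[15:0]

# DDR3 clock cycles per controller clock cycle (the 4:1 controller ratio).
ratio = DDR3_CLK_MHZ // CONTROLLER_CLK_MHZ

# Bits per pin per controller cycle: 2 transfers per DDR3 clock (DDR).
bits_per_pin = ratio * 2

# Width of the data word the controller handles each controller cycle.
controller_word_bits = bits_per_pin * DQ_WIDTH_BITS

# Peak bandwidth seen from the DDR3 side and from the controller side.
peak_ddr3_mbit_s = DDR3_CLK_MHZ * 2 * DQ_WIDTH_BITS
peak_controller_mbit_s = CONTROLLER_CLK_MHZ * controller_word_bits

print(f"controller ratio: {ratio}:1")
print(f"controller word width: {controller_word_bits} bits")
print(f"peak bandwidth: {peak_ddr3_mbit_s} Mbit/s "
      f"({peak_ddr3_mbit_s // 8} MB/s)")
```

Both sides of the interface give the same peak figure, which is a useful consistency check on the 8:1 serialization.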


The process of synthesizing the design and generating the bitstream is managed through the Makefile.


To get started, ensure you have the OrangeCrab FPGA board—I'm using the 85F model, which I purchased here. You'll also need to set up the necessary toolchain and verify that you can run the basic blink project, which you can find here.




Now, let's get started!


  1. First, clone the UberDDR3 repository:

  2. Change directory to the example_demo/orangecrab_ecp5.

  3. Connect your OrangeCrab FPGA board while pressing btn0.

  4. Run the following commands to compile and upload the bitstream file:

  5. Observe the LED behavior:

    The LED should initially light up red, indicating that the system is starting.

    After approximately 3 seconds, the LED should turn green, signaling that calibration and BIST have successfully completed.


Helpful Tip:

Before you run "make dfu" to load the bitstream, make sure the OrangeCrab is in bootloader mode. To do this, power on the board while holding down btn0. This step is crucial for the board to accept the new bitfile.


Congratulations! 🎉 You've just successfully run UberDDR3 on the OrangeCrab board and its Lattice ECP5 FPGA!


Here's a short video demonstrating the project running on my OrangeCrab board:


VII. Conclusion

UberDDR3 has successfully made its way to the Lattice ECP5, another milestone in our open-source journey. From navigating the challenges of porting to a new FPGA vendor to getting it up and running on the OrangeCrab, this blog update highlights the adaptability and potential of UberDDR3.

Exciting things are on the horizon, with future developments including benchmarking, CI integration, and expanded FPGA support. I’d love to hear about your experience running UberDDR3 on the OrangeCrab—drop a comment below!


That wraps up this post. Catch you in the next blog post!





 
 
 

© 2024 by Angelo Jacobo
