  • Angelo Jacobo

Getting Started with UberDDR3 (Part 2) - Post #3

In Part 1, we explored the inner workings of the UberDDR3 and how it interfaces with DDR3 RAM through detailed testbench simulations. Now, in Part 2, we're taking the next big step: implementing UberDDR3 on actual hardware. This post serves as both a tutorial and a guide, aimed at helping you integrate UberDDR3 into your own projects.




I. Understanding Your FPGA's Maximum DDR3 Clock Frequency

DDR3 is all about speed and capacity, and ideally, you want to operate it at its maximum clock frequency to achieve high throughput. However, it's important to recognize that the actual limiting factor might not be the DDR3 module itself, but the FPGA device running it.


To start, you need to determine the speed grade of your FPGA. For instance, if you're using the Arty S7-50, you can find the necessary details in the Arty S7 Reference Manual. The part number for the Arty S7-50 is XC7S50-1CSGA324C, indicating a speed grade of -1 and a commercial temperature range. This information is crucial as it directly impacts how fast your DDR3 can operate reliably on this FPGA.
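The fields mentioned above can be pulled out of the part number with a short Python sketch. The field layout in the regular expression is an assumption inferred from this one example (XC7S50-1CSGA324C), and `decode_part_number` is a hypothetical helper, so treat the AMD ordering-information document as the authority for other devices.

```python
import re

# Assumed layout, inferred from the single example XC7S50-1CSGA324C:
#   <family><size>-<speed grade><package><pin count><temperature grade>
PART_RE = re.compile(r"^(XC7[A-Z])(\d+)-(\d)([A-Z]+)(\d+)([A-Z])$")

# Only the two most common temperature grades are listed here.
TEMPERATURE = {"C": "commercial", "I": "industrial"}


def decode_part_number(part):
    """Split a 7-series part number into its fields (hypothetical helper)."""
    match = PART_RE.match(part)
    if match is None:
        raise ValueError(f"unrecognized part number: {part}")
    family, size, speed, package, pins, temp = match.groups()
    return {
        "family": family,
        "size": int(size),
        "speed_grade": -int(speed),
        "package": package + pins,
        "temperature": TEMPERATURE.get(temp, "unknown"),
    }


info = decode_part_number("XC7S50-1CSGA324C")
print(info)
```

The speed grade and temperature grade together ("-1C") are what you look up in the switching-characteristics table in the next step.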



 

Understanding the FPGA Part Number

Decoding the part number of an FPGA can seem like interpreting a mix of random letters and numbers. Shown below is a straightforward way to understand what those characters represent. For detailed guidance, you can refer to the AMD document for 7-series FPGAs:



 

To determine the maximum PHY rate for your FPGA, start by searching for the document titled Spartan-7 DC and AC Switching Characteristics on Google, replacing "Spartan-7" with the specific FPGA family you are using if different. When you find the document, search within it for "DDR3" to find the relevant table as shown below. For the -1C speed grade and a 4:1 memory controller configuration (as is the case with UberDDR3), the maximum PHY rate is 667 Mbps.

Given that DDR3 RAM operates on both edges of the clock—hence the term "double-data rate"—the actual DDR3 clock frequency needed is half the PHY rate. Furthermore, with a 4:1 memory controller, the controller clock, or the clock on the user interface side, is one quarter of the DDR3 clock. This leads to the following calculations:


  • DDR3 clock = 667 Mbps/2 = 333.5 MHz

  • Controller clock = 333.5 MHz/4 = 83.375 MHz


The period for 333.5 MHz is 2.9985 ns and for 83.375 MHz, it is 11.994 ns. For PLL configuration, which requires whole number ratios for its divider and multiplier settings, we will round these to 3 ns for the DDR3 clock (333.333 MHz) and 12 ns for the controller clock (83.333 MHz). This adjustment maintains the 4:1 controller to DDR3 clock ratio, as 12 ns to 3 ns is indeed a factor of 4.
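The arithmetic above can be reproduced with a few lines of Python, which is handy when repeating it for a different FPGA. This is only a calculator sketch: the function names are made up for illustration, and rounding to whole nanoseconds is the choice made in this post, not a requirement of the PLL.

```python
def ddr3_clocks_mhz(phy_rate_mbps, controller_ratio=4):
    """Return (DDR3 clock, controller clock) in MHz for a given PHY rate."""
    ddr3_mhz = phy_rate_mbps / 2          # two transfers per DDR3 clock cycle
    controller_mhz = ddr3_mhz / controller_ratio
    return ddr3_mhz, controller_mhz


def period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency in MHz."""
    return 1000.0 / freq_mhz


ddr3_mhz, controller_mhz = ddr3_clocks_mhz(667)

# Round the DDR3 period to a whole number of nanoseconds, then derive the
# controller period from it so that the 4:1 ratio stays exact.
ddr3_period_ns = round(period_ns(ddr3_mhz))
controller_period_ns = 4 * ddr3_period_ns

# The rounded period must not be shorter than the exact one, otherwise the
# DDR3 clock would exceed the FPGA's PHY limit.
within_limit = ddr3_period_ns >= period_ns(ddr3_mhz)

print(f"exact:   DDR3 {ddr3_mhz} MHz, controller {controller_mhz} MHz")
print(f"rounded: DDR3 {ddr3_period_ns} ns, controller {controller_period_ns} ns")
print(f"within FPGA limit: {within_limit}")
```

Rounding 2.9985 ns up to 3 ns makes the clock slightly slower than the limit, which is the safe direction. If rounding had gone the other way, the next longer period would be the one to pick.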


II. Understanding the Characteristics of Your DDR3 RAM

Once you've identified the maximum clock limit of the FPGA, the next step is to understand the characteristics of the DDR3 RAM on your FPGA board. Referring to the Arty S7 Reference Manual, the board uses an MT41K128M16JT-125 memory component. This part number tells us the memory has 128 million locations, each capable of storing 16 bits, which classifies it as x16 DDR3. The speed grade for this component is -125, and multiplying the number of locations by the number of bits (128M x 16) gives us a total memory capacity of 2Gb (2048 Mbits).

Consulting the datasheet for this memory component, we can observe the timing parameters as shown below. Note that the maximum data rate for this -125 speed grade memory is 1600 MT/s (or 1600 Mbps). This rate significantly exceeds the 667 Mbps limit we previously established for the Arty S7 FPGA.


This discrepancy highlights that the FPGA is indeed the bottleneck, restricting the maximum achievable clock speed of the DDR3 memory on this board. It's important to remember that attempting to overclock the FPGA to match the 1600 Mbps capability of the DDR3 could lead to timing failures within the FPGA fabric, causing the design to malfunction despite the DDR3's ability to support such speeds.
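To make the comparison concrete, here is a small Python sketch that derives the memory's data rate from its clock period and compares it with the FPGA limit. It assumes that the "-125" suffix encodes a 1.25 ns minimum clock period, which is how I read the Micron naming. Check the datasheet table for your own part.

```python
def max_data_rate_mtps(tck_ns):
    """Data rate in MT/s for a DDR device with the given minimum clock period."""
    clock_mhz = 1000.0 / tck_ns
    return 2 * clock_mhz                  # two transfers per clock cycle


MEMORY_TCK_NS = 1.25          # assumption: "-125" means a 1.25 ns clock period
FPGA_PHY_LIMIT_MBPS = 667     # from the FPGA switching-characteristics table

memory_limit = max_data_rate_mtps(MEMORY_TCK_NS)
usable_rate = min(memory_limit, FPGA_PHY_LIMIT_MBPS)
bottleneck = "FPGA" if FPGA_PHY_LIMIT_MBPS < memory_limit else "DDR3"

print(f"memory supports up to {memory_limit:.0f} MT/s")
print(f"usable rate on this board: {usable_rate} Mbps (limited by the {bottleneck})")
```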

Referencing the addressing table below, here are the key details for the 128 Meg x 16 DDR3 memory that we'll need to know later:

  • Row address is 14 bits long (specified as [13:0]).

  • Bank address is 3 bits long (specified as [2:0]).

  • Column address is 10 bits long (specified as [9:0]).
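As a quick consistency check, the three address widths should add up to the 128 Meg locations and 2Gb capacity derived from the part number. A minimal Python sketch (the constant names are only for illustration):

```python
ROW_BITS = 14     # row address [13:0]
BANK_BITS = 3     # bank address [2:0]
COL_BITS = 10     # column address [9:0]
DQ_WIDTH = 16     # x16 device: 16 bits per location

locations = 2 ** (ROW_BITS + BANK_BITS + COL_BITS)
capacity_bits = locations * DQ_WIDTH

print(f"locations: {locations // 2**20} Meg")
print(f"capacity:  {capacity_bits // 2**30} Gb")
```

If the numbers printed here do not match the part number of your memory, one of the address widths was probably read from the wrong column of the addressing table.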


III. Instantiating the UberDDR3 Design

The UberDDR3 has just 3 files: ddr3_top.v, ddr3_controller.v, and ddr3_phy.v. The file ddr3_top.v acts as a wrapper that combines ddr3_controller and ddr3_phy. It includes an instantiation template as shown below:


This makes it easy to integrate UberDDR3 into your design and fully leverage your DDR3 RAM's capabilities. Properly setting the parameters and signal connections is crucial for functionality. We will detail the role of each part in this instantiation, explain their functions, and guide you on setting the correct values or signals for connections.


III.I Set Up the Parameters

First, we need to set the parameters used in the instantiation:


  • CONTROLLER_CLK_PERIOD = clock period of the DDR3 controller in picoseconds. This will also be the clock period of the Wishbone interface. As determined in Section I (Understanding Your FPGA's Maximum DDR3 Clock Frequency), the controller clock period for the Arty S7 is 12 ns (or 12000 ps).


  • DDR3_CLK_PERIOD = clock period of the DDR3 RAM in picoseconds. This must be a quarter of the CONTROLLER_CLK_PERIOD. As determined in Section I (Understanding Your FPGA's Maximum DDR3 Clock Frequency), the DDR3 clock period for the Arty S7 FPGA board is 3 ns (or 3000 ps).




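  • ROW_BITS = width of the row address. As noted in Section II (Understanding the Characteristics of Your DDR3 RAM), the memory component on the Arty S7 has a row address width of 14.

  • COL_BITS = width of the column address. As noted in the same section, the memory component on the Arty S7 has a column address width of 10.

  • BA_BITS = width of the bank address. As noted in Section II (Understanding the Characteristics of Your DDR3 RAM), the memory component on the Arty S7 has a bank address width of 3.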

  • BYTE_LANES = the number of 8-bit groupings of data. For example, a single x16 DDR3 RAM component has 2 byte lanes. However, if the memory is a SO-DIMM with 8 DDR3 RAM chips, each being x8 to form a total of 64 bits of data, then BYTE_LANES would be 8. The Arty S7 uses a single x16 memory component, so it has 2 byte lanes.


  • AUX_WIDTH = width of the auxiliary line, which is intended for AXI-interface compatibility but is also utilized in the reset sequence. This must be >= 4. You can just leave it at the default value of 4.


  • MICRON_SIM = set to 1 during simulations to shorten power-on sequence. Otherwise, this should be set to 0 for an actual hardware implementation.


  • ODELAY_SUPPORTED = set to 1 if the FPGA bank connected to the DDR3 RAM supports the ODELAY primitive, and set to 0 if it does not. To determine if ODELAY is supported, consult your FPGA's reference manual to see whether the DDR3 is connected to an HP (High Performance) bank, which supports ODELAY, or an HR (High Range) bank, which does not support ODELAY (as noted in UG471, page 134). For example, the reference manual for the Arty S7 specifies that the DDR3 is routed to a 1.35V-powered HR bank, thus this parameter should be set to 0.


  • SECOND_WISHBONE = set to 1 if the second Wishbone interface for debugging is needed, otherwise 0. Normally this debugging interface is not needed, so set this to 0.


  • WB2_ADDR_BITS = width of 2nd wishbone address bus for debugging (only relevant if SECOND_WISHBONE = 1). No need to change the default value.


  • WB2_DATA_BITS = width of 2nd wishbone data bus for debugging (only relevant if SECOND_WISHBONE = 1). No need to change the default value.


Thus, the parameters for the UberDDR3 instantiation on Arty S7 should look like this:
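As a cross-check of the values chosen in this section, the sketch below collects them in a Python dictionary and verifies the two constraints stated above (the 4:1 clock period ratio and AUX_WIDTH >= 4). It is not the Verilog instantiation itself. The row and column widths are taken from Section II, and `check_params` is a hypothetical helper written for this illustration.

```python
ARTY_S7_PARAMS = {
    "CONTROLLER_CLK_PERIOD": 12_000,  # ps, 83.333 MHz
    "DDR3_CLK_PERIOD": 3_000,         # ps, 333.333 MHz
    "ROW_BITS": 14,                   # from the addressing table in Section II
    "COL_BITS": 10,                   # from the addressing table in Section II
    "BA_BITS": 3,
    "BYTE_LANES": 2,                  # single x16 component
    "AUX_WIDTH": 4,
    "MICRON_SIM": 0,                  # real hardware, not simulation
    "ODELAY_SUPPORTED": 0,            # DDR3 sits on an HR bank
    "SECOND_WISHBONE": 0,
}


def check_params(params):
    """Return a list of problems found in a parameter set (empty if none)."""
    problems = []
    if params["DDR3_CLK_PERIOD"] * 4 != params["CONTROLLER_CLK_PERIOD"]:
        problems.append("DDR3_CLK_PERIOD must be a quarter of CONTROLLER_CLK_PERIOD")
    if params["AUX_WIDTH"] < 4:
        problems.append("AUX_WIDTH must be >= 4")
    for flag in ("MICRON_SIM", "ODELAY_SUPPORTED", "SECOND_WISHBONE"):
        if params[flag] not in (0, 1):
            problems.append(f"{flag} must be 0 or 1")
    return problems


print(check_params(ARTY_S7_PARAMS) or "parameters look consistent")
```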



III.II Generate the Clocks and Reset

Next, we need to generate the clocks required by UberDDR3. As stated in the instantiation template, 4 clocks are needed:

  • i_controller_clk = 83.333 MHz (12 ns period)

  • i_ddr3_clk = 333.333 MHz (3 ns period)

  • i_ddr3_clk_90 = 90° phase shifted version of i_ddr3_clk

  • i_ref_clk = 200 MHz


The controller and DDR3 clock frequencies stated above are what we concluded in Section I (Understanding Your FPGA's Maximum DDR3 Clock Frequency). Adjust the clock frequencies accordingly if you use a different FPGA. The i_ref_clk, however, is the reference clock for IDELAYCTRL and has a fixed frequency of 200 MHz. Also, i_ddr3_clk_90 is only required when ODELAY_SUPPORTED is zero.


For clock generation, we can either build our own clock generator with the PLLE2_ADV primitive or use the Clocking Wizard IP of Vivado. Click IP Catalog, then search for Clocking Wizard. On the Output Clocks tab, specify the four clocks:

Instantiate the clock generator module:

These clock signals will then be connected to the UberDDR3:


The i_rst_n input of the UberDDR3 is an active-low reset. It is connected to !i_rst && clk_locked, so the UberDDR3 is held in reset (i_rst_n low) whenever i_rst (the active-high reset from outside the design) is high or clk_locked is low. A low clk_locked means the internal oscillator of the clock generator has not yet synchronized with the phase and frequency of the input clock signal.
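The reset expression is simple enough to check exhaustively. The Python sketch below models !i_rst && clk_locked as a boolean function and prints its truth table. It is a behavioural illustration only, not code from the UberDDR3 repository.

```python
def ddr3_rst_n(i_rst, clk_locked):
    """Model of the active-low reset: high (released) only when the external
    reset is inactive and the clock generator reports lock."""
    return (not i_rst) and clk_locked


for i_rst in (False, True):
    for clk_locked in (False, True):
        state = "running" if ddr3_rst_n(i_rst, clk_locked) else "in reset"
        print(f"i_rst={int(i_rst)} clk_locked={int(clk_locked)} -> {state}")
```

Only one of the four input combinations releases the controller, which is the behaviour you want while the clocks are still settling after power-up.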


III.III Wishbone Interface Setup

After configuring the clocks and reset, the next step involves setting up the Wishbone interface. This post won't go into detail about the Wishbone interface, as it's a standard interface thoroughly explained in the Wishbone B4 specification. It's important to note that UberDDR3 employs the pipelined Wishbone interface rather than the standard (classic) one to maximize throughput. The instantiation template provided includes comments explaining the function of each signal, serving as a guide:


The instantiation is merely a suggestion and can be modified by the user. For instance, in the provided template, the Wishbone cycle is set to 1, assuming a system with one master and one slave where the master can hold the bus continuously (thus holding the Wishbone cycle high). The Wishbone selector is also preset to all 1s, meaning all byte lanes will be written to, simplifying the setup.


Below are the two auxiliary ports associated with the main Wishbone interface. They are not part of the Wishbone interface itself, but are intended for AXI-interface compatibility, which is not yet available (but soon!):

  • i_aux = Request ID line with width of AUX_WIDTH. The Request ID is retrieved simultaneously with the strobe request.

  • o_aux = Request ID line with width of AUX_WIDTH. The Request ID is sent back simultaneously with the acknowledgement signal.


Additional details worth noting:

  • i_wb_addr is not byte-addressable but burst-addressable, meaning a single address points to one 8-burst word. For example, on the Arty S7 with x16 memory, a read request to address x0000 returns 8 times the 16-bit data width, thus 128 bits. For sequential reads, the address after x0000 is x0001, which returns the next 128 bits of read data.


  • i_wb_addr will have a width of ROW_BITS + COL_BITS + BA_BITS - 3. So for the Arty S7, the width of i_wb_addr will be: 14 + 10 + 3 - 3 = 24 bits.


  • i_wb_data and o_wb_data will have a width of BYTE_LANES*64. Thus for the Arty S7, the width of i_wb_data will be: 2*64 or 128 bits. The factor 64 comes from each byte lane containing 8 bits and each word packing 8 bursts, calculated as BYTE_LANES * 8 * 8 or BYTE_LANES * 64.
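The width formulas above are easy to get wrong when porting to another board, so here is a small Python sketch of them. The helper names are hypothetical. The byte-range mapping and the reading of "- 3" as the 3 address bits absorbed by the 8-word burst are my interpretation of the burst addressing described above, not something taken from the UberDDR3 source.

```python
def wb_addr_bits(row_bits, col_bits, ba_bits):
    """Width of i_wb_addr. The '- 3' is interpreted here as the 3 address
    bits covered by one 8-word burst (2**3 = 8)."""
    return row_bits + col_bits + ba_bits - 3


def wb_data_bits(byte_lanes):
    """Width of i_wb_data / o_wb_data: lanes * 8 bits * 8 bursts."""
    return byte_lanes * 8 * 8


def byte_range(wb_addr, byte_lanes):
    """First and last byte offset covered by one Wishbone address."""
    bytes_per_word = wb_data_bits(byte_lanes) // 8
    start = wb_addr * bytes_per_word
    return start, start + bytes_per_word - 1


addr_bits = wb_addr_bits(row_bits=14, col_bits=10, ba_bits=3)
data_bits = wb_data_bits(byte_lanes=2)
total_bits = (2 ** addr_bits) * data_bits

print(f"i_wb_addr: {addr_bits} bits, i_wb_data: {data_bits} bits")
print(f"address 0x0001 covers bytes {byte_range(1, 2)}")
print(f"addressable capacity: {total_bits // 2**30} Gb")
```

The addressable capacity works out to the same 2Gb as the memory component, which is a useful sanity check that the address and data widths agree with each other.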


III.IV Second Wishbone Interface

UberDDR3 has a debugging interface, which is the second Wishbone interface. It can be enabled by setting the SECOND_WISHBONE parameter to 1. In most cases, however, the user will not need to tap into this debugging interface and can simply leave the default values in the instantiation template as they are:


III.V DDR3 Interface

Following the second Wishbone ports, the DDR3 ports are next. These signals are directly connected to the top-level of the design and must be included in the pin constraints. Simply connect the pins as indicated in the instantiation template:


Most of these signals are single-bit signals, but multi-bit signals are annotated with comments to guide you on the correct width. These ports should be declared in the top-level module. For example, the port declarations for the top-level module of the Arty S7 will look like this:


III.VI Debugging Ports

These ports are designed for directly monitoring internal signals. The signals connected to the o_debug1 to o_debug3 ports can be found in the ddr3_controller.v file. The ports o_ddr3_debug_read_dqs_p and o_ddr3_debug_read_dqs_n are detailed in the ddr3_phy.v file. However, in most scenarios, there's no need to debug or tap these ports, so you can leave the connections empty:

And there you go. We are now done instantiating the UberDDR3!


IV. Configuring Defines for Speed Bin and Capacity

Once you've set up the module instantiation, the next step is to adjust the defines in the ddr3_controller module. You only need to modify two defines. At the top of the ddr3_controller.v file:

  1. Uncomment the define for the DDR3 RAM's speed bin. Based on what we determined in Section II (Understanding the Characteristics of Your DDR3 RAM), the DDR3 RAM in the Arty S7 has a speed bin of 1600. Therefore, you should uncomment `define DDR3_1600_11_11_11.

  2. Uncomment the define for the DDR3 capacity. As mentioned in the same section, the DDR3 RAM is 2Gb, so `define RAM_2Gb should be uncommented.


With these settings configured, are we ready to implement this on the hardware? Yes! In the next section, we'll walk through a simple example design utilizing the UberDDR3.


V. Example Design Demonstration

The UberDDR3 already includes a sample design for the Arty S7. This section explains how to implement this example design on hardware. Vivado 2022.1 is used in this demonstration.


V.I Creating the Vivado Project

1. Retrieve the UberDDR3 repository by git cloning (take note of including submodules):


2. Create a new Vivado project (as shown below, I named it uberddr3_test). Then, on the Add Sources page, click Add Directories and choose the folders rtl/ and arty_s7/ (under the example_demo folder).


3. On the Add Constraints page, choose the file Arty-S7-50-Master.xdc under the example_demo/arty_s7/ folder.

4. Then, on the Choose Board page, choose Arty S7-50, since that is the target FPGA board for this demonstration.


V.II Implementing on Hardware

1. Under the hierarchy of Design Sources, choose the file arty_ddr3 as the top-level design module.


2. Click Generate Bitstream to run synthesis-to-bitstream generation then you're DONE! While waiting for the synthesis and bitstream generation to complete, let's walk through what this example design accomplishes.


V.III Example Design Walkthrough

Looking at the example design arty_ddr3.v, the top-level port connections include the 100 MHz clock (i_clk), active-high reset (i_rst), DDR3 interface, UART, and LEDs:

Here's a breakdown of each component's role within the design:


  • LEDs

As mentioned in Section III.VI (Debugging Ports), o_debug1 is used to tap signals inside the controller. In this case, o_debug1 is connected to the calibration state register. State number 23 is DONE_CALIBRATE, so the 4 LEDs light up only when calibration is successful.


  • UART

The UART transmits the data from o_wb_data serially over the TX line once o_wb_ack goes high. Both of these Wishbone signals come from the DDR3 controller interface. rd_data is the serial data received on the RX line, and m_axis_tvalid indicates whether rd_data is valid. The baud rate is set to 9600 using a prescaler of 1085, derived from the formula: CLK_FREQ/(BAUD_RATE*8) = 83.333MHz/(9600*8) ≈ 1085.
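The prescaler formula can be checked, and redone for other clock or baud settings, with a short Python sketch. The function name and the `oversample` argument are illustrative. The factor of 8 is simply the one that appears in the formula above.

```python
def uart_prescale(clk_hz, baud_rate, oversample=8):
    """Integer prescaler from CLK_FREQ / (BAUD_RATE * 8)."""
    return clk_hz // (baud_rate * oversample)


CLK_HZ = 83_333_333      # controller clock, 12 ns period
BAUD_RATE = 9600

prescale = uart_prescale(CLK_HZ, BAUD_RATE)

# Because the prescaler is truncated to an integer, the real baud rate
# differs slightly from the requested one.
actual_baud = CLK_HZ / (prescale * 8)
error_percent = abs(actual_baud - BAUD_RATE) / BAUD_RATE * 100

print(f"prescale = {prescale}")
print(f"actual baud = {actual_baud:.2f} ({error_percent:.4f}% off)")
```

If you change the controller clock for a different FPGA, remember to recompute the prescaler, since the UART runs from that clock.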


  • Main Logic

The core logic of this design handles the ASCII input received through the UART. When the UART receives lowercase letters (ASCII decimal values 97 to 122, corresponding to 'a' to 'z'), these characters are written to the DDR3 memory. When it receives uppercase letters (ASCII decimal values 65 to 90, corresponding to 'A' to 'Z'), it reads back the previously stored lowercase letters from the corresponding DDR3 addresses.

 

For example, sending "abcdefg" via the UART terminal stores these letters in DDR3, and subsequently sending "ABCDEFG" retrieves the lowercase equivalent: "abcdefg". Very simple!
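The behaviour described above can be modelled in a few lines of Python, which is useful for predicting what the terminal should show before touching the board. This is a behavioural sketch only. The address mapping (letter index as the DDR3 address) is an assumption made for illustration, and the class is not derived from arty_ddr3.v.

```python
class LetterMemoryModel:
    """Toy model of the example design: lowercase writes, uppercase reads."""

    def __init__(self):
        # Stands in for the DDR3 contents; unwritten locations are absent.
        self.memory = {}

    def on_uart_byte(self, char):
        """Process one received character; return the character that would
        be sent back over UART, or None if nothing is read."""
        code = ord(char)
        if 97 <= code <= 122:                 # 'a'..'z': write request
            self.memory[code - 97] = char     # assumed address = letter index
            return None
        if 65 <= code <= 90:                  # 'A'..'Z': read request
            # On real hardware an unwritten location returns whatever the
            # DRAM holds; the model returns None in that case.
            return self.memory.get(code - 65)
        return None                           # other characters are ignored


model = LetterMemoryModel()
for char in "abcdefg":
    model.on_uart_byte(char)

echoed = "".join(model.on_uart_byte(char) for char in "ABCDEFG")
print(echoed)
```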


The remaining components are the instantiations of the clock generator and UberDDR3, which were already discussed in Section III (Instantiating the UberDDR3 Design) and Section III.II (Generate the Clocks and Reset).


V.IV Example Design Pin Constraint

The pin constraints of the example design include constraints for the clock, LEDs, reset, and the UART line:


For the DDR3, the pin constraints can be determined from the schematic of your FPGA board. For example, the schematic for the Arty S7, specifically on page 5, details the DDR3 connections:


In the pin constraint file, you will need to set the slew rate to FAST. Input termination is set to UNTUNED_SPLIT_50 to match the impedance of the signal line, which helps reduce reflections and improve signal integrity. The I/O standard used is SSTL135, which is Stub Series-Terminated Logic with a nominal voltage of 1.35V. You may wonder whether you need to create this constraint file manually. For now, yes, you have to write it yourself.


Once you have successfully created a constraint file for your FPGA board and confirmed that UberDDR3 works correctly on your setup, I encourage you to make a pull request to the UberDDR3 project. This allows us to include your constraint file as part of the master constraint files for others to use.



















V.V Run the Example Design

Once the synthesis-to-bitstream generation is done, click on Open Hardware Manager, then Auto Connect. Right-click on your FPGA (on the Arty S7, it's xc7s50). A confirmation will show up; click on Program. DONE!


Below is a video demonstration on my Arty S7. Enjoy!




VI. Conclusion

We've come a long way from Part 1, where we delved into the inner workings of the UberDDR3 and its interaction with DDR3 RAM through simulations. Now, in Part 2, we've moved on to the practical implementation of the UberDDR3 on actual hardware. This post has provided a comprehensive guide to help you integrate the UberDDR3 into your projects, covering everything from understanding your FPGA's capabilities and setting up the design files to module instantiation, and ending with the example design, which you can use as a reference.


That wraps up this post. Catch you in the next blog post!

