Languages/VHDL/Examples/SynchronousBusXilinx

From UIT
Revision as of 11:58, 7 February 2013 by Pim

Synchronous parallel bus input and output for Xilinx Spartan6 FPGA

This example shows how to bring a parallel synchronous bus into an FPGA, and how to drive a similar bus out of the FPGA.

This kind of connection is referred to as source-synchronous input or output in the Xilinx documentation.

The example is based on a real industrial design [fixme:reference+link], in which a 20-bit synchronous bus flows through a (well-filled) FPGA at 150 MHz.

Of course, some computing can be done on the data inside the FPGA, but this example focuses on the connections to the outside of the FPGA.

               _________
              |         |
clock_input ->|         |-> clock_output
              |  FPGA   |
 data_input =>|         |=> data_output
              |_________|

Here is what our input and output signals look like:

              __________________________                                  __________________________
             /                          \                                /                          \
clock ...___/                            \______________________________/                            \_______________...
 
      ..._______________________________  ___________________________________________________________  ____________...
                                        \/                                                           \/
                  value(t-1)            |                     value(t)                               |
data  ..._______________________________/\___________________________________________________________/\____________...

The data changes around the falling edge of the clock and is stable on the rising edge.

VHDL entity

The entity itself is trivial: a reset, a clock and data bus in, and a clock and data bus out.

library ieee;
	use ieee.std_logic_1164.all;
	use ieee.numeric_std.all;
 
entity pass_through is
	port
	(
		reset		: in	std_logic;
 
		clock_input	: in	std_logic;
		data_input	: in	std_logic_vector(19 downto 0);
 
		clock_output	: out	std_logic;
		data_output 	: out	std_logic_vector(19 downto 0)
	);
end pass_through;


Synchronous input

Here are the input timings. They can usually be found in the datasheet of the device driving the bus.

                                        Tcp
            |<--------------------------------------------------------->|
            |            Tch                          Tcl               |
            |<-------------------------->|<---------------------------->|
            |                            |                              |
            | __________________________ |                              | __________________________
            |/                          \|                              |/                          \
clock ...___/                            \______________________________/                            \_______________...
                                         |                              |
                                         |                              |
                                         |           OFFSET IN          |
                                         |   |<------------------------>|                                            
                                         |   |                    VALID                                            
                                         |   |<-------------------------------------------------->|                                            
                                         |   |                                                    |                                            
                                        >|---|<---- Tdu                                           |                                            
                                             |                                                    |                                            
      ..._____________________________       |____________________________________________________|      ____________...                                            
                                      \XXXXX/                                                     \XXXXX/                                            
                  value(t-1)          |XXXXX|                     value(t)                        |XXXXX|                                            
data  ..._____________________________/XXXXX\_____________________________________________________/XXXXX\____________...                                            
 
 
                                                Min       Typ      Max      
Clock period                            Tcp     6.6        -        -      ns               
Clock high duration                     Tch     3.25       -        3.40   ns
Clock low duration                      Tcl     3.25       -        3.40   ns
Data valid after clock falling edge     Tdu    -1.0        -        1.0    ns
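
For simulation, the input waveform described by this table can be approximated with a small stimulus generator. This is only a sketch under stated assumptions: the entity name bus_stimulus_sketch, the counter used as data pattern, and the choice of the worst-case late data (Tdu max) are illustrative and do not come from the original design. It is not synthesizable.

```vhdl
library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;

-- Hypothetical simulation-only stimulus: the name and the data pattern are assumptions.
entity bus_stimulus_sketch is
    port
    (
        clock : out std_logic;
        data  : out std_logic_vector(19 downto 0)
    );
end bus_stimulus_sketch;

architecture sim of bus_stimulus_sketch is
    constant tcp : time := 6.6 ns;  -- Tcp min from the table
    constant tdu : time := 1.0 ns;  -- Tdu max from the table (latest data change)

    signal clock_i : std_logic := '0';
    signal count   : unsigned(19 downto 0) := (others => '0');
begin

    -- 50% duty cycle: Tch = Tcl = 3.3 ns, inside the 3.25 to 3.40 ns window
    clock_i <= not clock_i after tcp / 2;
    clock   <= clock_i;

    -- The source device updates its data on the falling edge
    process (clock_i)
    begin
        if falling_edge(clock_i) then
            count <= count + 1;
        end if;
    end process;

    -- The new value reaches the bus Tdu after the falling edge
    data <= transport std_logic_vector(count) after tdu;

end architecture sim;
```

Changing tdu to -1.0 ns is not possible with a simple delay; the early case would need the data to be generated relative to the previous rising edge instead.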

Principles

  • You MUST use a GCLK pin for clock input, on the same bank as the input data.
  • If possible, try to put all your bus and clock inputs on the same half-bank [ug625].
  • Use the D flip-flop integrated into the input pad
    • This can be achieved by using the "iob" attribute in the VHDL code or by placer options.
  • There is a dedicated clock routing structure named BUFIO2 for sampling input pins.
    • The IOCLK output of the BUFIO2 MUST be used to drive the input pad flip-flop clock.
    • The DIVCLK output of the BUFIO2 MUST be connected to a global clock buffer (BUFG) and is used to drive internal logic.
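
The principles above can be sketched in VHDL as follows. This is a minimal sketch, not the code of the original design: the entity name bus_input_sketch and all internal signal names are assumptions. It instantiates the Xilinx unisim primitives IBUFG, BUFIO2 and BUFG, and requests pad flip-flops through the "iob" attribute.

```vhdl
library ieee;
    use ieee.std_logic_1164.all;

library unisim;
    use unisim.vcomponents.all;

-- Hypothetical input stage: entity and signal names are assumptions.
entity bus_input_sketch is
    port
    (
        clock_input : in  std_logic;                      -- must be on a GCLK pin
        data_input  : in  std_logic_vector(19 downto 0);
        clock_int   : out std_logic;                      -- global clock for internal logic
        data_int    : out std_logic_vector(19 downto 0)
    );
end bus_input_sketch;

architecture rtl of bus_input_sketch is
    signal clock_pad  : std_logic;
    signal clock_io   : std_logic;  -- BUFIO2 IOCLK, for the pad flip-flops only
    signal clock_div  : std_logic;  -- BUFIO2 DIVCLK, feeds the BUFG
    signal clock_glob : std_logic;
    signal data_iob   : std_logic_vector(19 downto 0);

    -- Ask the tools to pack these flip-flops into the input pads
    attribute iob : string;
    attribute iob of data_iob : signal is "true";
begin

    i_ibufg : ibufg
    port map (i => clock_input, o => clock_pad);

    i_bufio2 : bufio2
    generic map
    (
        divide        => 1,
        divide_bypass => true,   -- DIVCLK is the undivided clock
        i_invert      => false,
        use_doubler   => false
    )
    port map
    (
        i            => clock_pad,
        ioclk        => clock_io,
        divclk       => clock_div,
        serdesstrobe => open
    );

    i_bufg : bufg
    port map (i => clock_div, o => clock_glob);

    -- Input pad flip-flops, clocked by IOCLK
    process (clock_io)
    begin
        if rising_edge(clock_io) then
            data_iob <= data_input;
        end if;
    end process;

    -- First internal stage, clocked by the global clock
    process (clock_glob)
    begin
        if rising_edge(clock_glob) then
            data_int <= data_iob;
        end if;
    end process;

    clock_int <= clock_glob;

end architecture rtl;
```

Whether the flip-flops really end up in the pads should be checked in the map report, since the attribute is only a request to the tools.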

Timing constraints - UCF

net "clock_input"		loc = "y11"	| iostandard = LVCMOS33 | tnm_net = "clock_input";
net "data_input<0>"		loc = "w10"	| iostandard = LVCMOS33 | tnm = "data_bus_in";
net "data_input<1>"		loc = "t7"	| iostandard = LVCMOS33 | tnm = "data_bus_in";
net "data_input<2>"		loc = "w8"	| iostandard = LVCMOS33 | tnm = "data_bus_in";
net "data_input<3>"		loc = "v9"	| iostandard = LVCMOS33 | tnm = "data_bus_in";
net "data_input<4>"		loc = "ab7"	| iostandard = LVCMOS33 | tnm = "data_bus_in";
net "data_input<5>"		loc = "ab4"	| iostandard = LVCMOS33 | tnm = "data_bus_in";
net "data_input<6>"		loc = "aa4"	| iostandard = LVCMOS33 | tnm = "data_bus_in";
net "data_input<7>"		loc = "ab5"	| iostandard = LVCMOS33 | tnm = "data_bus_in";
net "data_input<8>"		loc = "y6"	| iostandard = LVCMOS33 | tnm = "data_bus_in";
net "data_input<9>"		loc = "aa8"	| iostandard = LVCMOS33 | tnm = "data_bus_in";
net "data_input<10>"		loc = "v7"	| iostandard = LVCMOS33 | tnm = "data_bus_in";
net "data_input<11>"		loc = "w6"	| iostandard = LVCMOS33 | tnm = "data_bus_in";
net "data_input<12>"		loc = "r7"	| iostandard = LVCMOS33 | tnm = "data_bus_in";
net "data_input<13>"		loc = "u6"	| iostandard = LVCMOS33 | tnm = "data_bus_in";
net "data_input<14>"		loc = "y3"	| iostandard = LVCMOS33 | tnm = "data_bus_in";
net "data_input<15>"		loc = "v5"	| iostandard = LVCMOS33 | tnm = "data_bus_in";
net "data_input<16>"		loc = "y5"	| iostandard = LVCMOS33 | tnm = "data_bus_in";
net "data_input<17>"		loc = "ab3"	| iostandard = LVCMOS33 | tnm = "data_bus_in";
net "data_input<18>"		loc = "u8"	| iostandard = LVCMOS33 | tnm = "data_bus_in";
net "data_input<19>"		loc = "u9"	| iostandard = LVCMOS33 | tnm = "data_bus_in";
 
TIMESPEC TS_clock_input = period "clock_input" 153.85 MHz high 50%;
TIMEGRP "data_bus_in" OFFSET = IN 2.25 ns VALID 4.5 ns BEFORE clock_input RISING;
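  • The data bus pins are grouped using tnm.
  • The clock_input period constraint is 1/(Tch_min+Tcl_min) = 1/(3.25 ns + 3.25 ns) = 153.85 MHz. This is faster than Tcp_min (6.6 ns) allows, so it can't happen in practice, but it introduces some margin.
  • The OFFSET = IN constraint is Tcl_min-Tdu_max = 3.25 - 1.0 = 2.25 ns and corresponds to the setup time in most datasheets.
  • The VALID constraint is the period minus the data uncertainty window (-Tdu_min+Tdu_max) and corresponds to the setup time + hold time in most datasheets. With the constrained period of 6.5 ns this gives 6.5 - 2.0 = 4.5 ns (using Tcp_min = 6.6 ns instead would give 4.6 ns).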

Inside the FPGA

  • Use pipelining
    • Routing fast signals through an FPGA can be difficult (the fuller the FPGA, the harder it gets), and we're not interested in the delay introduced by our pass-through.
  • The synthesizer should not try to replace the pipeline with a shift register.
    • This can be forced with the VHDL shreg_extract attribute.
  • If the timing between the internal stages and the output is not met, you can try to add stages to the pipelines.
  • Use the DIVCLK clock provided by the BUFIO2.
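
A pipeline following these rules could look like the sketch below. It is an illustration under assumptions, not the original design: the entity name pipeline_sketch, the generic stages and the signal names are made up. The shreg_extract attribute is the XST attribute mentioned above, here set to "no" on the pipeline signal.

```vhdl
library ieee;
    use ieee.std_logic_1164.all;

-- Hypothetical pipeline: entity, generic and signal names are assumptions.
entity pipeline_sketch is
    generic
    (
        stages : positive := 3   -- add stages if internal timing is not met
    );
    port
    (
        clock    : in  std_logic;  -- global clock derived from the BUFIO2 DIVCLK
        data_in  : in  std_logic_vector(19 downto 0);
        data_out : out std_logic_vector(19 downto 0)
    );
end pipeline_sketch;

architecture rtl of pipeline_sketch is
    type pipeline_t is array (0 to stages - 1) of std_logic_vector(19 downto 0);
    signal pipeline : pipeline_t;

    -- Keep real flip-flops so the placer can spread them along the route
    attribute shreg_extract : string;
    attribute shreg_extract of pipeline : signal is "no";
begin

    process (clock)
    begin
        if rising_edge(clock) then
            pipeline(0) <= data_in;
            for i in 1 to stages - 1 loop
                pipeline(i) <= pipeline(i - 1);
            end loop;
        end if;
    end process;

    data_out <= pipeline(stages - 1);

end architecture rtl;
```

Discrete flip-flops give the placer freedom to put each stage somewhere between the input and output pads, which a shift register packed into a single LUT would not.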

Synchronous output

Principles

  • Use an ODDR2 for the clock output
    • The ODDR2 uses a clock and its complement, so if the clock is really fast or does not have a 50% duty cycle, use a PLL or DCM to generate clock and not_clock.
    • We want the data to change on the falling edge of the output clock so that it is stable on the rising edge, so we invert the clock.
  • Use the D flip-flop integrated into the output PAD

Inverting clock output

Here is how to output a clock from a Spartan-6 device. This code costs no fabric logic, since every FPGA output pad has an ODDR2 block inside it.

library ieee;
	use ieee.std_logic_1164.all;
	use ieee.numeric_std.all;
 
library unisim;
	use unisim.vcomponents.all;
 
entity clock_output_inverting_xilinx is
	port
	(
		clock_in			: in	std_logic;
		clock_out			: out	std_logic
	);
end clock_output_inverting_xilinx;
 
architecture rtl of clock_output_inverting_xilinx is
	signal clock_in_not	: std_logic;
begin
 
    -------------------------------------------------------------------------------
    -- Clock forwarding
    --
    -- The oddr2 is used to output the clock, so that timing constraints can be
    -- added between this output and the other output signals
    -------------------------------------------------------------------------------
    clock_in_not <= not clock_in;
 
    i_oddr : oddr2
    generic map
    (
        ddr_alignment => "c1",    -- sets output alignment to "none", "c0", "c1"
        init          => '0',     -- sets initial state of the q output to '0' or '1'
        srtype        => "async"  -- specifies "sync" or "async" set/reset
    )
    port map
    (
        q  => clock_out,
        c0 => clock_in,
        c1 => clock_in_not,
        ce => '1',
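        d0 => '0',                -- driven on the rising edge of clock_in: output goes low
        d1 => '1',                -- driven on the falling edge of clock_in: output goes high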
        r  => '0',
        s  => '0'
    );
 
end architecture rtl;
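
The clock forwarder can then be combined with output pad flip-flops for the data. The following is a sketch under assumptions: the entity name bus_output_sketch and the signal data_oreg are hypothetical, and clock_output_inverting_xilinx is assumed to be compiled into the work library.

```vhdl
library ieee;
    use ieee.std_logic_1164.all;

-- Hypothetical output stage: entity and signal names are assumptions.
entity bus_output_sketch is
    port
    (
        clock        : in  std_logic;  -- internal global clock (from a BUFG)
        data         : in  std_logic_vector(19 downto 0);
        clock_output : out std_logic;
        data_output  : out std_logic_vector(19 downto 0)
    );
end bus_output_sketch;

architecture rtl of bus_output_sketch is
    signal data_oreg : std_logic_vector(19 downto 0);

    -- Ask the tools to pack these flip-flops into the output pads
    attribute iob : string;
    attribute iob of data_oreg : signal is "true";
begin

    -- Output pad flip-flops: data changes on the rising edge of the internal clock
    process (clock)
    begin
        if rising_edge(clock) then
            data_oreg <= data;
        end if;
    end process;

    data_output <= data_oreg;

    -- Forwarded clock, generated in the output pad by the ODDR2
    i_clock_forward : entity work.clock_output_inverting_xilinx
    port map
    (
        clock_in  => clock,
        clock_out => clock_output
    );

end architecture rtl;
```

Because both the data and the forwarded clock leave through pad registers driven by the same global clock, their skew is dominated by the pads rather than by routing.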

Timing constraints - UCF

The nice part is that if the data output flip-flops are placed in the pads and the clock comes from a global clock buffer, there is nothing more to do. The following timing constraint can be used to check the skew in the "Timing report":

net "clock_output"		loc = "b8"	| iostandard = LVCMOS18 | slew = fast | drive = 12 | tnm = "data_bus_out";
net "data_output<0>"		loc = "d6"	| iostandard = LVCMOS18 | slew = fast | drive = 12 | tnm = "data_bus_out";
net "data_output<1>"		loc = "a4"	| iostandard = LVCMOS18 | slew = fast | drive = 12 | tnm = "data_bus_out";
net "data_output<2>"		loc = "f14"	| iostandard = LVCMOS18 | slew = fast | drive = 12 | tnm = "data_bus_out";
net "data_output<3>"		loc = "a15"	| iostandard = LVCMOS18 | slew = fast | drive = 12 | tnm = "data_bus_out";
net "data_output<4>"		loc = "d7"	| iostandard = LVCMOS18 | slew = fast | drive = 12 | tnm = "data_bus_out";
net "data_output<5>"		loc = "a5"	| iostandard = LVCMOS18 | slew = fast | drive = 12 | tnm = "data_bus_out";
net "data_output<6>"		loc = "b6"	| iostandard = LVCMOS18 | slew = fast | drive = 12 | tnm = "data_bus_out";
net "data_output<7>"		loc = "d8"	| iostandard = LVCMOS18 | slew = fast | drive = 12 | tnm = "data_bus_out";
net "data_output<8>"		loc = "f8"	| iostandard = LVCMOS18 | slew = fast | drive = 12 | tnm = "data_bus_out";
net "data_output<9>"		loc = "a6"	| iostandard = LVCMOS18 | slew = fast | drive = 12 | tnm = "data_bus_out";
net "data_output<10>"		loc = "c6"	| iostandard = LVCMOS18 | slew = fast | drive = 12 | tnm = "data_bus_out";
net "data_output<11>"		loc = "d9"	| iostandard = LVCMOS18 | slew = fast | drive = 12 | tnm = "data_bus_out";
net "data_output<12>"		loc = "g8"	| iostandard = LVCMOS18 | slew = fast | drive = 12 | tnm = "data_bus_out";
net "data_output<13>"		loc = "a7"	| iostandard = LVCMOS18 | slew = fast | drive = 12 | tnm = "data_bus_out";
net "data_output<14>"		loc = "c7"	| iostandard = LVCMOS18 | slew = fast | drive = 12 | tnm = "data_bus_out";
net "data_output<15>"		loc = "e8"	| iostandard = LVCMOS18 | slew = fast | drive = 12 | tnm = "data_bus_out";
net "data_output<16>"		loc = "g9"	| iostandard = LVCMOS18 | slew = fast | drive = 12 | tnm = "data_bus_out";
net "data_output<17>"		loc = "a9"	| iostandard = LVCMOS18 | slew = fast | drive = 12 | tnm = "data_bus_out";
net "data_output<18>"		loc = "c8"	| iostandard = LVCMOS18 | slew = fast | drive = 12 | tnm = "data_bus_out";
net "data_output<19>"		loc = "f9"	| iostandard = LVCMOS18 | slew = fast | drive = 12 | tnm = "data_bus_out";
 
 
TIMEGRP "data_bus_out" OFFSET = OUT AFTER "clock_input" REFERENCE_PIN "clock_output";

Timing report example

Possible problems

Timing is not met

[fixme : explain how to find which flip-flops should be placed manually]

Over-constrained design

  • Possible causes
    • Your input signals are split across two half-banks and can't be sampled using the same BUFIO2

References

  • [ug625] Xilinx user guide UG625 (citation details are missing in the source page)
