Languages/VHDL/Examples/SynchronousBusXilinx

Synchronous parallel bus input and output for Xilinx Spartan6 FPGA

This example shows how to bring a parallel synchronous bus into an FPGA, and how to drive a similar bus out of the FPGA.

This kind of connection is referred to as source synchronous input or output in the Xilinx documentation.

This example is based on a real industrial design [fixme:reference+link], where a 20-bit synchronous bus flows through a (well-filled) FPGA at 150 MHz.

Of course, some computing can be done on the data inside the FPGA, but this example emphasizes the connections to the outside of the FPGA.

               _________
              |         |
clock_input ->|         |-> clock_output
              |  FPGA   |
 data_input =>|         |=> data_output
              |_________|

Here is what our input and output signals look like:

              __________________________                                  __________________________
             /                          \                                /                          \
clock ...___/                            \______________________________/                            \_______________...
 
      ..._______________________________  ___________________________________________________________  ____________...
                                        \/                                                           \/
                  value(t-1)            |                     value(t)                               |
data  ..._______________________________/\___________________________________________________________/\____________...

The data changes around the falling edge of the clock and is stable on the rising edge.
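
The waveform above can be reproduced in simulation. The following testbench is a minimal sketch and is not part of the original design: the entity name bus_stimulus_tb, the 3.3 ns half period (close to the 150 MHz mentioned above) and the counter used as data are assumptions made for illustration. It changes the data on the falling edge and checks that the data has been stable before each rising edge.

```vhdl
library ieee;
	use ieee.std_logic_1164.all;
	use ieee.numeric_std.all;

-- Hypothetical stimulus generator (illustration only, not from the original project)
entity bus_stimulus_tb is
end bus_stimulus_tb;

architecture sim of bus_stimulus_tb is
	-- Assumed half period: 3.3 ns, i.e. a clock of roughly 150 MHz
	constant c_half_period	: time := 3.3 ns;

	signal clock	: std_logic := '0';
	signal data		: std_logic_vector(19 downto 0) := (others => '0');
begin
	-- Free-running clock
	clock <= not clock after c_half_period;

	-- Change the data on the falling edge, so it is stable on the rising edge
	process(clock)
	begin
		if falling_edge(clock) then
			data <= std_logic_vector(unsigned(data) + 1);
		end if;
	end process;

	-- Self-check: no data event during the quarter period before a rising edge
	process(clock)
	begin
		if rising_edge(clock) then
			assert data'stable(c_half_period / 2)
				report "data changed just before the rising edge"
				severity error;
		end if;
	end process;
end architecture sim;
```

The clock never stops, so the simulation has to be run for a bounded time (for example 1 us).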

VHDL entity

The entity is the trivial part.

library ieee;
	use ieee.std_logic_1164.all;
	use ieee.numeric_std.all;
 
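entity pass_through is
	generic
	(
		-- Number of internal pipeline stages, used by the architecture in the appendix.
		-- The default value is an assumption, adjust it to your design.
		g_pipe_stages	: natural := 4
	);
	port
	(
		reset		: in	std_logic;
 
		clock_input	: in	std_logic;
		data_input	: in	std_logic_vector(19 downto 0);
 
		clock_output	: out	std_logic;
		data_output 	: out	std_logic_vector(19 downto 0)
	);
end pass_through;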


Synchronous input

Here are the input timings. They can usually be found in the datasheet of the device driving the bus.

                                        Tcp
            |<--------------------------------------------------------->|
            |            Tch                          Tcl               |
            |<-------------------------->|<---------------------------->|
            |                            |                              |
            | __________________________ |                              | __________________________
            |/                          \|                              |/                          \
clock ...___/                            \______________________________/                            \_______________...
                                         |                              |
                                         |                              |
                                         |           OFFSET IN          |
                                         |   |<------------------------>|                                            
                                         |   |                    VALID                                            
                                         |   |<-------------------------------------------------->|                                            
                                         |   |                                                    |                                            
                                        >|---|<---- Tdu                                           |                                            
                                             |                                                    |                                            
      ..._____________________________       |____________________________________________________|      ____________...                                            
                                      \XXXXX/                                                     \XXXXX/                                            
                  value(t-1)          |XXXXX|                     value(t)                        |XXXXX|                                            
data  ..._____________________________/XXXXX\_____________________________________________________/XXXXX\____________...                                            
 
 
                                                Min       Typ      Max      
Clock period                            Tcp     6.6        -        -      ns               
Clock high duration                     Tch     3.25       -        3.40   ns
Clock low duration                      Tcl     3.25       -        3.40   ns
Data valid after clock falling edge     Tdu    -1.0        -        1.0    ns

Principles

  • You MUST use a GCLK pin for clock input, on the same bank as the input data.
  • If possible, try to put all your bus and clock inputs on the same half-bank (see Xilinx UG625).
  • Use the D flip-flop integrated into the input pad.
    • This can be achieved by using the "iob" attribute in the VHDL code or by placer options.
  • There is a dedicated clock routing structure named BUFIO2 for sampling input pins.
    • The IOCLK output of the BUFIO2 MUST be used to drive the input pad flip-flop clock.
    • The DIVCLK output of the BUFIO2 MUST be connected to a global clock buffer (BUFG) and is used to drive internal logic.

Timing constraints - UCF

net "clock_input"		loc = "y11"	| iostandard = LVCMOS33 | tnm_net = "clock_input";
net "data_input<0>"		loc = "w10"	| iostandard = LVCMOS33 | tnm = "data_bus_in";
net "data_input<1>"		loc = "t7"	| iostandard = LVCMOS33 | tnm = "data_bus_in";
net "data_input<2>"		loc = "w8"	| iostandard = LVCMOS33 | tnm = "data_bus_in";
net "data_input<3>"		loc = "v9"	| iostandard = LVCMOS33 | tnm = "data_bus_in";
net "data_input<4>"		loc = "ab7"	| iostandard = LVCMOS33 | tnm = "data_bus_in";
net "data_input<5>"		loc = "ab4"	| iostandard = LVCMOS33 | tnm = "data_bus_in";
net "data_input<6>"		loc = "aa4"	| iostandard = LVCMOS33 | tnm = "data_bus_in";
net "data_input<7>"		loc = "ab5"	| iostandard = LVCMOS33 | tnm = "data_bus_in";
net "data_input<8>"		loc = "y6"	| iostandard = LVCMOS33 | tnm = "data_bus_in";
net "data_input<9>"		loc = "aa8"	| iostandard = LVCMOS33 | tnm = "data_bus_in";
net "data_input<10>"		loc = "v7"	| iostandard = LVCMOS33 | tnm = "data_bus_in";
net "data_input<11>"		loc = "w6"	| iostandard = LVCMOS33 | tnm = "data_bus_in";
net "data_input<12>"		loc = "r7"	| iostandard = LVCMOS33 | tnm = "data_bus_in";
net "data_input<13>"		loc = "u6"	| iostandard = LVCMOS33 | tnm = "data_bus_in";
net "data_input<14>"		loc = "y3"	| iostandard = LVCMOS33 | tnm = "data_bus_in";
net "data_input<15>"		loc = "v5"	| iostandard = LVCMOS33 | tnm = "data_bus_in";
net "data_input<16>"		loc = "y5"	| iostandard = LVCMOS33 | tnm = "data_bus_in";
net "data_input<17>"		loc = "ab3"	| iostandard = LVCMOS33 | tnm = "data_bus_in";
net "data_input<18>"		loc = "u8"	| iostandard = LVCMOS33 | tnm = "data_bus_in";
net "data_input<19>"		loc = "u9"	| iostandard = LVCMOS33 | tnm = "data_bus_in";
 
TIMESPEC TS_clock_input = period "clock_input" 153.85 MHz high 50%;
TIMEGRP "data_bus_in" OFFSET = IN 2.25 ns VALID 4.5 ns BEFORE clock_input RISING;
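
  • The data bus signals are grouped by using tnm.
  • The clock_input period constraint is 1/(Tch_min+Tcl_min) = 1/6.5 ns = 153.85 MHz. This can't happen in practice (Tcp_min is 6.6 ns), but it introduces some margin.
  • The OFFSET = IN constraint is Tcl_min-Tdu_max = 3.25-1.0 = 2.25 ns and corresponds to the setup time in most datasheets.
  • The VALID constraint is Tcp-(Tdu_max-Tdu_min) and corresponds to the setup time + hold time in most datasheets. With the pessimistic 6.5 ns period used for the clock constraint this gives 6.5-2.0 = 4.5 ns (with Tcp_min = 6.6 ns it would be 4.6 ns).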
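
The arithmetic behind these constraints can be checked with a small simulation-only unit. This is a sketch under assumptions: the entity name input_constraints_check and the constant names are hypothetical, and the VALID window is computed with the pessimistic 6.5 ns period (Tch_min+Tcl_min), which is the value that matches the 4.5 ns used in the UCF.

```vhdl
-- Hypothetical self-checking unit (illustration only)
-- Simulate with a time resolution of 1 ps or finer.
entity input_constraints_check is
end input_constraints_check;

architecture sim of input_constraints_check is
	-- Values copied from the input timing table
	constant c_tch_min	: time := 3.25 ns;
	constant c_tcl_min	: time := 3.25 ns;
	constant c_tdu_min	: time := -1.0 ns;
	constant c_tdu_max	: time := 1.0 ns;

	-- Derived constraint values
	constant c_period		: time := c_tch_min + c_tcl_min;
	constant c_offset_in	: time := c_tcl_min - c_tdu_max;
	constant c_valid		: time := c_period - (c_tdu_max - c_tdu_min);
	constant c_freq_mhz		: real := 1.0e6 / real(c_period / 1 ps);
begin
	process
	begin
		assert c_period = 6.5 ns
			report "unexpected period" severity failure;
		assert c_offset_in = 2.25 ns
			report "unexpected OFFSET IN" severity failure;
		assert c_valid = 4.5 ns
			report "unexpected VALID window" severity failure;
		assert abs(c_freq_mhz - 153.85) < 0.01
			report "unexpected clock frequency" severity failure;

		report "period    = " & time'image(c_period);
		report "OFFSET IN = " & time'image(c_offset_in);
		report "VALID     = " & time'image(c_valid);
		wait;	-- stop here, nothing else to simulate
	end process;
end architecture sim;
```

If the datasheet values change, only the four constants at the top need to be updated; the asserts then show which UCF value has to follow.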

Inside the FPGA

  • Use pipelining.
    • Routing fast signals through an FPGA can be difficult (the fuller the FPGA, the harder it gets), and we're not interested in the delay introduced by our pass-through.
  • The synthesizer should not try to replace the pipeline with a shift register.
    • This can be forced with the VHDL shreg_extract attribute.
  • If the timing between the internal stages and the output is not met, you can try adding stages to the pipeline.
  • Use the DIVCLK clock provided by the BUFIO2.
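
The pipelining principle can be sketched as a small stand-alone component. This is only an illustration: the entity name bus_pipeline, its generics and their default values are assumptions and do not come from the original project. The value "no" for shreg_extract is the form used here for XST; check the synthesis guide of your tool version.

```vhdl
library ieee;
	use ieee.std_logic_1164.all;

-- Hypothetical stand-alone pipeline (illustration only)
entity bus_pipeline is
	generic
	(
		g_stages	: positive := 4;	-- assumed number of register stages
		g_width		: positive := 20	-- bus width used in this example
	);
	port
	(
		clock		: in	std_logic;	-- intended to be the BUFG output fed by DIVCLK
		data_in		: in	std_logic_vector(g_width - 1 downto 0);
		data_out	: out	std_logic_vector(g_width - 1 downto 0)
	);
end bus_pipeline;

architecture rtl of bus_pipeline is
	type pipe_t is array(0 to g_stages - 1) of std_logic_vector(g_width - 1 downto 0);
	signal pipe : pipe_t := (others => (others => '0'));

	-- Keep real flip-flops, so the placer can spread them along the route
	attribute shreg_extract : string;
	attribute shreg_extract of pipe : signal is "no";
begin
	process(clock)
	begin
		if rising_edge(clock) then
			-- First stage takes the new sample, the others shift by one
			pipe(0) <= data_in;
			for i in 1 to g_stages - 1 loop
				pipe(i) <= pipe(i - 1);
			end loop;
		end if;
	end process;

	-- The last stage drives the output
	data_out <= pipe(g_stages - 1);
end architecture rtl;
```

Adding stages only increases the latency of the pass-through, which does not matter here, while each extra register shortens the routing distance between two flip-flops.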

Synchronous output

Principles

  • Use an ODDR2 for the clock output.
    • The ODDR2 uses a clock and its complement, so if the clock is very fast or does not have a 50% duty cycle, use a PLL or DCM to generate clock and not_clock.
    • We want the data to change on the clock falling edge so that it is stable on the rising edge, so we invert the clock.
  • Use the D flip-flop integrated into the output pad.

Inverting clock output

Here is how to output a clock from a Spartan-6 device. This code has no logic cost, since every FPGA output pad has an ODDR2 block inside it.

library ieee;
	use ieee.std_logic_1164.all;
	use ieee.numeric_std.all;
 
library unisim;
	use unisim.vcomponents.all;
 
entity clock_output_inverting_xilinx is
	port
	(
		clock_in			: in	std_logic;
		clock_out			: out	std_logic
	);
end clock_output_inverting_xilinx;
 
architecture rtl of clock_output_inverting_xilinx is
	signal clock_in_not	: std_logic;
begin
 
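    -------------------------------------------------------------------------------
    -- Clock forwarding
    --
    -- The oddr2 is used to output a clock, so we can add a constraint between
    -- this output and the other data signals.
    --
    -- Note: an oddr2 drives d0 on the rising edge of c0 and d1 on the rising
    -- edge of c1. Check the phase of clock_out in simulation; swapping d0 and
    -- d1 inverts it.
    -------------------------------------------------------------------------------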
    clock_in_not <= not clock_in;
 
    i_oddr : oddr2
    generic map
    (
        ddr_alignment => "c1",    -- sets output alignment to "none", "c0", "c1"
        init          => '0',     -- sets initial state of the q output to '0' or '1'
        srtype        => "async"  -- specifies "sync" or "async" set/reset
    )
    port map
    (
        q  => clock_out,
        c0 => clock_in,
        c1 => clock_in_not,
        ce => '1',
        d0 => '1',
        d1 => '0',
        r  => '0',
        s  => '0'
    );
 
end architecture rtl;

Timing constraints - UCF

The good news: if the data output flip-flops are placed in the pads and the clock comes from a global clock buffer, there is nothing more to do. The following timing constraint can be used to check the skew in the timing report:

net "clock_output"		loc = "b8"	| iostandard = LVCMOS18 | slew = fast | drive = 12 | tnm = "data_bus_out";
net "data_output<0>"		loc = "d6"	| iostandard = LVCMOS18 | slew = fast | drive = 12 | tnm = "data_bus_out";
net "data_output<1>"		loc = "a4"	| iostandard = LVCMOS18 | slew = fast | drive = 12 | tnm = "data_bus_out";
net "data_output<2>"		loc = "f14"	| iostandard = LVCMOS18 | slew = fast | drive = 12 | tnm = "data_bus_out";
net "data_output<3>"		loc = "a15"	| iostandard = LVCMOS18 | slew = fast | drive = 12 | tnm = "data_bus_out";
net "data_output<4>"		loc = "d7"	| iostandard = LVCMOS18 | slew = fast | drive = 12 | tnm = "data_bus_out";
net "data_output<5>"		loc = "a5"	| iostandard = LVCMOS18 | slew = fast | drive = 12 | tnm = "data_bus_out";
net "data_output<6>"		loc = "b6"	| iostandard = LVCMOS18 | slew = fast | drive = 12 | tnm = "data_bus_out";
net "data_output<7>"		loc = "d8"	| iostandard = LVCMOS18 | slew = fast | drive = 12 | tnm = "data_bus_out";
net "data_output<8>"		loc = "f8"	| iostandard = LVCMOS18 | slew = fast | drive = 12 | tnm = "data_bus_out";
net "data_output<9>"		loc = "a6"	| iostandard = LVCMOS18 | slew = fast | drive = 12 | tnm = "data_bus_out";
net "data_output<10>"		loc = "c6"	| iostandard = LVCMOS18 | slew = fast | drive = 12 | tnm = "data_bus_out";
net "data_output<11>"		loc = "d9"	| iostandard = LVCMOS18 | slew = fast | drive = 12 | tnm = "data_bus_out";
net "data_output<12>"		loc = "g8"	| iostandard = LVCMOS18 | slew = fast | drive = 12 | tnm = "data_bus_out";
net "data_output<13>"		loc = "a7"	| iostandard = LVCMOS18 | slew = fast | drive = 12 | tnm = "data_bus_out";
net "data_output<14>"		loc = "c7"	| iostandard = LVCMOS18 | slew = fast | drive = 12 | tnm = "data_bus_out";
net "data_output<15>"		loc = "e8"	| iostandard = LVCMOS18 | slew = fast | drive = 12 | tnm = "data_bus_out";
net "data_output<16>"		loc = "g9"	| iostandard = LVCMOS18 | slew = fast | drive = 12 | tnm = "data_bus_out";
net "data_output<17>"		loc = "a9"	| iostandard = LVCMOS18 | slew = fast | drive = 12 | tnm = "data_bus_out";
net "data_output<18>"		loc = "c8"	| iostandard = LVCMOS18 | slew = fast | drive = 12 | tnm = "data_bus_out";
net "data_output<19>"		loc = "f9"	| iostandard = LVCMOS18 | slew = fast | drive = 12 | tnm = "data_bus_out";
 
 
TIMEGRP "data_bus_out" OFFSET = OUT AFTER "clock_input" REFERENCE_PIN "data_output<0>";

Timing report example

Here is an extract of the Timing report about this part:

 TIMEGRP "data_bus_out" OFFSET = OUT AFTER COMP "clock_input" REFERENCE_PIN BEL         "data_output<0>"; 
 Bus Skew: 0.193 ns;  
 -----------------------------------------------+-------------+------------+-------------+------------+--------------+ 
                                                |Max (slowest)|  Process   |Min (fastest)|  Process   |              | 
 PAD                                            | Delay (ns)  |   Corner   | Delay (ns)  |   Corner   |Edge Skew (ns)| 
 -----------------------------------------------+-------------+------------+-------------+------------+--------------+ 
 clock_output                                   |       11.141|      SLOW  |        7.379|      FAST  |         3.187| 
 data_output<0>                                 |        7.954|      SLOW  |        3.983|      FAST  |         0.000| 
 data_output<1>                                 |        7.953|      SLOW  |        3.982|      FAST  |        -0.001| 
 data_output<2>                                 |        7.811|      SLOW  |        3.840|      FAST  |        -0.143| 
 data_output<3>                                 |        7.761|      SLOW  |        3.790|      FAST  |        -0.193| 
 data_output<4>                                 |        7.945|      SLOW  |        3.974|      FAST  |        -0.009| 
 data_output<5>                                 |        8.003|      SLOW  |        4.032|      FAST  |         0.049| 
 data_output<6>                                 |        8.004|      SLOW  |        4.033|      FAST  |         0.050| 
 data_output<7>                                 |        7.945|      SLOW  |        3.974|      FAST  |        -0.009| 
 data_output<8>                                 |        8.000|      SLOW  |        4.029|      FAST  |         0.046| 
 data_output<9>                                 |        8.004|      SLOW  |        4.033|      FAST  |         0.050| 
 data_output<10>                                |        7.954|      SLOW  |        3.983|      FAST  |         0.000| 
 data_output<11>                                |        7.952|      SLOW  |        3.981|      FAST  |        -0.002| 
 data_output<12>                                |        7.950|      SLOW  |        3.979|      FAST  |        -0.004| 
 data_output<13>                                |        7.952|      SLOW  |        3.981|      FAST  |        -0.002| 
 data_output<14>                                |        7.952|      SLOW  |        3.981|      FAST  |        -0.002| 
 data_output<15>                                |        8.000|      SLOW  |        4.029|      FAST  |         0.046| 
 data_output<16>                                |        8.000|      SLOW  |        4.029|      FAST  |         0.046| 
 data_output<17>                                |        8.002|      SLOW  |        4.031|      FAST  |         0.048| 
 data_output<18>                                |        7.952|      SLOW  |        3.981|      FAST  |        -0.002| 
 data_output<19>                                |        7.950|      SLOW  |        3.979|      FAST  |        -0.004| 
 -----------------------------------------------+-------------+------------+-------------+------------+--------------+

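The clock_output line (first line of the table) shows an edge skew of 3.187 ns relative to the reference pin data_output<0>: data_output<0> changes 3.187 ns before the clock rising edge. The skew of the other data bits around the reference pin is at most 0.193 ns.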

Possible problems

Timing is not met

[fixme : explain how to find which flip-flops should be placed manually]

Over-constrained design

  • Possible causes
    • Your input signals are split across two half-banks and can't be sampled using the same BUFIO2.

References

  • ug625: Xilinx, Constraints Guide (UG625).

Appendix - VHDL code

library ieee;
	use ieee.std_logic_1164.all;
	use ieee.numeric_std.all;
 
library unisim;
	use unisim.vcomponents.all;
 
architecture xilinx_s6 of pass_through is
 
	signal internal_clock						: std_logic;
	signal bufio2_sample_clock					: std_logic;
	signal bufio2_to_bufg_clock					: std_logic;
 
	signal data_in								: std_logic_vector(data_input'range);
 
	type data_internal_t is array(g_pipe_stages downto 0) of std_logic_vector(data_input'range);
	signal data_internal						: data_internal_t;
 
	signal data_out								: std_logic_vector(data_input'range);
 
	-- Force data sampling into the IO pads
	attribute iob								: string;
	attribute iob of data_in					: signal is "FORCE";
	attribute iob of data_out					: signal is "FORCE";
 
	-- The synthesizer must not convert our pipeline into a shift register
	attribute shreg_extract : string;
	attribute shreg_extract of data_in			: signal is "false";
	attribute shreg_extract of data_internal	: signal is "false";
	attribute shreg_extract of data_out			: signal is "false";
 
begin
	----------------------------------------------------------------------------
	-- Clock mgmt for using the BUFIO2
	----------------------------------------------------------------------------
	i_bufio2 : BUFIO2
	generic map
	(
		DIVIDE			=> 1,		-- Do not divide the clock
		DIVIDE_BYPASS	=> TRUE,	-- Bypass the clock divider
		I_INVERT		=> FALSE,	-- Do not invert clock
		USE_DOUBLER		=> FALSE	-- Do not double the clock
	)
	port map
	(
		DIVCLK			=> bufio2_to_bufg_clock,-- This clock must be transmitted to a BUFG
		IOCLK			=> bufio2_sample_clock,	-- This clock must be used to sample the data in the BUFIO2 region
		SERDESSTROBE	=> open,				-- Unused
		I				=> clock_input			-- This is the external clock for inputs
	);
 
    i_bufg : BUFG
	port map
	(
		O => internal_clock,
		I => bufio2_to_bufg_clock
	);
 
	----------------------------------------------------------------------------
	-- Read the inputs
	-- They are all in the same BUFIO2 region
	----------------------------------------------------------------------------
	process(reset, bufio2_sample_clock)
	begin
		if reset = '1' then
			data_in <= (others => '0');
		elsif rising_edge(bufio2_sample_clock) then
			data_in	<= data_input;
		end if;
	end process;
 
	----------------------------------------------------------------------------
	-- Internal data (leave some time for the routing delay)
	----------------------------------------------------------------------------
	process(reset, internal_clock)
	begin
		if reset = '1' then
			for i in data_internal'range loop
				data_internal(i) <= (others => '0');
			end loop;
		elsif rising_edge(internal_clock) then
			data_internal(data_internal'left) <= data_in;
			for i in data_internal'left - 1 downto data_internal'right loop
				data_internal(i) <= data_internal(i+1);
			end loop;
		end if;
	end process;
 
	----------------------------------------------------------------------------
	-- Spy data, from the middle of the pipe
	----------------------------------------------------------------------------
	-- spy_data <= data_internal(g_pipe_stages/2);
	-- spy_clock <= internal_clock;
 
	----------------------------------------------------------------------------
	-- Set the outputs
	----------------------------------------------------------------------------
	process(reset, internal_clock)
	begin
		if reset = '1' then
			data_out <= (others => '0');
		elsif rising_edge(internal_clock) then
			data_out <= data_internal(data_internal'right);
		end if;
	end process;
 
	data_output <= data_out;
 
	i_clock_out : entity work.clock_output_inverting_xilinx
	port map
	(
		clock_in		=> internal_clock,
		clock_out		=> clock_output
	);
end xilinx_s6;

Appendix - Complete project for ISE 14.2

Media:pt_demo-v1.0.zip
