Hi everyone,
I've been debugging my master unit for the last few days and I'm slowly losing my sanity. Anyway I am currently stuck on a weird bug where my master unit always works for a buffer of length 1, works exactly twice then gets stuck for a length of 2 or 3, works exactly once for a length of 4, and is always stuck for a bigger length >= 5. By stuck, I mean that the interrupt is never raised thus my program waits indefinitely. I would appreciate it if someone could have a fresh look on my design or have any tips, which may help me pinpoint the root cause.
Here's a detailed explanation of my design: my master unit is an Avalon slave / master module which performs the byte swap and bits mirror operation in place on a contiguous array in memory. It must be first setup by the Nios II CPU through the following slave register map:
- begin_address : register to save the buffer start address inclusive
- end_address : register to save the buffer end address inclusive
- start : a write to this register starts the operation and the device becomes busy until finished. An IRQ is then raised.
- irq_clear : a write to this register clears any pending IRQ
Thus, the state machine of my master read process is as follows: in short, it fetches bursts of size 16 (the last one may be smaller) if the FIFO (of size 32) has at least 16 free entries until the read pointer reaches the end address. It supports wait requests and readdatavalid = 0.
Similarly, the master write process waits for the FIFO to have at least 16 elements if there are more than 16 words left to send them as a burst (the last burst may be smaller). Since the FIFO read request must always be asserted one cycle earlier, there's a write_next_output register that delays the output of the write FIFO to the actual Avalon writedata bus (so that a waitrequest can stall an earlier read).
I have simulated the device successfully with bursts of 2, 5 and 33 words and they all work in ModelSim (the interrupt is raised at the end of the operation). I also know that the C software and the interruption mechanism is correct because I can always make it work for a buffer of single value. Thus, the error may only lay in the read or write process (i.e. real Avalon timings differ from the simulation) or the FIFO IPs do not behave as observed in the simulation.
Although it is not very legible, you can see in the next screenshots that the IRQ is raised in all cases in ModelSim (I can also provide the .wlf files if needed). In all simulations, I have tried to insert / ignore various wait requests and data valid assertions:
However, when I perform the test with the real hardware, the device will predictably succeed an exact number of times and fail afterwards, depending on the buffer size.
There's one thing that I think of, which I wonder if it could cause some weird behavior: I have enabled dual-ports on my on-chip memory, and each master unit is connected to a distinct respective port. Here's my Qsys:
What puzzles me is that the error rate scales with the burst size. Why does a burst of size 1 always work, a burst of size 2 or 3 works exactly twice then fails, and a burst of size 4 works exactly once and then fails; why does a burst of size 5 or bigger never terminates? I'm running out of ideas to debug this.
Any help would be greatly appreciated.
Thank you in advance and best regards,
Alex
Below is the VHDL implementation:
--
-- MIRROR AND SWAP AVALON SLAVE/MASTER
-- RTES Lab 2
--
-- file: master_mirror_swap.vhd
-- author: Alexandre CHAU
-- date: 18/04/2020
-- version: 0.1
--
-- This component is an Avalon slave/master module that performs the bits
-- mirror and swap operation as defined in {@link mirror_swap_op.vhd} on
-- an arbitrary contiguous array in memory. The operation transforms the
-- values in place and triggers an interrupt request when finished.
--
-- The addressable register map for the slave interface is as follows:
-- +---------------+-------------------+------------------------------------------------------+
-- | Register name | Offset (in bytes) | Description |
-- +---------------+-------------------+------------------------------------------------------+
-- | begin_address | 0 | A write to this register when the device is not busy |
-- | | | sets the value as the inclusive start address of the |
-- | | | array to process. The value in this register MUST be |
-- | | | smaller or equal to the one set in end_address. |
-- +---------------+-------------------+------------------------------------------------------+
-- | end_address | 4 | A write to this register when the device is not busy |
-- | | | sets the value as the inclusive end address of the |
-- | | | array to process. The value in this register MUST be |
-- | | | bigger or equal to the one set in start_address. |
-- +---------------+-------------------+------------------------------------------------------+
-- | start | 8 | A write of any value to this register when the |
-- | | | device is not busy will start the processing of the |
-- | | | array. This register MUST only be written once |
-- | | | begin_address and end_address have been set. The |
-- | | | module then becomes busy until an IRQ is raised. |
-- +---------------+-------------------+------------------------------------------------------+
-- | irq_clear | 12 | A write of any value to this register when the |
-- | | | device is not busy will clear a pending interrupt |
-- | | | request. |
-- +---------------+-------------------+------------------------------------------------------+
--
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity master_mirror_swap is
port (
clk : in std_logic;
reset_n : in std_logic;
-- Avalon slave
as_address : in std_logic_vector(1 downto 0);
as_write : in std_logic;
as_writedata : in std_logic_vector(31 downto 0);
-- Avalon master read
am_r_address : out std_logic_vector(31 downto 0);
am_r_read : out std_logic;
am_r_readdata : in std_logic_vector(31 downto 0);
am_r_readdatavalid : in std_logic;
am_r_waitrequest : in std_logic;
am_r_burstcount : out std_logic_vector(10 downto 0);
-- Avalon master write
am_w_address : out std_logic_vector(31 downto 0);
am_w_burstcount : out std_logic_vector(10 downto 0);
am_w_write : out std_logic;
am_w_writedata : out std_logic_vector(31 downto 0);
am_w_waitrequest : in std_logic;
-- Interrupt line
irq : out std_logic
);
end entity master_mirror_swap;
architecture rtl of master_mirror_swap is
-- Burst size
constant BURST_SIZE : unsigned := to_unsigned(16, am_r_burstcount'length);
-- Avalon slave port map
constant REG_BEGIN_ADDRESS_OFFSET : std_logic_vector(as_address'length - 1 downto 0) := "00";
constant REG_END_ADDRESS_OFFSET : std_logic_vector(as_address'length - 1 downto 0) := "01";
constant REG_START_OFFSET : std_logic_vector(as_address'length - 1 downto 0) := "10";
constant REG_IRQ_CLEAR_OFFSET : std_logic_vector(as_address'length - 1 downto 0) := "11";
-- Avalon slave control registers
signal reg_begin_address : std_logic_vector(31 downto 0);
signal reg_end_address : std_logic_vector(31 downto 0);
signal reg_irq_pending : std_logic;
-- Avalon master read registers
signal reg_read_pointer : std_logic_vector(31 downto 0);
signal reg_read_burstcount : std_logic_vector(10 downto 0);
-- Avalon master write registers
signal reg_write_pointer : std_logic_vector(31 downto 0);
signal reg_write_next_output : std_logic_vector(31 downto 0);
signal reg_write_output : std_logic_vector(31 downto 0);
signal reg_write_burstcount : std_logic_vector(10 downto 0);
-- FSM states for Avalon slave
type as_state_type is (
Q_RECEIVE,
Q_BUSY
);
signal as_state : as_state_type;
-- FSM states for Avalon master read
type am_r_state_type is (
Q_IDLE,
Q_WAIT_READY,
Q_BURST_SETUP,
Q_READ,
Q_DONE
);
signal am_r_state : am_r_state_type;
-- FSM states for Avalon master write
type am_w_state_type is (
Q_IDLE,
Q_WAIT_DATA,
Q_FETCH_WORD,
Q_HOLD_VALUE,
Q_BURST_SETUP,
Q_WRITE,
Q_DONE
);
signal am_w_state : am_w_state_type;
-- Mirror/swap operator ports
signal wire_op_input : std_logic_vector(31 downto 0);
signal wire_op_output : std_logic_vector(31 downto 0);
signal op_valid : std_logic;
signal op_valid_next : std_logic;
-- Declare mirror/swap operator component
component mirror_swap_op
port (
input : in std_logic_vector(31 downto 0);
output : out std_logic_vector(31 downto 0)
);
end component;
-- Declare FIFO IP component
component fifo_ip
port (
clock : in std_logic;
data : in std_logic_vector(31 downto 0);
rdreq : in std_logic;
wrreq : in std_logic;
almost_empty : out std_logic; -- less or equal than 16 elements in FIFO
almost_full : out std_logic; -- more or equal than 16 elements in FIFO
empty : out std_logic;
full : out std_logic;
q : out std_logic_vector(31 downto 0);
usedw : out std_logic_vector(4 downto 0)
);
end component;
-- Additional FIFOs plumbing
signal fifo_read_empty : std_logic;
signal fifo_read_almost_empty : std_logic;
signal fifo_write_full : std_logic;
signal fifo_write_q : std_logic_vector(31 downto 0);
signal fifo_write_rdreq : std_logic;
signal fifo_write_almost_full : std_logic;
signal fifo_write_usedw : std_logic_vector(4 downto 0);
begin
-- FSM process for Avalon slave
as_process : process (clk, reset_n)
begin
if reset_n = '0' then
reg_begin_address <= (others => '0');
reg_end_address <= (others => '0');
reg_irq_pending <= '0';
as_state <= Q_RECEIVE;
elsif rising_edge(clk) then
case as_state is
when Q_RECEIVE =>
-- In Q_RECEIVE state, the slave process waits for
-- Avalon writes to setup the control registers
if as_write = '1' then
case as_address is
when REG_BEGIN_ADDRESS_OFFSET =>
-- initialize start adddress
reg_begin_address <= as_writedata;
when REG_END_ADDRESS_OFFSET =>
-- sets end address
reg_end_address <= as_writedata;
when REG_START_OFFSET =>
-- transit to busy state
as_state <= Q_BUSY;
when REG_IRQ_CLEAR_OFFSET =>
-- reset IRQ level
reg_irq_pending <= '0';
when others =>
-- ignore otherwise
end case;
end if;
when Q_BUSY =>
-- In Q_BUSY state, the slave process waits for the
-- the Avalon write master to finish
if am_w_state = Q_DONE then
reg_irq_pending <= '1'; -- trigger IRQ
as_state <= Q_RECEIVE; -- go back to receive state
end if;
end case;
end if;
end process as_process;
-- Connect IRQ level to pending register
irq <= reg_irq_pending;
-- FSM process for Avalon master read
am_r_process : process (clk, reset_n)
variable write_words_left : std_logic_vector(am_r_address'length - 1 downto 0);
begin
if reset_n = '0' then
write_words_left := (others => '0');
am_r_state <= Q_IDLE;
reg_read_pointer <= (others => '0');
reg_read_burstcount <= (others => '0');
am_r_read <= '0';
am_r_burstcount <= (others => '0');
am_r_address <= (others => '0');
elsif rising_edge(clk) then
case am_r_state is
when Q_IDLE =>
-- In Q_IDLE, we wait for the Avalon slave to start
-- in which case we initialize the read pointer
if as_state = Q_BUSY then
reg_read_pointer <= reg_begin_address;
am_r_state <= Q_WAIT_READY;
end if;
when Q_WAIT_READY =>
-- In Q_WAIT_READY, we wait for the read FIFO to have
-- at least BURST_SIZE empty slots
if fifo_read_almost_empty = '1' then
-- helper to compute the number of words left to write
write_words_left := std_logic_vector(unsigned(reg_end_address) - unsigned(reg_read_pointer) + 1);
-- determine burst size
if unsigned(write_words_left) >= BURST_SIZE then
reg_read_burstcount <= std_logic_vector(BURST_SIZE);
else
reg_read_burstcount <= write_words_left(am_r_burstcount'length - 1 downto 0); -- we can take lowest bits since we know < BURST_SIZE
end if;
am_r_state <= Q_BURST_SETUP;
end if;
when Q_BURST_SETUP =>
-- In Q_BURST_SETUP, we assert the necessary lines on the
-- Avalon master read to perform a burst read
am_r_read <= '1';
am_r_burstcount <= reg_read_burstcount;
am_r_address <= reg_read_pointer;
reg_read_burstcount <= std_logic_vector(unsigned(reg_read_burstcount) - 1);
am_r_state <= Q_READ;
when Q_READ =>
-- In Q_READ, we let the FIFO consume one word from the
-- Avalon bus. In case waitrequest is asserted, or data
-- is not valid, we simply wait until it is de-asserted.
if am_r_waitrequest = '0' and am_r_readdatavalid = '1' then
-- No wait request or invalid read data so normal read cycle
if unsigned(reg_read_burstcount) = 0 then
-- Reached the end of the burst
if reg_read_pointer = reg_end_address then
-- Reading is done
am_r_state <= Q_DONE;
else
-- There are still more bursts to read
am_r_state <= Q_WAIT_READY;
end if;
else
reg_read_burstcount <= std_logic_vector(unsigned(reg_read_burstcount) - 1);
end if;
-- In any case, de-assert read, burstcount and address
am_r_read <= '0';
am_r_burstcount <= (others => '0');
am_r_address <= (others => '0');
-- Move on to the next read
reg_read_pointer <= std_logic_vector(unsigned(reg_read_pointer) + 1);
end if;
when Q_DONE =>
-- In Q_DONE state, we wait for the write unit to finish
if am_w_state = Q_DONE then
am_r_state <= Q_IDLE;
end if;
end case;
end if;
end process am_r_process;
-- FSM process for Avalon master write
am_w_process : process (clk, reset_n)
variable write_words_left : std_logic_vector(am_w_address'length - 1 downto 0);
begin
if reset_n = '0' then
-- Reset vars
write_words_left := (others => '0');
-- Reset state
reg_write_pointer <= (others => '0');
fifo_write_rdreq <= '0';
am_w_burstcount <= (others => '0');
reg_write_next_output <= (others => '0');
reg_write_output <= (others => '0');
reg_write_burstcount <= (others => '0');
am_w_address <= (others => '0');
am_w_write <= '0';
am_w_state <= Q_IDLE;
elsif rising_edge(clk) then
case am_w_state is
when Q_IDLE =>
-- In Q_IDLE state, we wait for the Avalon slave to start
-- in which case we initialize the write pointer
if as_state = Q_BUSY then
reg_write_pointer <= reg_begin_address;
am_w_state <= Q_WAIT_DATA;
end if;
when Q_WAIT_DATA =>
-- In Q_WAIT_DATA state, we wait for the write FIFO to
-- deliver data. If there are less than BURST_SIZE elements
-- we send a smaller burst.
-- helper to compute the number of words left to write
write_words_left := std_logic_vector(unsigned(reg_end_address) - unsigned(reg_write_pointer) + 1);
if unsigned(write_words_left) >= BURST_SIZE and fifo_write_almost_full = '1' then
-- request one word from write FIFO
fifo_write_rdreq <= '1';
reg_write_burstcount <= std_logic_vector(BURST_SIZE);
am_w_state <= Q_FETCH_WORD;
elsif unsigned(write_words_left) < BURST_SIZE and unsigned(fifo_write_usedw) = unsigned(write_words_left) then
-- request one word from write FIFO
fifo_write_rdreq <= '1';
reg_write_burstcount <= write_words_left(am_w_burstcount'length - 1 downto 0); -- we can take lowest bits since we know < BURST_SIZE
am_w_state <= Q_FETCH_WORD;
end if;
when Q_FETCH_WORD =>
-- In Q_FETCH_WORD, the first word leaves the FIFO
am_w_state <= Q_HOLD_VALUE;
when Q_HOLD_VALUE =>
-- In Q_HOLD_VALUE, the first word can now be sampled
reg_write_next_output <= fifo_write_q;
am_w_state <= Q_BURST_SETUP;
when Q_BURST_SETUP =>
-- In Q_BURST_SETUP, we assert the necessary lines on the
-- Avalon master write to perform a burst write
reg_write_next_output <= fifo_write_q;
reg_write_output <= reg_write_next_output;
am_w_burstcount <= reg_write_burstcount;
reg_write_burstcount <= std_logic_vector(unsigned(reg_write_burstcount) - 1);
am_w_address <= reg_write_pointer;
am_w_write <= '1';
am_w_state <= Q_WRITE;
when Q_WRITE =>
-- In Q_WRITE, we check whether a wait request has been
-- asserted. Otherwise, we move on to the next word
if am_w_waitrequest = '1' then
-- stall the FIFO out pipeline
fifo_write_rdreq <= '0';
elsif unsigned(reg_write_burstcount) = 0 then
-- the burst is finished
-- stop reading from the FIFO and deassert Avalon lines
fifo_write_rdreq <= '0';
am_w_write <= '0';
am_w_address <= (others => '0');
am_w_burstcount <= (others => '0');
reg_write_next_output <= (others => '0');
reg_write_output <= (others => '0');
if reg_write_pointer = reg_end_address then
-- reached the end of all writes
am_w_state <= Q_DONE;
else
-- there are still more bursts to write
am_w_state <= Q_WAIT_DATA;
end if;
-- Must move write pointer to next valid location
reg_write_pointer <= std_logic_vector(unsigned(reg_write_pointer) + 1);
else
-- anticipate end of FIFO out pipeline
if unsigned(reg_write_burstcount) <= 3 then
-- do not request words from FIFO anymore
fifo_write_rdreq <= '0';
else
fifo_write_rdreq <= '1';
end if;
-- move the FIFO out pipeline
reg_write_next_output <= fifo_write_q;
reg_write_output <= reg_write_next_output;
reg_write_pointer <= std_logic_vector(unsigned(reg_write_pointer) + 1);
reg_write_burstcount <= std_logic_vector(unsigned(reg_write_burstcount) - 1);
end if;
when Q_DONE =>
-- just one cycle to signal other process that we finished
am_w_state <= Q_IDLE;
end case;
end if;
end process am_w_process;
-- Connect Avalon master write output
am_w_writedata <= reg_write_output;
-- Instantiate mirror/swap operator component
op : mirror_swap_op port map(
input => wire_op_input,
output => wire_op_output
);
-- Assert operation validity to enable word transfer between FIFOs
op_valid_next <= not fifo_read_empty and not fifo_write_full;
delay_op_valid : process (clk, reset_n)
begin
if reset_n = '0' then
op_valid <= '0';
elsif rising_edge(clk) then
op_valid <= op_valid_next;
end if;
end process delay_op_valid;
-- Instantiate read FIFO
fifo_read : fifo_ip port map(
clock => clk,
data => am_r_readdata,
rdreq => op_valid_next,
wrreq => am_r_readdatavalid,
almost_empty => fifo_read_almost_empty,
almost_full => open,
empty => fifo_read_empty,
full => open,
q => wire_op_input,
usedw => open
);
-- Instantiate write FIFO
fifo_write : fifo_ip port map(
clock => clk,
data => wire_op_output,
rdreq => fifo_write_rdreq,
wrreq => op_valid,
almost_empty => open,
almost_full => fifo_write_almost_full,
empty => open,
full => fifo_write_full,
q => fifo_write_q,
usedw => fifo_write_usedw
);
end architecture rtl;
--
-- MIRROR AND SWAP OPERATOR
-- RTES Lab 2
--
-- file: mirror_swap_op.vhd
-- author: Alexandre CHAU
-- date: 18/04/2020
-- version: 0.1
--
-- This component is a combinatorial operation of bits mirror and swap.
-- Given a 32 bits input a31..a0 it returns a 32 bits output o31..o0
-- such that:
-- a31..a24 -> o7..o0 i.e. MSB becomes LSB
-- a7..a0 -> o31..o24 i.e. LSB becomes MSB
-- a23..a8 -> o8..o23 i.e. middle bytes are reversed
--
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity mirror_swap_op is
port (
input : in std_logic_vector(31 downto 0);
output : out std_logic_vector(31 downto 0)
);
end entity mirror_swap_op;
architecture comb of mirror_swap_op is
begin
-- a31..a24 -> o7..o0 MSB becomes LSB
output(7 downto 0) <= input(31 downto 24);
-- a7..a0 -> o31..o24 LSB becomes MSB
output(31 downto 24) <= input(7 downto 0);
-- a23..a8 -> o8..o23 middle bytes are reversed
gen: for i in 8 to 23 generate
output(i) <= input(23 - i + 8);
end generate gen;
end architecture comb;