My experiences in VHDL/FPGA (part 1)

I recently started learning VHDL, and because I ran into a number of pitfalls and traps, I thought I’d document them here. Maybe this will help someone in the future.

VHDL is a hardware description language; here I’m using it to describe how an FPGA should be configured. An FPGA is a sort of computer chip which can be configured (wired up, if you wish) using a bitstream file. Rather than having dedicated circuitry for, say, adding two 32-bit numbers, an FPGA must be programmed to do this. Or to add two 33-bit numbers, if you wish.

An FPGA does this by having certain basic components arranged in “logic blocks” on the chip. A logic block can be thought of as a super logic-gate with an internal memory. It has a number of single-bit inputs and one single-bit output. An FPGA can have hundreds of thousands of logic blocks.

A logic block is made of a number of components. One of these is the look-up table, or LUT. This is a table that maps an x-bit input to a single-bit output. Logical “NOT” could be modelled simply with one input and the following values:

INPUT#1 OUTPUT
0 1
1 0

More complicated boolean relationships can be modelled using larger tables.
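To make the idea concrete, here is a sketch of a 2-input LUT written in VHDL itself. It is only a behavioural model under my own assumptions (the entity name lut2 and the generic INIT are made up here and are not a vendor primitive): the truth table is stored as a 4-bit constant, and the two inputs simply select one of its bits.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical 2-input LUT model (not a vendor primitive).
-- INIT holds the truth table: bit n is the output for the input combination n = (i1 i0).
entity lut2 is
  generic (
    INIT : std_logic_vector(3 downto 0) := "1000"  -- "1000": only input "11" gives '1', i.e. AND
  );
  port (
    i0, i1 : in  std_logic;
    o      : out std_logic
  );
end lut2;

architecture sketch of lut2 is
begin
  process (i0, i1)
    variable sel : std_logic_vector(1 downto 0);
  begin
    sel := i1 & i0;  -- the inputs form the index into the table
    case sel is
      when "00"   => o <= INIT(0);
      when "01"   => o <= INIT(1);
      when "10"   => o <= INIT(2);
      when "11"   => o <= INIT(3);
      when others => o <= 'X';  -- inputs like 'U' or 'X' give an unknown output
    end case;
  end process;
end sketch;
```

With INIT => "1000" the LUT behaves like an AND-gate, "0110" gives XOR, and "0101" ignores i1 and outputs NOT i0, which is the NOT table above.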

As well as this, a logic block has flip-flops to store the output of the LUT, a full-adder (with the carry coming in from another logic block), a handful of multiplexers (muxes) to select the output, and dedicated clock circuitry to control when the flip-flops trigger.
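A full-adder is itself just a small boolean relationship. Here is a sketch of a one-bit full adder in VHDL (the names full_adder, cin and cout are my own choice). The carry input cin would come from the neighbouring logic block and cout would go on to the next one. This is the textbook gate-level description; how a particular FPGA implements its carry logic internally is up to the vendor.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical one-bit full adder: adds a, b and the carry from a neighbouring block.
entity full_adder is
  port (
    a, b, cin : in  std_logic;
    s, cout   : out std_logic
  );
end full_adder;

architecture dataflow of full_adder is
begin
  s    <= a xor b xor cin;                   -- sum bit
  cout <= (a and b) or (cin and (a xor b));  -- carry out to the next block
end dataflow;
```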

Programming an FPGA involves loading the truth tables into the LUTs, configuring the muxes and setting up the routing between the logic blocks. Getting down to this level is not a trivial task, even starting from something as close to the hardware as assembler. Maybe I’ll read up on how VHDL code is compiled at some stage.

It is important to note that lots of things can be (and usually are) happening on an FPGA at the same time. This is the reason why dedicated clock circuitry is so important. Coordinating signals is only possible when the chip manufacturer can guarantee that the signal at flip-flop A will have arrived and be stable at flip-flop B at the next rising edge of the clock signal. In the case of video processing for example, part of the FPGA may be analyzing the incoming signal, another part might be modifying the signal and sending it to the output, a third part might be transmitting information about the signal over an internet connection while a fourth part does some complicated maths all at the same time. There is barely a limit to the level of multi-tasking possible on an FPGA.

Okay, enough about FPGAs. Let’s start on VHDL.

VHDL tries to abstract away a lot of the technicalities of FPGA configuration. One of the reasons for this is that FPGAs from different manufacturers have different capabilities. AFAIK there is no such thing as “Intel compatible” for FPGAs. The FPGA manufacturer will supply an IDE to transform VHDL into a valid configuration bitstream for their own FPGAs.

Nevertheless, the inherent parallelism of FPGAs means that VHDL is not a normal programming language.

First a really basic example.

Here’s a VHDL program that models an AND-gate. This should go in its own file (I called it and_gate.vhdl):

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
entity and_gate is
Port (
a, b : in std_logic;
c : out std_logic );
end and_gate;
architecture foobar of and_gate is
begin
c <= a and b;
end foobar;

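The first two lines are just boilerplate. They make the type std_logic available, which is what the signals in this example use. A std_logic signal is normally ‘1’ or ‘0’, but it can also take other values such as ‘U’ (uninitialized), ‘X’ (unknown) and ‘Z’ (high impedance). The package also defines operators like and (as used in this example) for std_logic.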

The next few lines define an entity. You can think of this as a little bit of hardware, if you want, though the compiler might decide to spread complicated entities all over the chip and reformulate a lot of the internal logic to be more efficient (rule of thumb: don’t worry about the compiler, it will do the right thing). An entity is rather like an interface in a traditional programming language: it can be imported into other files and just needs to be wired up correctly.

The architecture describes what the entity actually does. In this case, it drives c with the result of a and b. This is what “<=” means here: “lay down a wire from the expression on the right to the signal on the left”.

Now that we have our component defined, we have to find a way to use it to see what it does. There are two ways to do this. The easiest way is to run it in a simulator. The simulator is controlled by another file which tells it what signals to put on a and b. Traditionally this file is called a testbench. Here’s a simple testbench.vhdl for the AND-gate.

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
entity testbench is
end testbench;
architecture peekaboo of testbench is
component and_gate is
Port (
a, b : in std_logic;
c : out std_logic );
end component;
signal a_sig, b_sig : std_logic := '0';
signal c_sig : std_logic;
constant my_delay : time := 1 ms;
begin
my_and_gate : and_gate port map (a => a_sig, b => b_sig, c => c_sig);
process
begin
wait for my_delay;
a_sig <= '1';
wait for my_delay;
b_sig <= '1';
wait for my_delay;
a_sig <= '0';
wait;
end process;
end peekaboo;

The first two lines are the standard boilerplate.

The entity definition of testbench is empty. The file doesn’t export an interface. It’s the “main” entry point.

The architecture now contains two areas: before begin and after begin. Before begin the interfaces of the components used are listed and signals (which can be thought of as connecting wires) are created. After begin a component is instantiated and wired up. Then, inside a process, the values on the connecting wires are changed.

When you run this file in the IDE’s simulator (making sure that testbench.vhdl is set as the top-level file for the simulation), the values of a_sig, b_sig and c_sig will be displayed.

In the process I’ve used the “wait for” statement. Timed waits like this are among a number of constructs which cannot be synthesized; they can only be used in a simulated testbench.
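Reading the waveform by eye works for an AND-gate but gets tedious quickly. A testbench can also check its own results with assert statements, which, like wait for, are meant for simulation. Below is a sketch of a self-checking variant (the name testbench_checked is made up). To keep it to a single file, it repeats and_gate and instantiates it directly with entity work.and_gate, which avoids writing out a component declaration.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- The AND-gate from above, repeated so that this file compiles on its own.
entity and_gate is
  port (
    a, b : in  std_logic;
    c    : out std_logic );
end and_gate;

architecture foobar of and_gate is
begin
  c <= a and b;
end foobar;

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical self-checking testbench: the simulator reports any wrong output.
entity testbench_checked is
end testbench_checked;

architecture checks of testbench_checked is
  signal a_sig, b_sig : std_logic := '0';
  signal c_sig        : std_logic;
  constant my_delay   : time := 1 ms;
begin
  my_and_gate : entity work.and_gate port map (a => a_sig, b => b_sig, c => c_sig);

  process
  begin
    wait for my_delay;  -- a = '0', b = '0'
    assert c_sig = '0' report "0 and 0 should be 0" severity error;
    a_sig <= '1';
    wait for my_delay;  -- a = '1', b = '0'
    assert c_sig = '0' report "1 and 0 should be 0" severity error;
    b_sig <= '1';
    wait for my_delay;  -- a = '1', b = '1'
    assert c_sig = '1' report "1 and 1 should be 1" severity error;
    a_sig <= '0';
    wait for my_delay;  -- a = '0', b = '1'
    assert c_sig = '0' report "0 and 1 should be 0" severity error;
    report "testbench_checked finished";
    wait;               -- suspend forever: the test is over
  end process;
end checks;
```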

This is one way of testing the AND-gate component. Another way would be to synthesize it and wire it up to some buttons and LEDs on an FPGA prototyping board. I’ll probably get around to writing how to do that later.

This all looks rather like a standard language …

Well, yes, it does at the moment. So let’s try something to make it clear that this isn’t a standard programming language. Let’s break the AND-gate.

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
entity and_gate is
Port (
a, b : in std_logic;
c : out std_logic );
end and_gate;
architecture Behavioral of and_gate is
begin
c <= a;
c <= b;
end Behavioral;

Now I’ve changed the wiring laid down in the “begin” part of the architecture. On the one hand, I’m routing input a to output c; on the other hand, I’m routing input b to output c. This is, of course, rather stupid. What should the value of c be if a is ‘1’ and b is ‘0’? Running testbench.vhdl with this version of and_gate.vhdl results in c_sig being ‘X’ (unknown) whenever a_sig differs from b_sig, because std_logic resolves the two conflicting drivers to ‘X’. If you try to synthesize this design, the software will complain that c is being driven multiple times (or similar).

I find it easiest to think of it like this: every statement after begin in the architecture section lays down its own wires. This is something to watch out for. If you define 3 different processes in the architecture section, you must take care that no two of them drive the same signal.
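The rule is different inside a single process. There the statements run one after the other, the process is the only driver of the signal, and the last assignment made before the process suspends wins. A sketch to illustrate this (last_wins is a made-up name):

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity last_wins is
  port (
    a, b : in  std_logic;
    c    : out std_logic );
end last_wins;

architecture Behavioral of last_wins is
begin
  -- Both assignments are inside ONE process, so c has only one driver.
  -- The statements run in order, and the last assignment to c
  -- before the process suspends wins: c simply follows b.
  process (a, b)
  begin
    c <= a;  -- overwritten by the next line
    c <= b;
  end process;
end Behavioral;
```

Here c follows b and no ‘X’ appears. This is why it’s common to give a signal a default value at the top of a process and override it further down inside an if.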

Constraints like these tend to result in certain standard ways of structuring code, or language idioms. Before I introduce one of these, there’s one other area that produces a number of constraints: timing. Let’s have a look at an XOR-gate.

We can create a file xor_gate.vhdl which is basically a copy of and_gate.vhdl but with this architecture:

architecture Behavioral of xor_gate is

begin
c <= a xor b;
end Behavioral;

And we can change testbench.vhdl to use the XOR-gate. Something like this will work:

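-- assumes a component declaration for xor_gate, the signals xor_a_sig, xor_b_sig (both := '0') and xor_c_sig, and the constant my_delay, declared as in the AND-gate testbench
my_xor_gate : xor_gate port map(a => xor_a_sig, b => xor_b_sig, c => xor_c_sig);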
process
begin
wait for my_delay;
xor_a_sig <= '1';
wait for my_delay;
xor_a_sig <= '0';
xor_b_sig <= '1';
wait for my_delay;
xor_a_sig <= '1'; -- this line is interesting
xor_b_sig <= '0';
wait for my_delay;
xor_b_sig <= '1';
wait for my_delay;
wait;
end process;

When we now simulate the testbench, everything works the way we would expect. But when the XOR-gate is synthesized and its inputs are driven by real signals on an FPGA, the output suddenly jumps around a bit. What’s going on here? Have a look at the interesting line. There xor_a_sig goes from ‘0’ to ‘1’ while xor_b_sig goes from ‘1’ to ‘0’, so xor_c_sig should stay at ‘1’ throughout. But if xor_a_sig changed even slightly before xor_b_sig, both inputs would be ‘1’ for a moment and xor_c_sig would briefly drop to ‘0’. This doesn’t appear in the simulation. The rules for timing inside process blocks are actually quite complicated. As far as I’ve understood it, signal assignments in a process don’t take effect line by line: they are all carried out together when the process reaches the following ‘wait’ statement. In a real FPGA, however, because of propagation delays, it’s quite possible that xor_a_sig will arrive at the XOR-gate before xor_b_sig does. Indeed, it’s highly unlikely that two paths on an FPGA take exactly the same time. Brief changes in an output caused by propagation delays on its inputs are called glitches.
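The effect can be reproduced in simulation by giving the two signal assignments different delays with an after clause. The sketch below assumes delays of 1 ns and 3 ns for the two paths; these numbers are made up purely for illustration (real delays come from the routing, and synthesis tools generally ignore after clauses). It repeats xor_gate so the file compiles on its own; glitch_bench is a made-up name.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- The XOR-gate from above, repeated so this file compiles on its own.
entity xor_gate is
  port (
    a, b : in  std_logic;
    c    : out std_logic );
end xor_gate;

architecture Behavioral of xor_gate is
begin
  c <= a xor b;
end Behavioral;

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Simulation only: models two input paths of different length.
entity glitch_bench is
end glitch_bench;

architecture sim of glitch_bench is
  signal xor_a_sig : std_logic := '0';
  signal xor_b_sig : std_logic := '1';  -- start with a = '0', b = '1', so c = '1'
  signal xor_c_sig : std_logic;
begin
  my_xor_gate : entity work.xor_gate port map (a => xor_a_sig, b => xor_b_sig, c => xor_c_sig);

  process
  begin
    wait for 10 ns;
    xor_a_sig <= '1' after 1 ns;  -- "short" path: a changes at 11 ns
    xor_b_sig <= '0' after 3 ns;  -- "long" path: b changes at 13 ns
    wait for 2 ns;                -- 12 ns: a = '1', b = '1'
    assert xor_c_sig = '0' report "expected a glitch at 12 ns" severity error;
    wait for 2 ns;                -- 14 ns: a = '1', b = '0'
    assert xor_c_sig = '1' report "expected c back at '1' by 14 ns" severity error;
    wait;
  end process;
end sim;
```

In the waveform, xor_c_sig should drop to ‘0’ between 11 ns and 13 ns, even though it is ‘1’ both before and after the change.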

This means that some sort of synchronization is necessary, and an FPGA has dedicated signal lines for a clock signal: signal lines which are guaranteed to fulfil certain constraints. (Once I’ve organized my thoughts on the matter, I may write a post about how these constraints on clock signals and propagation times actually permeate through the VHDL language specification. I currently have the feeling that the language would be very different if these constraints couldn’t be assumed.)

A language idiom: How to use a clock signal

Let’s rewrite xor_gate.vhdl to accept a clock signal. This is certainly overkill for a simple xor_gate, but very useful for more complicated components.

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
entity clocked_xor_gate is
Port (
clk, a, b : in std_logic;
c : out std_logic);
end clocked_xor_gate;
architecture Behavioral of clocked_xor_gate is
signal c_next : std_logic; -- combinational result of a xor b (the flip-flop is c itself)
begin
process(clk) -- process now has a sensitivity list!
begin
if rising_edge(clk) then
c <= c_next;
end if;
end process;

c_next <= a xor b;

end Behavioral;

There are a number of interesting changes here. Firstly the process now has a sensitivity list. You can think of this like a JavaScript “onStateChange” or similar. When the value of clk changes, this process will be run. The “if” condition in the process means that c will only be assigned when the state of clk changes from low to high.

A second interesting change is the addition of “c_next”, which carries the combinational result of a xor b. The memory is actually c itself: because c is only assigned inside the rising_edge branch, it becomes a flip-flop which stores one value from one rising edge to the next. This allows synchronization. It is no longer important exactly when a and b change; c will take the value of a xor b at the next rising edge of clk. “What happens if a or b change at exactly the same time as the rising edge of clk?” I hear you ask. In the simulation, c gets the value from just before the rising edge, which means that c is delayed by one whole clock cycle. (On real hardware, an input that changes right at the clock edge violates the flip-flop’s setup and hold times, so the result can’t be relied on.) Let’s see this in action. Here’s a new testbench.vhdl:

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
entity testbench is
-- Port ( );
end testbench;
architecture Behavioral of testbench is
component clocked_xor_gate is
Port (
clk, a, b : in std_logic;
c : out std_logic);
end component;

signal a_sig, b_sig, clk_sig : std_logic := '0';
signal c_sig : std_logic;
constant clk_period : time := 1 ms;
begin
my_cxg : clocked_xor_gate port map (clk=>clk_sig, a => a_sig, b => b_sig, c => c_sig);
process -- <= this process generates the clock
begin
clk_sig <= '1';
wait for clk_period/2;
clk_sig <= '0';
wait for clk_period/2;
end process;
process -- <= this process generates the values on a_sig and b_sig
begin
wait for clk_period;
a_sig <= '1';
wait for clk_period;
a_sig <= '0';
b_sig <= '1';
wait for clk_period;
a_sig <= '1';
b_sig <= '0';
wait for clk_period;
b_sig <= '1';
wait;
end process;
end Behavioral;

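Remember that all statements after begin in the architecture part of a VHDL file run concurrently? Here’s an example. Both processes start at the same time. The first one loops forever (a process without a sensitivity list starts again from the top when it reaches the end); the second stops because of the bare “wait” statement (with no condition or timeout) at the end, which suspends it forever.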

If we now simulate the test bench, we get the following output:

[Image: simulation waveform showing c_sig delayed by one clock cycle]

c_sig is now delayed by one clock-cycle. However, the delay is well defined and the component will now work without glitches when synthesized.
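The same clocked behaviour can also be written without the separate c_next signal, by computing a xor b directly inside the rising_edge branch. Here is a sketch (clocked_xor_gate_compact is a made-up name); as far as I can tell it should behave just like clocked_xor_gate, with c only changing on rising edges of clk.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity clocked_xor_gate_compact is
  port (
    clk, a, b : in  std_logic;
    c         : out std_logic );
end clocked_xor_gate_compact;

architecture Behavioral of clocked_xor_gate_compact is
begin
  process (clk)
  begin
    if rising_edge(clk) then
      c <= a xor b;  -- sampled on the rising edge; c holds its value until the next edge
    end if;
  end process;
end Behavioral;
```

Keeping a separate signal like c_next can still be handy once the combinational logic gets bigger, since it gives that logic a name you can look at in the simulator.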

Ok. That’s enough from me for today. Let me know if I’ve totally misunderstood something. For the next part in the series I’ll probably look at how to structure code for a larger project, or how to use a debugging core.

Have a nice day!

Just your typical programmer