Instructions for Design Problem: 32-bit ALU

In this lab, we'll build the arithmetic and logic unit (ALU) for the Beta processor. The ALU has two 32-bit inputs (which we'll call "A" and "B") and produces one 32-bit output. We'll start by designing each piece of the ALU as a separate circuit, each producing its own 32-bit output. Then we'll combine these outputs into a single ALU result.

When designing circuitry there are three separate factors that can be optimized:
  1. design for maximum performance (minimum latency)
  2. design for minimum cost (minimum area)
  3. design for the best cost/performance ratio (minimize area*latency)
Happily it's often possible to do all three at once but in some portions of the circuit some sort of design tradeoff will need to be made. When designing your circuitry you should choose which of these three factors is most important to you and optimize your design accordingly.

Computation Structures standard cell library & gate-level simulation

The building blocks for this lab come from a library of logic gates -- IC manufacturers often have a "standard cell library" and various design tools to make it easier for their customers to design without worrying about the detailed geometry of the mask layers used to create mosfets and wiring. Computation Structures has its own Standard Cell Library which provides: See the library documentation for details on the appropriate connections for each gate. In Jade, the gates in the standard cell library can be found in the parts bin under "/gates/".

Since we're designing at the gate level we can use a faster simulator that only knows about gates and logic values (instead of transistors and voltages). Note that your design can't contain any mosfets, resistors, capacitors, etc.; the gate-level simulator only supports the gate primitives in the standard cell library.

Inputs are still specified in terms of voltages (to maintain netlist compatability with the other simulators) but the gate-level simulator converts voltages into one of three possible logic values using the vil and vih thresholds specified at the beginning of the test specification. A fourth value Z is used to represent the value of nodes that aren't being driven by any gate output (e.g., the outputs of tristate drivers that aren't enabled). The following diagram shows how these values appear on the waveform display:
ALU specification

The 32-bit ALU we will build will be a component in the Beta processor we will address in subsequent laboratories. The logic symbol for our ALU is shown to the right. It is a combinational circuit taking two 32-bit data words A and B as inputs, and producing a 32-bit output Y by performing a specified arithmetic or logical function on the A and B inputs. The particular function to be performed is specified by a 6-bit control input, FN, whose value encodes the function according to the following table:

FN[5:0]OperationOutput value Y[31:0]
00-011CMPEQ\(Y = (A == B)\)
00-101CMPLT\(Y = (A \lt B)\)
00-111CMPLE\(Y = (A \le B)\)
01---032-bit ADD\(Y = A + B\)
01---132-bit SUBTRACT\(Y = A - B\)
10abcdBit-wise Boolean\(Y[i] = F_{abcd}(A[i],B[i])\)
11--00Logical Shift left (SHL)\(Y = A \lt\lt B\)
11--01Logical Shift right (SHR)\(Y = A \gt\gt B\)
11--11Arithmetic Shift right (SRA)\(Y = A \gt\gt B\) (sign extended)
Note that by specifying an appropriate value for the 6-bit FN input, the ALU can perform a variety of arithmetic operations, comparisons, shifts, and bitwise Boolean combinations required by our Beta processor.

BiAiYi
00d
01c
10b
11a
The bitwise Boolean operations are specified by FN[5:4]=10; in this case, the remaining FN bits abcd are taken as entries in the truth table describing how each bit of Y is determined by the corresponding bits of A and B, as shown to the right.

The three compare operations each produce a Boolean output. In these cases, Y[31:1] are all zero, and the low-order bit Y[0] is a 0 or 1 reflecting the outcome of the comparison between the 32-bit A and B operands.

We can approach the ALU design by breaking it down into subsystems devoted to arithmetic, comparison, Boolean, and shift operations as shown below:
Designing a complex system like an ALU is best done in stages, allowing individual subsystems to be designed and debugged one at a time. The steps below follow that approach to implementing the ALU block diagram shown above. We begin by implementing an ALU framework with dummy modules for each of the four major subsystems (BOOL, ARITH, CMP, and SHIFT); we then implement and debug real working versions of each subsystem. To help you follow this path, we provide separate tests for each of the four component modules.

NOTE: the FN signals used to control the operation of the ALU circuitry use an encoding chosen to make the design of the ALU circuitry simple. This encoding is typically not the same as the one used to encode the opcode field of the processor instruction and the processor circuitry will include logic to generate the appropriate ALU control bits for each instruction.

Use the Jade instance at the bottom of this page to enter your design. There are design notes below suggesting how to go about the design of each of the sub-modules.

BOOL unit

Design the circuitry to implement the Boolean operations for your ALU and use it to replace the jumper and wire that connects the Y[31:0] output to ground.

The suggested implementation uses 32 copies of a 4-to-1 multiplexor (mux4) where BFN[3:0] encode the operation to be performed and A[31:0] and B[31:0] are hooked to the multiplexor's select inputs. This implementation can produce any of the 16 2-input Boolean functions. The following table shows the encodings for some of the BFN[3:0] control signals used by the test jig (and in our typical Beta implementations):
OperationBFN[3:0]
AND1000
OR1110
XOR0110
"A"1010
The BOOL test actually checks all 16 boolean operations on a selection of arguments, and will report any errors that it finds.

When your BOOL circuitry has been entered, run the test by clicking the green checkmark; a simulation plot window showing the inputs and outputs should appear. Jade will check your circuit's results against a list of expected values and report any discrepancies it finds. ARITH unit

Design an adder/subtractor (ARITH) unit that operates on 32-bit two's complement inputs and generates a 32-bit output. It will be useful to generate three other output signals to be used by the CMP unit: Z which is true when the S outputs are all zero, V which is true when the addition operation overflows (i.e., the result is too large to be represented in 32 bits), and N which is true when the sum is negative (i.e., S[31] = 1). Overflow can never occur when the two operands to the addition have different signs; if the two operands have the same sign, then overflow can be detected if the sign of the result differs from the sign of the operands:
\(V = XA_{31}\cdot XB_{31}\cdot \overline{S_{31}} + \overline{XA_{31}}\cdot\overline{XB_{31}}\cdot S_{31}\)
Note that this equation uses XB[31], which is the high-order bit of the B operand to the adder itself (i.e., after the XOR gate -- see the schematic below). XA[31] is simply A[31].

The following schematic is one suggestion for how to go about the design:
AFN will be set to 0 for an ADD (\(S = A+B\)) and 1 for a SUBTRACT (\(S = A-B\)); A[31:0] and B[31:0] are the 32-bit two's complement input operands; S[31:0] is the 32-bit result; Z/V/N are the three condition code bits described above. We'll be using the "little-endian" bit numbering convention where bit 31 is the most-significant bit and bit 0 is the least-significant bit.

We've provided a FA module for entering the gate-level schematic for the full adder to be used in constructing the 32-bit ripple carry adder that forms the heart of the ARITH unit.

The AFN input signal selects whether the operation is an ADD or SUBTRACT. To do a SUBTRACT, the circuit first computes the two's complement negation of the B operand by inverting B and then adding one (which can be done by forcing the carry-in of the 32-bit add to be 1). Start by implementing the 32-bit add using a ripple-carry architecture (you'll get to improve on this later on in the course). You'll have to construct the 32-input NOR gate required to compute Z using a tree of smaller fan-in gates (the parts library only has gates with up to 4 inputs).

When entering your circuitry, remember to delete the original jumpers and wires that connected the outputs to ground!

The module test tries adding and subtracting various operands, ensuring that the Z, V and N outputs are correct after each operation.

CMP unit

The ALU provides three comparison operations for the A and B operands. We can use the adder unit designed above to compute \(A-B\) and then look at the result (actually just the Z, V and N condition codes) to determine if \(A=B\), \(A\lt B\) or \(A\le B\). The compare operations generate a 32-bit result using the number 0 to represent false and the number 1 to represent true.

Design a 32-bit compare (CMP) unit that generates one of two constants (0 or 1) depending on the CFN[1:0] control signals (used to select the comparison to be performed) and the Z, V, and N outputs of the adder/subtractor unit. Clearly the high order 31 bits of the output are always zero. The least significant bit (LSB) of the output is determined by the comparison being performed and the results of the subtraction carried out by the adder/subtractor:
ComparisonEquation for LSBCFN[1:0]
\(A = B\)LSB = \(Z\)01
\(A \lt B\)LSB = \(N \oplus V\)10
\(A \le B\)LSB = \(Z + (N \oplus V)\)11
At the level of the ALU module, FN[2:1] are used to control the compare unit since we need to use FN[0] to control the adder/subtractor unit to force a subtract.

Performance note: the Z, V and N inputs to this circuit can only be calculated by the adder/subtractor unit after the 32-bit add is complete. This means they arrive quite late and then require further processing in this module, which in turn makes Y[0] show up very late in the game. You can speed things up considerably by thinking about the relative timing of Z, V and N and then designing your logic to minimize delay paths involving late-arriving signals.

The module test ensures that the correct answer is generated for all possible combinations of Z, V, N and CFN[1:0].

SHIFT unit

Design a 32-bit shifter that implements logical left shift (SHL), logical right shift (SHR) and arithmetic right shift (SRA) operations. The A operand supplies the data to be shifted and the low-order 5 bits of the B operand are used as the shift count (i.e., from 0 to 31 bits of shift). The desired operation will be encoded on SFN[1:0] as follows:
OperationSFN[1:0]
SHL (shift left)00
SHR (shift right)01
SRA (shift right with sign extension )11
With this encoding, SFN[0] is 0 for a left shift and 1 for a right shift and SFN[1] controls the sign extension logic on right shift. For SHL and SHR, 0's are shifted into the vacated bit positions. For SRA ("shift right arithmetic"), the vacated bit positions are all filled with A[31], the sign bit of the original data so that the result will be the same as dividing the original data by the appropriate power of 2.

The simplest implementation is to build two shifters -- one for shifting left and one for shifting right -- and then use a 2-way 32-bit multiplexer to select the appropriate answer as the module's output. It's easy to build a shifter after noticing that a multi-bit shift can be accomplished by cascading shifts by various powers of 2. For example, a 13-bit shift can be implemented by a shift of 8, followed by a shift of 4, followed by a shift of 1. So the shifter is just a cascade of multiplexers each controlled by one bit of the shift count. The schematic below shows a possible implementation of the left shift logic; the right shift logic is similar with the slight added complication of having to shift in either 0 (i.e., "gnd") or A[31], depending on the value of SFN[1]. Another approach that saves gates is to use the left shift logic for both left and right shifts, but for right shifts, reverse the bits of the A operand on the way in and reverse the bits of the output on the way out.
The module test checks that all three types of shifts are working correctly.

Final tests

When you've completed the design and testing of the four sub-modules, select the ALU module and run its test. This runs each of the test suites that you've used to debug the component subcircuits, so unless there's some unforeseen interaction among your blocks you're likely to pass the test. When this test completes successfully, clicking on will give you credit for completing this design problem.