Design of reversible MAC unit and shift and add multiplier using PSDRM technique

Sasikala.R¹, Tamilarasi.C², Vimaladevi.R³

¹PG Student, ²Asst. Professor, Dept. of ECE, Venkateshwara Hi-Tech Engineering College, Gobi, India.
³PG Student, Dept. of Computer Science Engineering, SJB Institute of Technology, Bangalore, India

Abstract—Reversible logic has become very promising for low-power design using emerging computing technologies. Sequential circuits can be built by replacing the latches, flip-flops and associated combinational gates of traditional irreversible designs with their reversible counterparts. However, this replacement technique is not very promising, since it leads to high quantum cost and many garbage outputs. Instead, synchronous sequential circuits can be designed directly from reversible gates using pseudo Reed–Muller (PSDRM) expressions that represent the state transition and output functions of the circuit. Multiplication and accumulation (MAC) are important operations in almost all Digital Signal Processing applications; the accumulator of the MAC unit presented here is designed using the PSDRM technique. Multipliers are fundamental components of many digital and non-digital systems, and hence their power dissipation is a prime concern. An Error Tolerant (ET) shift-and-add multiplier is also designed. It uses error tolerant addition to accumulate the partial products and a ring counter to shift the multiplier bits and the partial product. The shift register used to shift the multiplier bits and the partial product is designed using pseudo Reed–Muller expressions.

Keywords—MAC unit, PSDRM, Reversible logic, Shift and add multiplier

1. INTRODUCTION

In irreversible computation, every bit of information that is erased dissipates kT ln 2 joules of energy as heat, where k is Boltzmann’s constant and T is the absolute temperature. Bennett showed that reversible computation loses no information and can therefore avoid this kT ln 2 energy dissipation. A circuit is reversible if there is a one-to-one correspondence between its inputs and outputs, so that the input vector can be uniquely recovered from the output vector. Reversible logic has wide application in many emerging computing technologies such as SFL technology, optical technology, quantum-dot cellular automata and nanotechnology, and it plays a very important role in quantum computing and quantum information. Because reversible logic has become a promising technology for power-efficient emerging computing, developing efficient methods for reversible logic synthesis and designing practically important reversible circuits have become very important.
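To give a sense of scale for the kT ln 2 bound, the short sketch below evaluates it numerically. The 300 K operating temperature is an assumption chosen only for illustration (the paper does not fix a temperature); Boltzmann’s constant is the exact SI value.

```python
import math

BOLTZMANN_K = 1.380649e-23  # Boltzmann's constant in J/K (exact SI value)


def erasure_energy(temperature_kelvin: float, bits: int = 1) -> float:
    """Minimum heat (J) dissipated when `bits` bits are erased: bits * kT ln 2."""
    return bits * BOLTZMANN_K * temperature_kelvin * math.log(2)


if __name__ == "__main__":
    # Assumed operating temperature of 300 K (roughly room temperature).
    print(f"{erasure_energy(300.0):.3e} J per erased bit")
```

Run as a script, this prints about 2.871e-21 J per erased bit, which is far below the switching energy of present-day CMOS gates but becomes a hard floor as devices shrink.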

Most work on reversible logic has concentrated on combinational logic synthesis. Because feedback is considered a restriction in reversible logic, some researchers argued that reversible sequential logic is not possible. However, if the feedback is provided through a delay element, the feedback information is available as an input to the reversible combinational circuit in the next clock cycle, and sequential logic becomes possible.

Only limited attempts have been made recently in the field of reversible sequential logic synthesis. Some works present reversible designs of sequential building blocks such as latches and flip-flops built on top of reversible gates, and suggest that sequential circuits be constructed by replacing the latches, flip-flops and other combinational gates of traditional irreversible designs with their reversible counterparts.

Two direct design methods for sequential circuits are based on Positive Polarity Reed–Muller (PPRM) expressions and Fixed Polarity Reed–Muller (FPRM) expressions. The Pseudo Reed–Muller (PSDRM) expression is a more general class of Reed–Muller expression and requires at most as many product terms as the FPRM expression for a given function. Thus, the PSDRM method is more efficient than PPRM- and FPRM-based reversible circuit synthesis.

2. REVERSIBLE GATES

A reversible gate (or a circuit) maps every input combination to a unique output combination.
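The one-to-one condition can be checked exhaustively for small gates. The sketch below assumes a gate is modelled as a Python function from input bits to a tuple of output bits (a modelling choice of ours, not the paper’s). It enumerates all input patterns and tests whether the outputs are distinct: the NOT and Feynman gates of Sections 2.1 and 2.2 pass, while a hypothetical `and_with_copy` gate that outputs (A, AB) fails.

```python
from itertools import product
from typing import Callable, Tuple

Bits = Tuple[int, ...]


def is_reversible(gate: Callable[..., Bits], n_inputs: int) -> bool:
    """True if `gate` maps every n-bit input pattern to a distinct n-bit output."""
    outputs = set()
    for bits in product((0, 1), repeat=n_inputs):
        out = gate(*bits)
        if len(out) != n_inputs:     # a reversible gate has as many outputs as inputs
            return False
        outputs.add(out)
    # Distinct outputs for all 2**n inputs means the mapping is a bijection.
    return len(outputs) == 2 ** n_inputs


def not_gate(a):                     # 1*1 NOT gate
    return (1 - a,)


def feynman(a, b):                   # 2*2 Feynman/CNOT gate: (A, A xor B)
    return (a, a ^ b)


def and_with_copy(a, b):             # hypothetical (A, AB): (0,0) and (0,1) collide
    return (a, a & b)


print(is_reversible(not_gate, 1), is_reversible(feynman, 2),
      is_reversible(and_with_copy, 2))  # True True False
```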
2.1 NOT Gate
The NOT gate is a reversible 1*1 gate with zero quantum cost.

Figure 2.1 NOT gate

2.2 Feynman / CNOT Gate
The Feynman (CNOT) gate is a reversible 2*2 gate with a quantum cost of one. It maps the inputs (A, B) to the outputs (P = A, Q = A ⊕ B), as shown in Figure 2.2.

Figure 2.2 Feynman/CNOT gate

2.3 Toffoli Gate
The Toffoli gate is a 3*3 reversible gate with three inputs and three outputs. It maps the inputs (A, B, C) to the outputs (P = A, Q = B, R = AB ⊕ C), as shown in Figure 2.3. The Toffoli gate is one of the most popular reversible gates and has a quantum cost of five.

Figure 2.3 Toffoli gate

2.4 Fredkin Gate
The Fredkin gate is a reversible 3*3 gate that maps the inputs (A, B, C) to the outputs (P = A, Q = A'B ⊕ AC, R = AB ⊕ A'C), i.e., it swaps B and C when A = 1. It has a quantum cost of five.

Figure 2.4 Fredkin gate
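A small behavioural model of the Toffoli and Fredkin mappings above, written as a sketch with bits as Python ints, confirms that both gates are bijective, that a Toffoli gate with C = 0 produces A AND B on its R output, and that the Fredkin gate swaps B and C when A = 1. Treating the Toffoli gate with C = 0 as the “reversible AND gate” mentioned later is a common construction, but whether the authors use exactly this one is an assumption.

```python
from itertools import product


def toffoli(a, b, c):
    """Toffoli gate: (A, B, C) -> (A, B, AB xor C)."""
    return a, b, (a & b) ^ c


def fredkin(a, b, c):
    """Fredkin gate: (A, B, C) -> (A, A'B xor AC, AB xor A'C)."""
    na = 1 - a
    return a, (na & b) ^ (a & c), (a & b) ^ (na & c)


# Both gates are bijections on 3-bit patterns.
for gate in (toffoli, fredkin):
    images = {gate(*bits) for bits in product((0, 1), repeat=3)}
    assert len(images) == 8, gate.__name__

# With C tied to 0, the Toffoli output R equals A AND B (a reversible AND).
assert all(toffoli(a, b, 0)[2] == (a & b) for a, b in product((0, 1), repeat=2))

# The Fredkin gate swaps B and C exactly when the control A is 1.
assert all(fredkin(1, b, c) == (1, c, b) and fredkin(0, b, c) == (0, b, c)
           for b, c in product((0, 1), repeat=2))
print("Toffoli and Fredkin checks passed")
```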

2.5 Peres Gate
The Peres gate is a 3*3 reversible gate with three inputs and three outputs. It maps the inputs (A, B, C) to the outputs (P = A, Q = A ⊕ B, R = AB ⊕ C).

Figure 2.5 Peres Gate

2.6 Double Peres Gate
The Double Peres Gate (DPG) is a 4*4 reversible gate with four inputs and four outputs. It maps the inputs (A, B, C, D) to the outputs (P = A, Q = A ⊕ B, R = A ⊕ B ⊕ D, S = (A ⊕ B)D ⊕ AB ⊕ C), as shown in Figure 2.6. The DPG has a quantum cost of six.

Figure 2.6 Double Peres Gate

The DPG alone can be used as a reversible full adder. The full adder function is realized by tying the control input C to logic 0 and using input D as the carry input Cin. The inputs (A, B, 0, Cin) are then mapped to the outputs (P = A, Q = A ⊕ B, R = A ⊕ B ⊕ Cin, S = (A ⊕ B)Cin ⊕ AB), as shown in Figure 2.6. Here R and S are the sum and carry outputs of the full adder, respectively.
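The following sketch models the Peres gate and the DPG with the mappings given above. It checks exhaustively that the DPG is reversible, that the DPG with C = 0 and D = Cin behaves as a full adder, and that a Peres gate with C = 0 behaves as a half adder (the configuration used later for the multiplier’s half adders). It is a behavioural check only and says nothing about quantum cost.

```python
from itertools import product


def peres(a, b, c):
    """Peres gate: (A, B, C) -> (A, A xor B, AB xor C)."""
    return a, a ^ b, (a & b) ^ c


def dpg(a, b, c, d):
    """Double Peres gate: (A, B, C, D) -> (A, A^B, A^B^D, (A^B)D ^ AB ^ C)."""
    return a, a ^ b, a ^ b ^ d, ((a ^ b) & d) ^ (a & b) ^ c


# The DPG is reversible: all 16 input patterns give distinct outputs.
assert len({dpg(*bits) for bits in product((0, 1), repeat=4)}) == 16

for a, b, cin in product((0, 1), repeat=3):
    _, _, s, cout = dpg(a, b, 0, cin)        # C = 0 (constant), D = Cin
    assert 2 * cout + s == a + b + cin        # R = sum, S = carry
for a, b in product((0, 1), repeat=2):
    _, s, cout = peres(a, b, 0)              # C = 0 gives a half adder
    assert 2 * cout + s == a + b              # Q = sum, R = carry
print("DPG full adder and Peres half adder verified")
```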

3. REVERSIBLE LOGIC USING PSDRM EXPRESSIONS
An n-variable Boolean function \( f(x_1, x_2, \ldots, x_n) \) can be expanded on the variable \( x_i \) using any of the following expansions:

\[
f(x_1, x_2, \ldots, x_i, \ldots, x_n) = f_0 \oplus x_i f_2 \qquad \text{(positive Davio, pD)} \tag{1}
\]

\[
f(x_1, x_2, \ldots, x_i, \ldots, x_n) = f_1 \oplus x'_i f_2 \qquad \text{(negative Davio, nD)} \tag{2}
\]

where

\[
f_0 = f(x_1, \ldots, x_{i-1}, 0, x_{i+1}, \ldots, x_n) \tag{3}
\]

\[
f_1 = f(x_1, \ldots, x_{i-1}, 1, x_{i+1}, \ldots, x_n) \tag{4}
\]

\[
f_2 = f_0 \oplus f_1 \tag{5}
\]
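The two Davio identities can be confirmed by brute force. The sketch below stores a function as a truth vector with x1 as the most significant index bit (our convention), computes f0, f1 and f2 as in (3) to (5), and checks both the pD and nD expansions on every variable of all 256 three-variable functions.

```python
from itertools import product


def evaluate(vec, bits):
    """Return f(bits) from a truth vector whose index has x1 as the MSB."""
    index = 0
    for b in bits:
        index = (index << 1) | b
    return vec[index]


def check_davio(vec, n, i):
    """Check the pD and nD expansions of f on variable x_(i+1) for all inputs."""
    for bits in product((0, 1), repeat=n):
        x = list(bits)
        f = evaluate(vec, x)
        f0 = evaluate(vec, x[:i] + [0] + x[i + 1:])   # cofactor f0, eq. (3)
        f1 = evaluate(vec, x[:i] + [1] + x[i + 1:])   # cofactor f1, eq. (4)
        f2 = f0 ^ f1                                   # eq. (5)
        xi = x[i]
        if f != f0 ^ (xi & f2):            # positive Davio (pD)
            return False
        if f != f1 ^ ((1 - xi) & f2):      # negative Davio (nD)
            return False
    return True


n = 3
all_ok = all(
    check_davio([(code >> k) & 1 for k in range(2 ** n)], n, i)
    for code in range(2 ** (2 ** n))       # every 3-variable function
    for i in range(n)
)
print(all_ok)  # True
```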

If we apply the pD expansion on all variables of an n-variable Boolean function \( f(x_1, x_2, \ldots, x_n) \), the resulting expression can be represented as

\[
f(x_1, x_2, \ldots, x_n) = f_{00\cdots00} \oplus f_{00\cdots01}\, x_n \oplus f_{00\cdots10}\, x_{n-1} \oplus f_{00\cdots11}\, x_{n-1} x_n \oplus \cdots \oplus f_{11\cdots11}\, x_1 x_2 \cdots x_{n-1} x_n \tag{6}
\]

where each coefficient \( f_i \in \{0, 1\} \) is indexed by a subscript \( i \in \{0, 1\}^n \). A variable appears (in uncomplemented form) in a product term only if the corresponding bit of the coefficient’s subscript is one, and a product term appears in the expression only if its coefficient is one.

If we apply the nD expansion on all variables of an \( n \)-variable Boolean function \( f(x_1, x_2, \ldots, x_n) \), the resulting expression can be represented as

\[
f(x_1, x_2, \ldots, x_n) = f_{00\cdots00} \oplus f_{00\cdots01}\, x'_n \oplus f_{00\cdots10}\, x'_{n-1} \oplus f_{00\cdots11}\, x'_{n-1} x'_n \oplus \cdots \oplus f_{11\cdots11}\, x'_1 x'_2 \cdots x'_{n-1} x'_n \tag{7}
\]

Expression (7) has the same form as (6), except that the variables appear in complemented form (the coefficient values generally differ from those of (6)).
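The coefficients of (6) can be computed with the standard XOR butterfly (the Reed–Muller transform), and those of (7) by applying the same transform to f(x'), i.e. the truth vector read backwards. This is a sketch under our own conventions: x1 is the most significant index bit, so coefficient index 0b01 corresponds to x_n. For f = x1 + x2 (OR) it gives x2 ⊕ x1 ⊕ x1x2 in positive polarity and 1 ⊕ x1'x2' in negative polarity.

```python
def pprm_coefficients(vec):
    """Coefficients f_i of expression (6): pD applied to every variable.

    vec is a truth vector of length 2**n with x1 as the most significant
    index bit; bit (n-1-k) of a coefficient index i selects x_(k+1).
    """
    c = list(vec)
    step = 1
    while step < len(c):              # one butterfly pass per variable
        for idx in range(len(c)):
            if idx & step:
                c[idx] ^= c[idx ^ step]
        step <<= 1
    return c


def nprm_coefficients(vec):
    """Coefficients of expression (7): nD applied to every variable.

    Substituting y = x' turns (7) into a positive-polarity expression in y,
    so we transform g(y) = f(y'), whose truth vector is vec reversed.
    """
    return pprm_coefficients(list(vec)[::-1])


def eval_rm(coeffs, bits, negative=False):
    """Evaluate (6), or (7) if negative=True, for one input assignment."""
    n = len(bits)
    value = 0
    for i, f_i in enumerate(coeffs):
        if f_i:
            term = 1
            for k in range(n):
                if (i >> (n - 1 - k)) & 1:          # x_(k+1) is in this term
                    term &= (1 - bits[k]) if negative else bits[k]
            value ^= term
    return value


# f = x1 OR x2, truth vector indexed by (x1, x2) = 00, 01, 10, 11
print(pprm_coefficients([0, 1, 1, 1]))  # [0, 1, 1, 1]: x2 ^ x1 ^ x1x2
print(nprm_coefficients([0, 1, 1, 1]))  # [1, 0, 0, 1]: 1 ^ x1'x2'
```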

Algorithm for determining next state expressions
1. Consider a sequential circuit with m inputs and n-bit states. Construct a \( (1 + m + n) \)-input, n-output truth table representing the transition table of the sequential circuit, taking the clock, the inputs and the present states as inputs and the next states as outputs.
2. From the output vector of each next state, construct a PSDRM tree using steps 3 and 4.
3. At every node, choose the pD or nD expansion, whichever produces fewer ones at the next level; break ties by choosing the pD expansion.
4. Determine the PSDRM expressions for the outputs from the constructed PSDRM trees.

Algorithm for determining the output expression
1. Consider a sequential circuit with m inputs, n-bit states and y outputs. Construct an \( (m + n) \)-input, y-output truth table representing the output functions of the sequential circuit, taking the inputs and the present states as inputs and the y outputs as outputs.
2. From the output vector of each output function, construct a PSDRM tree and determine its expression using steps 3 and 4 of the previous algorithm.
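Steps 3 and 4 can be sketched as a recursive function. At each node the truth vector is split into the cofactors f0 and f1, f2 = f0 ⊕ f1 is formed, and pD (children f0 and f2) or nD (children f1 and f2) is chosen by counting ones at the next level, with ties going to pD. The function names and the term representation are our own assumptions, not the authors’ tool. Applied to the Q1 column of Table I in Section 4.3, it returns Q1 = CD1.

```python
from typing import List, Sequence, Tuple

Term = Tuple[str, ...]  # product of literals; the empty tuple stands for constant 1


def psdrm(vec: Sequence[int], names: Sequence[str]) -> List[Term]:
    """Product terms of a PSDRM expression for a truth vector.

    len(vec) must be 2**len(names); names[0] is the most significant input
    (the first column of the truth table).
    """
    if len(vec) != 2 ** len(names):
        raise ValueError("truth vector length must be 2**len(names)")
    return _expand(list(vec), list(names), ())


def _expand(vec: List[int], names: List[str], prefix: Term) -> List[Term]:
    if not any(vec):
        return []                       # all-zero node contributes nothing
    if not names:
        return [prefix]                 # leaf equal to 1: emit the product term
    half = len(vec) // 2
    f0, f1 = vec[:half], vec[half:]     # cofactors for x = 0 and x = 1
    f2 = [a ^ b for a, b in zip(f0, f1)]
    x, rest = names[0], names[1:]
    ones_pd = sum(f0) + sum(f2)         # ones at the next level for pD
    ones_nd = sum(f1) + sum(f2)         # ones at the next level for nD
    if ones_nd < ones_pd:               # nD strictly better
        base, literal = f1, x + "'"
    else:                               # pD better, or a tie (step 3 tie-break)
        base, literal = f0, x
    return _expand(base, rest, prefix) + _expand(f2, rest, prefix + (literal,))


def to_text(terms: List[Term]) -> str:
    """Render terms in the paper's style, with ^ standing for XOR."""
    return " ^ ".join("".join(t) if t else "1" for t in terms) or "0"


# Q1 column of Table I, rows ordered by (C, D1, D2) = 000 ... 111.
q1 = [0, 0, 0, 0, 0, 0, 1, 1]
print(to_text(psdrm(q1, ["C", "D1", "D2"])))   # CD1
```

Because the pD/nD choice is made independently at each node, the same variable may appear complemented in one branch and uncomplemented in another, which is exactly what distinguishes PSDRM from the fixed-polarity forms (6) and (7).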

4. MULTIPLY AND ACCUMULATE (MAC) UNIT
Most Digital Signal Processing (DSP) algorithms rely on the MAC operation in high-performance digital processing systems. This operation is needed in filters, Fourier transforms and similar algorithms, since it eases the computation of convolution. A MAC unit consists of a multiplier, an adder and an accumulator.

The multiplier multiplies the inputs and passes the result to the adder, which adds it to the previously accumulated result. A MAC unit performs the multiplication and accumulation operations together, avoiding unnecessary overhead on the processor in terms of processing time and on-chip memory.

The reversible multiplier can be implemented with a combination of reversible half adders, full adders and Peres gates. A reversible adder serves as the adder, and the reversible accumulator is designed using a reversible shift register. The block diagram of the four-bit reversible MAC unit is shown in Figure 4.1. The proposed reversible MAC unit consists of a four-bit reversible multiplier, an eight-bit reversible adder and an eight-bit reversible accumulator register.

**Figure 4.1** Reversible 4-bit multiply-accumulate unit
The adder adds the previously accumulated value to the present output of the multiplier. The result of the adder is stored back into the accumulator, and this process repeats until the last input pair has been processed. In DSP, Discrete Fourier Transform (DFT) computation is widely used and requires a large number of multiplications and additions. Much of the power consumption occurs during this data manipulation; therefore, to minimize power consumption, DFT computation can be implemented using a reversible MAC unit. The function of the multiply-accumulate unit is given by the following expression.

\[ F = \sum X_i \cdot Y_i \]

The accumulator unit is designed using the PSDRM technique, which reduces the garbage outputs and the quantum cost.
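A behavioural sketch of the MAC expression for the four-bit unit described above: each product X_i · Y_i is added to the accumulator and the sum is stored back. We assume 4-bit unsigned operands and that the eight-bit accumulator simply discards the carry out on overflow; the paper does not say how overflow is handled.

```python
def mac(xs, ys, acc_bits=8):
    """Return F = sum(X_i * Y_i) as held in an acc_bits-wide accumulator."""
    mask = (1 << acc_bits) - 1
    acc = 0                                   # accumulator register starts cleared
    for x, y in zip(xs, ys):
        if not (0 <= x < 16 and 0 <= y < 16):
            raise ValueError("operands must be 4-bit unsigned values")
        product = x * y                       # 4x4 multiplier output (8 bits)
        acc = (acc + product) & mask          # 8-bit adder; carry out discarded (assumed)
    return acc


print(mac([1, 2, 3], [4, 5, 6]))  # 32
```

With 15 · 15 + 15 · 15 = 450 the assumed eight-bit accumulator wraps to 194, which shows why a wider accumulator (or guard bits) is normally used when many products are summed.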

### 4.1 Multiplier

The basic operation of a 4x4 parallel multiplier circuit is depicted in Figure 4.2. A reversible 4x4 multiplier circuit has two parts: a Partial Product Generation (PPG) circuit and a Multi-Operand Addition (MOA) circuit.

**Figure 4.2** The basic operation of 4x4 parallel multiplier

**Partial Product Generation**

Partial products are generated using reversible gates such as the reversible AND gate, the Feynman gate and the DPG.

**Multi-Operand Addition**

The addition of the partial products using DPG and PG gates is shown in Figure 4.4. The basic cells of such a multiplier are a full adder built from a DPG (three inputs, one constant input and two garbage outputs) and a half adder built from a PG (two inputs, one constant input and one garbage output). The reversible multiplier circuit uses eight DPG gates and four PG gates.
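To make the gate count concrete, here is a behavioural sketch of a 4x4 array multiplier built only from the cells named above: reversible AND gates for the partial products (modelled as Toffoli gates with the target tied to 0, an assumption) and DPG full adders and PG half adders for multi-operand addition. The ripple-carry arrangement of the rows is also our assumption, chosen because it needs exactly eight full adders and four half adders; it is not necessarily the wiring of the paper’s figure.

```python
def reversible_and(a, b):
    """Toffoli gate (A, B, 0) -> (A, B, AB); only the target output R is used."""
    return a & b


def dpg_full_adder(a, b, cin):
    """DPG with C = 0 and D = Cin -> (sum R, carry S); P and Q are garbage."""
    return a ^ b ^ cin, ((a ^ b) & cin) ^ (a & b)


def pg_half_adder(a, b):
    """Peres gate with C = 0 -> (sum Q, carry R); P is garbage."""
    return a ^ b, a & b


def bit(value, k):
    """Bit k of an unsigned integer."""
    return (value >> k) & 1


def multiply_4x4(a, b):
    """4x4 ripple-carry array multiplier; returns (product, gate counts)."""
    counts = {"DPG": 0, "PG": 0}
    # Partial-product row 0 goes straight into the running sum (weights 0..3).
    acc = {w: reversible_and(bit(a, w), bit(b, 0)) for w in range(4)}
    for i in range(1, 4):                       # add row i, shifted by i
        carry = 0
        for j in range(4):
            w = i + j
            pp = reversible_and(bit(a, j), bit(b, i))
            if j == 0:                          # lowest column: no carry in
                acc[w], carry = pg_half_adder(pp, acc[w])
                counts["PG"] += 1
            elif w in acc:                      # pp + running sum + carry
                acc[w], carry = dpg_full_adder(pp, acc[w], carry)
                counts["DPG"] += 1
            else:                               # pp + carry only
                acc[w], carry = pg_half_adder(pp, carry)
                counts["PG"] += 1
        acc[i + 4] = carry                      # carry out of the row
    product = sum(v << w for w, v in acc.items())
    return product, counts


print(multiply_4x4(13, 11))  # (143, {'DPG': 8, 'PG': 4})
```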

### 4.2 Adder

Since the DPG can be used as a full adder, DPGs are used to add the accumulator value and the multiplier output. The adder is built from a half adder together with DPGs.

**Figure 4.4** Reversible Eight-bit parallel Full Adder
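A behavioural sketch of the eight-bit adder: a PG half adder at bit 0 and DPG full adders for bits 1 to 7, rippling the carry. Placing the half adder at the least significant position is our assumption about how the half adder and the DPGs are combined.

```python
def dpg_full_adder(a, b, cin):
    """DPG with C = 0 and D = Cin: R = A^B^Cin (sum), S = (A^B)Cin ^ AB (carry)."""
    return a ^ b ^ cin, ((a ^ b) & cin) ^ (a & b)


def pg_half_adder(a, b):
    """Peres gate with C = 0: Q = A^B (sum), R = AB (carry)."""
    return a ^ b, a & b


def add8(x, y):
    """Ripple-carry sum of two 8-bit values; returns (8-bit sum, carry out)."""
    total, carry = 0, 0
    for k in range(8):
        a, b = (x >> k) & 1, (y >> k) & 1
        if k == 0:
            s, carry = pg_half_adder(a, b)          # LSB: no carry in (assumed)
        else:
            s, carry = dpg_full_adder(a, b, carry)  # bits 1..7: DPG full adders
        total |= s << k
    return total, carry


print(add8(200, 100))  # (44, 1)
```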

### 4.3 Accumulator

The accumulator is one of the most extensively used functional units in digital systems. A register is a group of flip-flops connected together so that information bits can be stored within a digital system and used later during the computing process. A shift register is a register in which the stored information can be shifted bitwise under the control of the clock signal. This section proposes a Parallel-In-Parallel-Out (PIPO) shift register designed using the PSDRM technique; compared with previous designs, the aim is to reduce the quantum cost of each design.

**Reversible eight-bit PIPO Shift Register**

**Figure 4.5** Reversible Eight-bit PIPO

In this shift register, the inputs are fed into the flip-flops simultaneously, and the output is obtained when a clock pulse is applied. Eight D flip-flops are used in the design. The PIPO shift register combines the functions of the Parallel-In-Serial-Out (PISO) and Serial-In-Parallel-Out (SIPO) shift registers. When a clock pulse is applied, the inputs D0, D1, …, D7 are loaded into the register in parallel, and the outputs Q0, Q1, …, Q7 are available in parallel at the Q outputs of the flip-flops. Since the quantum cost of a D flip-flop is six, the quantum cost of the eight-bit shift register is 8 × 6 = 48.

**TABLE I**

<table>
<thead>
<tr>
<th>C</th>
<th>D1</th>
<th>D2</th>
<th>Q1</th>
<th>Q2</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>1</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
</tr>
</tbody>
</table>

Table I gives the truth table for a two-bit register. Taking the values of Q1 from this table, the PSDRM tree is constructed using the rules explained in Section 3, which gives Q1 = CD1 and, similarly, Q2 = CD2.

**Figure 4.6** Determination of PSDRM expression for PIPO

5. SHIFT AND ADD MULTIPLIER

The B(0) bit (least significant bit) of the B register is used as the control signal P for the error tolerant adder. If P = 1, the multiplicand bits in the A register are added to the bits of the partial product (PP) register. If P = 0, the error tolerant adder is switched off and the shifted bits of the PP register simply bypass the adder through a bypass register. The PP and B registers are shifted together under the control of the ring counter output and the clock.

**Figure 5.1** Shift and add multiplier

Each time the counter advances, the contents of the PP and B registers are shifted by one bit position towards the LSB. The shifting procedure stops when the counter reaches its final value (10000000). The counter width must therefore match the number of bits of the multiplier.
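The control flow described above (B(0) as the control signal P, a joint shift of PP and B towards the LSB, and a one-hot ring counter that stops at 10000000) can be sketched as follows. This is an illustrative model assuming unsigned operands and an exact adder by default; the `add` parameter is a hypothetical hook where an approximate adder could be substituted.

```python
import operator
from typing import Callable


def shift_add_multiply(a: int, b: int, n: int = 8,
                       add: Callable[[int, int], int] = operator.add) -> int:
    """Multiply two n-bit operands: A = multiplicand, B = multiplier."""
    pp = 0                       # partial-product (PP) register
    breg = b                     # B register; its LSB is the control signal P
    ring = 1                     # one-hot ring counter, starts at 0...01
    while True:
        if breg & 1:             # P = 1: add the multiplicand to PP
            pp = add(pp, a)      # result may be n + 1 bits wide (carry out)
        # P = 0: the adder is idle and PP passes on unchanged (bypass).
        # Shift {PP, B} one position towards the LSB.
        breg = (breg >> 1) | ((pp & 1) << (n - 1))
        pp >>= 1
        if ring == 1 << (n - 1):  # counter reached 10...0: stop shifting
            break
        ring <<= 1
    return (pp << n) | breg       # product = PP (high half) : B (low half)


print(shift_add_multiply(183, 189))  # 34587
```

After n cycles the multiplier bits have all been shifted out of B, and the 2n-bit product occupies PP and B together, which is why the counter length is tied to the multiplier width.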

**Error Tolerant Addition:** Two common terms used in error tolerant addition are as follows:

- Overall error (OE): OE = |Rc - Re|, where Re is the result obtained by the error tolerant addition technique and Rc is the correct result.
- Accuracy (ACC): the accuracy of an addition process indicates how “correct” the output of an adder is for a particular input. It is defined as ACC (%) = (1 - OE/Rc) × 100, and its value ranges from 0 to 100%.
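The two metrics translate directly into code. The sketch below (the function names are ours) evaluates them for the worked example later in this section, where the correct sum is Rc = 372 and the error tolerant result is Re = 367.

```python
def overall_error(rc: int, re: int) -> int:
    """OE = |Rc - Re|, with Rc the correct and Re the error-tolerant result."""
    return abs(rc - re)


def accuracy_percent(rc: int, re: int) -> float:
    """ACC (%) = (1 - OE / Rc) * 100; Rc is assumed to be non-zero."""
    return (1 - overall_error(rc, re) / rc) * 100


print(overall_error(372, 367), round(accuracy_percent(372, 367), 2))  # 5 98.66
```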

**Addition Arithmetic:** In a conventional adder circuit, the delay is mainly due to the carry propagation chain along the critical path, from the least significant bit (LSB) to the most significant bit (MSB). Glitches in the carry propagation chain also account for a significant proportion of the dynamic power dissipation. Therefore, if carry propagation can be eliminated or reduced, speed and power consumption can be greatly improved.
For example, consider the operands A = "10110111" and B = "10111101". Each input operand is split into two parts: the higher-order bits form the accurate part and the lower-order bits form the inaccurate part. The two parts need not be of equal length. The addition process starts from the demarcation line and proceeds towards the two opposite ends simultaneously.

The higher-order bits are added from right to left, starting at the demarcation line, using normal addition. This preserves their correctness, since the higher-order bits play a more important role than the lower-order bits. The lower-order bits are added using the error tolerant addition mechanism, in which no carry signal is generated at any bit position, eliminating the carry propagation path:

(1) Check every bit position from left to right (MSB to LSB), starting just to the right of the demarcation line.
(2) If both input bits are “0” or the bits differ, normal one-bit addition is performed and the operation proceeds to the next bit position.
(3) The checking process stops when both input bits are high (1); from this bit position onwards, all sum bits down to the LSB are set to “1.”

For this example, error tolerant addition gives 101101111 (367), whereas normal arithmetic gives 101110100 (372). Hence OE = 372 - 367 = 5, and the accuracy of the adder is ACC = (1 - 5/372) × 100 = 98.66%. This accuracy level is acceptable for most image processing applications.
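The carry-free rule for the inaccurate part can be sketched as below. The four-bit inaccurate part is our assumption (the text does not state where the demarcation line lies), but with it the function reproduces the example’s 367 from A = 10110111 and B = 10111101.

```python
def eta_add(a: int, b: int, width: int = 8, inaccurate_bits: int = 4) -> int:
    """Error tolerant addition of two `width`-bit unsigned operands.

    The upper (width - inaccurate_bits) bits are added exactly; the lower bits
    use the carry-free rule, and no carry crosses the demarcation line.
    """
    if not (0 <= a < (1 << width) and 0 <= b < (1 << width)):
        raise ValueError("operands must fit in `width` bits")
    k = inaccurate_bits
    high = (a >> k) + (b >> k)         # accurate part: normal addition
    low = 0
    for pos in range(k - 1, -1, -1):   # inaccurate part, MSB -> LSB
        x, y = (a >> pos) & 1, (b >> pos) & 1
        if x and y:                    # both 1: stop, this bit and all below = 1
            low |= (1 << (pos + 1)) - 1
            break
        low |= (x | y) << pos          # 0+0 or 0+1: one-bit add, no carry
    return (high << k) | low


print(eta_add(0b10110111, 0b10111101), 0b10110111 + 0b10111101)  # 367 372
```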

**B register:** The B register holds the multiplier. It is a Serial-In-Serial-Out (SISO) register designed using the pseudo Reed–Muller expressions discussed in Section 3. Being reversible, it reduces power consumption, and the PSDRM design reduces its quantum cost and garbage outputs.

### TABLE II
**TRANSITION TABLE OF SISO**

<table>
<thead>
<tr>
<th>C</th>
<th>D</th>
<th>Q0</th>
<th>Q1</th>
<th>Q0'</th>
<th>Q1'</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
</tr>
</tbody>
</table>

**Figure 5.2** Determination of PSDRM expression for SISO

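In Figure 5.2, the pD expansion is applied at every node, since it yields the minimum number of ones. The next-state equation for Q0’ is therefore

Q0’ = Q0 ⊕ CQ0 ⊕ CQ1    (1)

Similarly, for the remaining bits of the eight-bit register,

Q1’ = Q1 ⊕ CQ1 ⊕ CQ2    (2)

Q2’ = Q2 ⊕ CQ2 ⊕ CQ3    (3)

Q3’ = Q3 ⊕ CQ3 ⊕ CQ4    (4)

Q4’ = Q4 ⊕ CQ4 ⊕ CQ5    (5)

Q5’ = Q5 ⊕ CQ5 ⊕ CQ6    (6)

Q6’ = Q6 ⊕ CQ6 ⊕ CQ7    (7)

Q7’ = Q7 ⊕ CQ7 ⊕ CD    (8)

Thus, when C = 0 every bit holds its present value, and when C = 1 each bit takes the value of the next higher bit, with the serial input D entering Q7. These equations are implemented using the Feynman gate and the reversible AND gate discussed earlier; the resulting quantum cost is 96 and the number of garbage outputs is 12.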
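A quick behavioural check of equations (1) to (8): the sketch below (the function name `siso_next` is ours) evaluates them for every 8-bit state, confirms that the register holds its value when C = 0 and shifts towards Q0 with D entering Q7 when C = 1, and then shifts in an example serial stream. It models only the logic function, not the gate-level cost.

```python
from itertools import product


def siso_next(q, c, d):
    """Next state of the 8-bit register from equations (1)-(8).

    q is the list [Q0, ..., Q7]; Qi' = Qi ^ C Qi ^ C Q(i+1), with Q8 taken as D.
    """
    nxt = q[1:] + [d]                      # Q(i+1) for i = 0..6, and D for Q7
    return [qi ^ (c & qi) ^ (c & qn) for qi, qn in zip(q, nxt)]


# The equations hold the state when C = 0 and shift towards Q0 when C = 1.
for bits in product((0, 1), repeat=8):
    q = list(bits)
    for d in (0, 1):
        assert siso_next(q, 0, d) == q
        assert siso_next(q, 1, d) == q[1:] + [d]

# Shift a serial stream in; after 8 clocked shifts the first bit sits in Q0.
q = [0] * 8
stream = [1, 0, 1, 1, 0, 0, 1, 0]
for d in stream:
    q = siso_next(q, 1, d)
print(q)  # [1, 0, 1, 1, 0, 0, 1, 0]
```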

6. RESULTS AND COMPARISON
MAC UNIT
The reversible MAC unit is coded in VHDL and simulated using the ModelSim simulator; different types of reversible gates are used in the design.

### TABLE III
**PERFORMANCE COMPARISON OF ACCUMULATOR**

<table>
<thead>
<tr>
<th>Design</th>
<th>QC</th>
<th>GO</th>
</tr>
</thead>
<tbody>
<tr>
<td>Arpitha and Muralidhara</td>
<td>80</td>
<td></td>
</tr>
<tr>
<td>Shaik Nasar and Subbarao</td>
<td>60</td>
<td></td>
</tr>
<tr>
<td>Proposed (PSDRM)</td>
<td>48</td>
<td></td>
</tr>
</tbody>
</table>

SHIFT AND ADD MULTIPLIER
The multiplier bits are applied serially, and the output is available after all 8 bits have been applied.

### TABLE IV
**COMPARISON OF SHIFT REGISTER**

<table>
<thead>
<tr>
<th>Parameter</th>
<th>PSDRM</th>
</tr>
</thead>
<tbody>
<tr>
<td>Quantum cost</td>
<td>96</td>
</tr>
<tr>
<td>Garbage output</td>
<td>12</td>
</tr>
</tbody>
</table>

### TABLE V
**COMPARISON OF POWER**

<table>
<thead>
<tr>
<th>Design</th>
<th>Power (mW)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Conventional Shift and Add Multiplier</td>
<td>295</td>
</tr>
<tr>
<td>BZ-FAD Shift and Add Multiplier</td>
<td>271</td>
</tr>
<tr>
<td>Proposed</td>
<td>139</td>
</tr>
</tbody>
</table>

7. CONCLUSION
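Reversible logic has shown good promise for low-power design using emerging computing technologies. We have designed a MAC unit and a shift-and-add multiplier using the PSDRM technique. Compared with the other designs considered, the proposed designs reduce power and quantum cost: the PSDRM accumulator has a quantum cost of 48 (versus 60 and 80), and the proposed shift-and-add multiplier consumes 139 mW (versus 271 mW and 295 mW).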



BIOGRAPHY:

Sasikala.R is currently a PG student with the Department of Electronics and Communication Engineering, Shree Venkateshwara Hi-Tech Engineering College, Anna University, Gobi. Her current research interests include VLSI low-power design.

C.Thamilarasi is currently an Assistant Professor with the Department of Electronics and Communication Engineering, Shree Venkateshwara Hi-Tech Engineering College, Anna University, Gobi. Her current research interests include VLSI low-power design.

Vimala Devi.R is currently a PG student with the Department of Computer Science and Engineering, SJB Institute of Technology, VTU, Bangalore. Her current research interests include VLSI low power design and embedded technologies.