# Reliable Architectures for Finite Field Multipliers Using Cyclic Codes on FPGA Utilized in Classic and Post-Quantum Cryptography 

Alvaro Cintas-Canto ${ }^{\ominus}$, Mehran Mozaffari Kermani ${ }^{\ominus}$, and Reza Azarderakhsh ${ }^{( }$


#### Abstract

Fault detection is becoming greatly important in protecting cryptographic designs that can suffer from both natural or malicious faults. Finite fields over $\operatorname{GF}\left(2^{m}\right)$ are widely used in such designs, since their data are coded in binary form for practical reasons. Among the different finite field arithmetic, multiplication is the bottleneck operation for many cryptosystems due to its complexity. Therefore, in this work, fault detection schemes based on cyclic codes for finite field multipliers using different fields found in traditional and post-quantum cryptography are derived. Moreover, we implement such schemes by embedding them into the original architectures to perform an exhaustive study, benchmark the different overheads obtained, and prove their suitability for deeply constrained embedded systems. These implementations are performed on advanced micro devices (AMD)/Xilinx field-programmable gate array (FPGA) and provide a very high error coverage with acceptable overhead.


Index Terms-Cyclic codes, fault detection, fieldprogrammable gate array (FPGA), finite field multiplication.

## I. Introduction

Finite field arithmetic and its hardware implementations have gained considerable interest in the literature due to their applicability to cryptography, coding theory, error-correcting codes, and digital signal processing. Most of the runtime of such applications is spent in the computation of multiplications over $\operatorname{GF}\left(2^{m}\right)$.

These implementations not only have to be efficient but also secure and free of errors. There are two main types of faults for hardware implementations, i.e., natural faults, which are products of the environment or defects in hardware logic operations, and malicious or intentional faults, which are faults injected by an unauthorized party to generally obtain secret information or even produce a denial of service attack. Even though there are many works on fault detection [1], [2], [3], [4], [5], [6], efficient and reliable hardware implementations of finite field multipliers based on cyclic codes are still lacking in the literature. Moreover, most of the exciting works on fault detection for finite fields are based on either 1-bit parity or multibit parity schemes [7], [8], [9], [10]. The issue with the former one is that if the number of faults is even, they remain undetected, providing an error coverage percentage of $50 \%$ at most. Multiparity increases the error coverage, but it is insufficient for intelligent fault injection. This brief proposes fault detection schemes based on cyclic codes, the most commonly used class of linear block codes, to overcome this. Cyclic codes are not only useful in detecting

[^0]TABLE I
Cryptographic Algorithms and Their Respective Fields

| Algorithm | Field | Irreducible <br> Polynomial |
| :---: | :---: | :---: |
| AES | $G F\left(2^{8}\right)$ | $x^{8}+x^{4}+x^{3}+x+1$ |
| McEliece Cryptosystem | $G F\left(2^{13}\right)$ | $x^{13}+x^{4}+x^{3}+x+1$ |
| WG-16 | $G F\left(2^{16}\right)$ | $x^{16}+x^{5}+x^{3}+x^{2}+1$ |
| WG-29 | $G F\left(2^{29}\right)$ | $x^{29}+x^{2}+1$ |

single, double, and even triple errors, but they can also detect burst errors, which are those faults that occur in many consecutive bits rather than happening in bits independent of each other.

Binary fields with polynomial basis (PB) are particularly suitable for VLSI implementations due to their simplicity and modularity. In this work, we present an exhaustive study by deriving fault detection schemes based on cyclic codes for finite field multipliers with PB, implementing such proposed schemes for the fields found in Advanced Encryption Standard (AES), WG Welch-Gong-16 (WG16), WG Welch-Gong-29 (WG-29), and the McEliece cryptosystem, and benchmarking the different overheads on advanced micro devices (AMD)/Xilinx field-programmable gate array (FPGA) family Artix-7 for device xc7a12tcpg238-3.

Our work is outlined as follows. Section II reviews the mathematical properties of PB. In Section III, we discuss the overall finite field multiplier architecture and derive the different fault detection schemes based on cyclic codes for multiplications over $\operatorname{GF}\left(2^{m}\right)$ with PB. Next, such fault detection schemes are embedded into the original finite field multipliers in Section IV to benchmark the different overheads. Finally, Section V concludes this brief.

## II. Preliminaries

The Galois field of order $p^{m}$ is a finite field with $p^{m}$ elements, abbreviated as $\operatorname{GF}\left(p^{m}\right)$. One of the most common fields used is the Galois field $\operatorname{GF}\left(2^{m}\right)$, because they are particularly efficient for implementation in hardware or on a binary computer. Table I shows different traditional and post-quantum cryptographic algorithms and their corresponding fields that they operate with. $\mathrm{GF}\left(2^{m}\right)$ is made up of $2^{m}$ binary polynomials, that is, polynomials with coefficients of 0 or 1 . Each of those polynomials has a degree of no more than $m-1$; thus, the elements may be expressed as $m$-bit strings. At the same place in the polynomial, each bit in the bit string corresponds to a coefficient. For instance, $\operatorname{GF}\left(2^{3}\right)$ has eight elements: $0,1, x, x+1$, $x^{2}, x^{2}+1, x^{2}+x$, and $x^{2}+x+1$, and they might be expressed as $000,001,010, \ldots, 111$, respectively.
A finite field element $A$ over $\operatorname{GF}\left(2^{m}\right)$ with PB $\left\{1, x, x^{2}, \ldots, x^{m-1}\right\}$ can be expressed as follows:

$$
\begin{equation*}
A=\sum_{i=0}^{m-1} a_{i} x^{i}, \quad a_{i} \in\{0,1\} \tag{1}
\end{equation*}
$$

where the values of $a_{i}$ are the coordinates of the input element $A$.
To perform the different finite field arithmetic operations, e.g., addition, subtraction, multiplication, inversion, and division,


Fig. 1. Finite field multiplier using the proposed error detection blocks based on cyclic codes.
an irreducible polynomial $f(x)$ of degree $m$ over $\operatorname{GF}\left(2^{m}\right)$ is needed. This irreducible polynomial limits the number of bits, since it is used to perform the modulo operation. For instance, to multiply two elements $A$ and $B$ over $\operatorname{GF}\left(2^{m}\right)$, the output $C$ is computed as follows:

$$
C=A \cdot B \bmod f(x) .
$$

There has been an increasing amount of research on finite field multiplication, since it is used not only to perform the multiplication of two finite field elements, but also to perform finite field squaring, inversion, and division. Multiplication over $\operatorname{GF}\left(2^{m}\right)$ is the most time-consuming basic arithmetic operation in many cryptographic algorithms, and its hardware implementation may need thousands of logic gates. Developing large finite field multipliers that always produce error-free outputs is a complex and expensive endeavor.

The multiplication of the elements $A$ and $B$ over $\operatorname{GF}\left(2^{m}\right)$ with PB can be further derived as follows:

$$
\begin{aligned}
C & =\sum_{i=0}^{m-1} b_{i} \cdot\left(\left(A x^{i}\right) \bmod f(x)\right) \\
& =\sum_{i=0}^{m-1} b_{i} \cdot X^{(i)}
\end{aligned}
$$

where the set of $b_{i}$ values is the $B$ coefficients, $X^{(i)}=\alpha \cdot X^{(i-1)} f(\alpha)$, and $X^{(0)}=A$. To perform such multiplication, we use the architecture from [11], needing three different modules: Sum, pass-thru, and $\alpha$ modules. The sum module is also used to perform finite field addition, using an $m$-bit XOR gate to add two $\operatorname{GF}\left(2^{m}\right)$ elements. The pass-thru module multiplies a $\operatorname{GF}\left(2^{m}\right)$ element by a $\mathrm{GF}(2)$ element. For example, let $A$ (a $\operatorname{GF}\left(2^{m}\right)$ element) and $b$ (a $\operatorname{GF}(2)$ element) serve as the inputs of the pass-thru module, while $G$ (a $\operatorname{GF}\left(2^{m}\right)$ element) serves as the output. When $b=0$, the output of the passthru module is 0 , and when $b=1$, the output is $A$. Finally, the $\alpha$ module multiplies an element of $\operatorname{GF}\left(2^{m}\right)$ by $x$, such as

$$
\begin{equation*}
A(x) \cdot x=a_{m-1} \cdot x^{m}+a_{m-2} \cdot x^{m-1}+\cdots+a_{0} \cdot x \tag{2}
\end{equation*}
$$

where

$$
x^{m} \equiv f_{m-1} \cdot x^{m-1}+f_{m-2} \cdot x^{m-2}+\cdots+f_{0} \bmod f(x)
$$

reducing the result modulo $f(x)$. For example, if we have a $\mathrm{GF}\left(2^{8}\right)$ element going through the $\alpha$ module, the input would be $A(x)=a_{7} \cdot x^{7}+a_{6} \cdot x^{6}+\cdots+a_{0}$, which gets multiplied by $x$ in the $\alpha$ module obtaining $A(x) \cdot x=a_{7} \cdot x^{8}+a_{6} \cdot x^{7}+\cdots+a_{0} \cdot x$, where
$x^{8} \equiv x^{4}+x^{3}+x+1 \bmod f(x)$ if the irreducible polynomial is $f(x)=x^{8}+x^{4}+x^{3}+x+1$. Therefore, the output of the $\alpha$ module for this specific case would be $A(x) \cdot x=a_{6} \cdot x^{7}+a_{5} \cdot x^{6}+$ $a_{4} \cdot x^{5}+\left(a_{7}+a_{3}\right) \cdot x^{4}+\left(a_{7}+a_{2}\right) \cdot x^{3}+a_{1} \cdot x^{2}+\left(a_{7}+\right.$ $\left.a_{0}\right) \cdot x+a_{7}$. Fig. 1 shows the entire finite field multiplier with the error detection blocks that are derived in Section III. The order of the error flags EF goes from top to bottom (please note that in the first iteration, there is no $\alpha$ module). Therefore, $\mathrm{EF}_{1}$ is obtained in the first $\alpha$ module, $\mathrm{EF}_{2}$ is obtained in the first sum module, $\mathrm{EF}_{3}$ is obtained in the second $\alpha$ module, $\mathrm{EF}_{4}$ is obtained in the second sum module, $\mathrm{EF}_{5}$ is obtained in the first pass-thru module, and so on. Since there are a total of $m-1 \alpha$ modules, $m-1$ sum modules, and $m$ pass-thru modules, the maximum number of error flags is $3 m-2$. ACT. SIGN. and PRED. SIGN. stand for actual and predicted signatures, respectively, and we will explain them in Section III.

## III. Proposed Fault Detection Architectures

This section presents efficient and overhead-aware error detection schemes for the different finite field multipliers found in some traditional and post-quantum cryptographic algorithms. These schemes are based on cyclic codes, one of the most important subclasses of linear codes, since they possess many algebraic properties that simplify the encoding and decoding implementations. In this work, we will use a $(7,4)$ cyclic code, meaning that for every four data/message bits $(k)$, there are three parity bits $(n-k)$ associated to form a codeword. This codeword has the following form:

$$
\text { codeword }=[\text { message }, \text { parity }] .
$$

To encode an $(n, k)$ cyclic code into its systematic form, where the $k$ leftmost digits of each code vector are the message and the $n-k$ rightmost digits are the parity-check digits, the message polynomial $u(x)$, with the form of $u(x)=u_{k-1} x^{k-1}+\cdots+u_{1} x+u_{0}$, is first multiplied by $x^{n-k}$. For example, if your message is $u(x)=x^{3}+$ $x^{2}+1$, it becomes $x^{6}+x^{5}+x^{3}$ (leaving three free bits to add the parity bits). $u(x) \cdot x^{n-k}$ is then divided by a generator polynomial $g(x)=$ $x^{n-k}+g_{n-k-1} x^{n-k-1}+\cdots+g_{2} x^{2}+g_{1} x+1$, which, in this work, is $g(x)=x^{3}+x+1$ to obtain a remainder $b(x) . b(x)$ corresponds to the parity bits, and they are added to the shifted message $u(x) \cdot x^{n-k}$ to produce a codeword in the form of $u(x) \cdot x^{n-k}+b(x)$. These steps can be accomplished by the architecture shown in Fig. 2. Let us explain next how this architecture works and how it is embedded in the finite field multiplier.

For example, we have the input message $u(x) \cdot x^{n-k}, A(x)$ for short. $A(x)$ over $\mathrm{GF}\left(2^{4}\right)$ has the form $A(x)=a_{3} x^{6}+a_{2} x^{5}+a_{1} x^{4}+a_{0} x^{3}$ (leaving three empty bits to append the parity bits). The input $A(x)$ enters 1 bit at a time (noting that the initial content of the registers is 0 ). The first bit that enters is $a_{3}$, and as shown in Fig. 2, it goes to registers $\mathrm{FF}_{1}$ and $\mathrm{FF}_{2}$, obtaining $a_{3}$ for both parity bits $P_{1}$ and $P_{2}$ and 0 for the parity bit $P_{3}$. The next bit that is fed into the circuit is $a_{2}$. Since the content of the registers is shifted to the right, $\mathrm{FF}_{1}$ now stores only $a_{2}$, and $\mathrm{FF}_{2}$ stores $a_{3}+a_{2}$, since the previous value of $\mathrm{FF}_{1}$ is XoRed with the current value of $\mathrm{FF}_{2}$, and $\mathrm{FF}_{3}$ now stores $a_{3}$, which is the previous value of $\mathrm{FF}_{2}$. The rest of the message bits are fed into the circuit in the same manner to obtain the parities shown in Table II.

To provide error detection to the finite field multipliers, each of its modules ( $\alpha$, sum, and pass-thru modules) will need to produce actual parities and predicted parities that will be compared with each other to detect any faults that have been introduced (fault analysis attack) or product of the environment (natural faults). Therefore, the circuit to encode the $(7,4)$ cyclic code needs to be placed twice on each module, as shown in Fig. 1. We note that we refer to the combination


Fig. 2. Architecture embedded in the original finite field multipliers to produce the different signatures.

TABLE II
Derivation of the Actual Signatures Using the Circuit From Fig. 2

| Input | Parity 1 | Parity 2 | Parity 3 |
| :---: | :---: | :---: | :---: |
| - | 0 | 0 | 0 |
| $a_{3}$ | $a_{3}$ | $a_{3}$ | 0 |
| $a_{2}$ | $a_{2}$ | $a_{3}+a_{2}$ | $a_{3}$ |
| $a_{1}$ | $a_{3}+a_{1}$ | $a_{2}+a_{1}$ | $a_{3}+a_{2}$ |
| $a_{0}$ | $a_{3}+a_{2}+a_{0}$ | $a_{3}+a_{1}+a_{0}$ | $a_{2}+a_{1}$ |

of Parity 1, Parity 2, and Parity 3 as signatures. The actual signatures of each module correspond to the combination of parities shown in the last row of Table II. However, the predicted signatures for the $\alpha$ module are calculated differently, using different inputs, since the $\alpha$ module multiplies $A(x)$ with $x$, as shown in (2). For example, if we have a $\operatorname{GF}\left(2^{4}\right)$ element going through the $\alpha$ module, the input would be $A(x)=a_{3} \cdot x^{3}+a_{2} \cdot x^{2}+a_{1} \cdot x+a_{0}$, which gets multiplied by $x$ to obtain $A(x) \cdot x=a_{3} \cdot x^{4}+a_{2} \cdot x^{3}+a_{1} \cdot x^{2}+a_{0} \cdot x$, where $x^{4} \equiv x^{2}+$ $1 \bmod f(x)$ if the irreducible polynomial is $f(x)=x^{4}+x^{2}+1$. Thus, the output of the $\alpha$ module is $A(x) \cdot x=a_{2} x^{3}+\left(a_{3}+a_{1}\right) x^{2}+a_{0} x+a_{3}$. This means that the first bit entering the circuit from Fig. 2 is $a_{2}$, then $\left(a_{3}+a_{1}\right)$, and so on. Table III shows the derivation of predicted signatures for the $\alpha$ module, and as it can be seen, the inputs are different, since the coefficients of $A(x) \cdot x$ are different from $A(x)$. The predicted signatures are then Xored with the actual signatures $\left(\mathrm{EF}_{1}=\right.$ ActualParity $_{1} \oplus$ PredictedParity $_{1}, \mathrm{EF}_{2}=$ ActualParity $_{2} \oplus$ PredictedParity $_{2}$, and $\mathrm{EF}_{3}=$ ActualParity $_{3} \oplus$ PredictedParity $_{3}$ ), as shown in Fig. 1, to detect if any fault has occurred. For example, if an error flag EF signals a " 1 ," the system has detected a fault in that particular module.
On the other hand, the predicted and actual signatures for the sum and the pass-thru modules are easier to derive than those for the $\alpha$ module, since the order of the coefficients does not change. For the sum module, which adds elements $A(x)$ and $B(x)$ over $\mathrm{GF}\left(2^{m}\right)$, both signatures are similar as those from Table II, but since $B(x)$ is added, its coefficients would be added as well, obtaining $p_{1}=a_{3}+b_{3}+$ $a_{2}+b_{2}, p_{2}=a_{2}+b_{2}+a_{1}+b_{1}+a_{0}+b_{0}$, and $p_{3}=a_{3}+b_{3}+$ $a_{2}+b_{2}+a_{1}+b_{1}$. Finally, for the pass-thru module, which adds element $A(x)$ over $\mathrm{GF}\left(2^{m}\right)$ with a $\mathrm{GF}(2)$ element $b$, both parities are also similar as those from Table II, but since $b$ is multiplied, the parities from Table II are multiplied as well with $b$ to obtain $p_{1}=b \cdot\left(a_{3}+a_{2}\right), p_{2}=b \cdot\left(a_{2}+a_{1}+a_{0}\right)$, and $p_{3}=b \cdot\left(a_{3}+a_{2}+a_{1}\right)$.
We note that the example shown is for $\operatorname{GF}\left(2^{4}\right)$, whose inputs have precisely 4 bits. However, if one is working with more extensive fields, the input can be split into blocks of 4 bits, and if $m$ is not divisible by 4 , e.g., $\operatorname{GF}\left(2^{13}\right)$ and $\operatorname{GF}\left(2^{29}\right), 0$ 's can be appended to fill all the blocks of 4 bits. In this work, we have used a $(7,4)$ cyclic code for simplicity of the examples, to produce more error flags per multiplication, and to not append many 0 's at the end of each last block, e.g., if the field is $\operatorname{GF}\left(2^{12}\right)$, the system does not need

TABLE III
Derivation of the Predicted Signatures Using the Circuit From Fig. 2

| Input | Parity 1 | Parity 2 | Parity 3 |
| :---: | :---: | :---: | :---: |
| - | 0 | 0 | 0 |
| $a_{2}$ | $a_{2}$ | $a_{2}$ | 0 |
| $a_{3}+a_{1}$ | $a_{3}+a_{1}$ | $a_{3}+a_{2}+a_{1}$ | $a_{2}$ |
| $a_{0}$ | $a_{2}+a_{0}$ | $a_{3}+a_{1}+a_{0}$ | $a_{3}+a_{2}+a_{1}$ |
| $a_{3}$ | $a_{2}+a_{1}$ | $a_{3}+a_{2}+a_{0}$ | $a_{3}+a_{1}+a_{0}$ |

TABLE IV
Calculation of the Error Coverage Percentage For Each MUltiplier

| Multiplier | Number of Signatures (sign.) | Error coverage \% |
| :---: | :---: | :---: |
| $G F\left(2^{8}\right)$ | $7_{\alpha}+7_{\text {sum }}+8_{\text {pass }}$ | $100 \cdot\left(1-\left(\frac{1}{2}\right)^{22}\right)$ |
| $G F\left(2^{13}\right)$ | $12_{\alpha}+12_{\text {sum }}+13_{\text {pass }}$ | $100 \cdot\left(1-\left(\frac{1}{2}\right)^{37}\right)$ |
| $G F\left(2^{16}\right)$ | $15_{\alpha}+15_{\text {sum }}+16_{\text {pass }}$ | $100 \cdot\left(1-\left(\frac{1}{2}\right)^{46}\right)$ |
| $G F\left(2^{29}\right)$ | $28_{\alpha}+28_{\text {sum }}+29_{\text {pass }}$ | $100 \cdot\left(1-\left(\frac{1}{2}\right)^{85}\right)$ |
| General | $m-1_{\alpha}+m-1_{\text {sum }}+m_{\text {pass }}$ | $100 \cdot\left(1-\left(\frac{1}{2}\right)^{\text {sign. }}\right)$ |

to append any " 0 " using a $(7,4)$ cyclic code, but it needs to append " 0000 " to each last block if using a $(12,8)$ cyclic code. In addition, in the previous example, using a $(7,4)$ cyclic code obtains nine parity bits per operation, while the $(12,8)$ obtains eight. Employing smaller cyclic codes provides more parity bits; however, they can use more computational resources than bigger cyclic codes, especially for larger fields. The choice of the utilized cyclic codes can be tailored based on the reliability requirements and the overhead to be tolerated.

## IV. Error Coverage and FPGA Implementations

As mentioned earlier, the proposed error detection schemes are intended to detect natural faults product of the environment and intentional faults, e.g., fault analysis attacks. Fault analysis attacks are active attacks where the adversary seeks to interfere with the cryptographic process handling sensitive data, causing inaccurate outputs, which can reveal sensitive data. Bit-fault injection at the desired place and at the desired cycle would be the perfect attack to carry out in order to gather a wealth of data. However, this is generally not practical, as costly equipment is required due to the shrinking geometry size of integrated chips. Our fault model covers single as well as multiple stuck-at faults (stuck-at 0 and stuck-at 1 ), since technological limitations may make it difficult for an attacker to flip precisely 1 bit. In this work, we assume that the comparators are hardened, i.e., the comparators are fault free and not compromised, and that the inputs are not compromised prior to the execution of the multiplier units.
The total number of signatures needs to be determined to calculate the error coverage provided by the various fault detection strategies proposed in this work. The formula used to calculate the error coverage is $100 \cdot\left(1-(1 / 2)^{s}\right) \%$, where $s$ is the total amount of signatures. The percentage of not detecting errors using the signatures is independent, and thus, this is the percent of detecting errors for randomly distributed faults. Each multiplier needs a total of $m-1$ $\alpha$ modules, $m-1$ sum modules, and $m$ pass-thru modules, and each of them has a signature (as mentioned earlier, we note that the term signature through this brief refers to the combination of the parities, and one actual signature with one predicted signature is equivalent to just one signature in the previous formula). Therefore, each cryptosystem has a different number of signatures and a different error detection coverage, as it is shown in Table IV.
Normal parity and multibit parity schemes are usually fast and do not require too many extra resources to implement. However, normal parity has only an error coverage of up to $50 \%$ (it can only detect an

TABLE V
Overheads Obtained When Implementing the Proposed Error Detection Schemes on Top of the Original Architectures Using AMD/XILINX FPGA FAMILY ARTIX-7 DEVICE XC7A12TCPG238-3

| Architecture | Area <br> (occupied slices) | Delay (ns) | Power (mW) <br> $@ 50 \mathrm{MHz}$ | Throughput <br> $(\mathrm{Gbps})$ |
| :---: | :---: | :---: | :---: | :---: |
| $G F\left(2^{8}\right)$ Multiplier | 35 | 2.825 | 0.064 | 2.83 |
| $G F\left(2^{8}\right)$ Multiplier with Error Detection | $35(\sim 0 \%)$ | $3.245(14.87 \%)$ | $0.064(\sim 0 \%)$ | $2.47(-12.72 \%)$ |
| $G F\left(2^{13}\right)$ Multiplier | 91 | 3.669 | 0.067 | 3.54 |
| $G F\left(2^{13}\right)$ Multiplier with Error Detection | $97(6.59 \%)$ | $3.763(2.56 \%)$ | $0.067(\sim 0 \%)$ | $3.45(-2.54 \%)$ |
| $G F\left(2^{16}\right)$ Multiplier | 131 | 3.561 | 0.069 | 4.49 |
| $G F\left(2^{16}\right)$ Multiplier with Error Detection | $137(4.58 \%)$ | $3.934(10.47 \%)$ | $0.069(\sim 0 \%)$ | $4.07(-9.35 \%)$ |
| $G F\left(2^{29}\right)$ Multiplier | 402 | 5.088 | 0.078 | 5.70 |
| $G F\left(2^{29}\right)$ Multiplier with Error Detection | $529(31.59 \%)$ | $6.412(26.02 \%)$ | $0.079(1.28 \%)$ | $4.52(-20.70 \%)$ |

odd number of faults), and multibit parity is vulnerable to intelligent fault injection. While normal parity only produces a single parity bit, multibit parity schemes group several bits obtaining multiple parity bits. For example, if your system has $\operatorname{GF}\left(2^{10}\right)$ elements, you could have five 2-bit groups, giving five error flags or parity bits. However, faults usually occur in clumps, since it is very costly to inject a single fault, and multiparity bit schemes would not detect some clumps of injected faults, e.g., four consecutive faults. Cyclic codes outperform normal and multibit parities by increasing the complexity of the arithmetic operations employed. As it can be observed in Table III, the only combination of injected faults that the system would not detect faults in the $\alpha$ module is when they are injected at $a_{3}$ and $a_{1}$, since both are located in parity bits 2 and 3 but not in parity bit 1 . Note that producing two single faults in the same cycle is extremely challenging and costly. Using the $(7,4)$ cyclic code presented in this work, there are four different places where single fault can be injected, six different combinations where two faults can be injected, four different combinations where triple faults can be injected, and only one combination where four faults can be injected, which makes a total of 15 combinations. Out of those 15 cases, our schemes would detect faults in 14 of them, or in other words, in $93.33 \%$ of the cases. For the sum and pass-thru modules, the error coverage is very similar. For the sum module, since the coefficients of the $\mathrm{GF}\left(2^{m}\right)$ element $B(x)$ are added to the parities, the only case where the scheme does not detect faults is if there is a double fault at $b_{3}$ and $b_{1}$ or at $a_{3}$ and $a_{1}$ ( 253 out of 255 cases would be covered: $99.2 \%$ error coverage). Finally, for the pass-thru module, since $b$ is a GF(2) element and gets multiplied on each parity bit, the only case where the scheme would not detect faults is if there is a double fault at $a_{3}$ and $a_{1}$ ( 30 out of 31 cases would be covered: $96.77 \%$ error coverage).

We performed a total of $2^{16}$ fault simulations on FPGA for $\operatorname{GF}\left(2^{8}\right)$ multipliers using normal parity schemes and cyclic codes. Of those simulations, $2^{14}$ were with single fault injected, $2^{14}$ were with double faults injected, $2^{14}$ were with three faults injected, and $2^{14}$ were with four faults injected. The $\operatorname{GF}\left(2^{8}\right)$ multiplier with normal parity had an error coverage percentage of $50 \%$, since it detected all odd amounts of faults but not any even amounts of faults. On the other hand, the $\mathrm{GF}\left(2^{8}\right)$ multiplier with a $(7,4)$ cyclic code detected $>99.9 \%$ of the injected faults. More information regarding the implementation results of these two schemes can be found at the end of this section.

In addition, we have implemented the proposed error detection schemes on the AMD/Xilinx FPGA family Artix-7 device xc7a12tcpg238-3 to benchmark the different overheads obtained when cyclic codes are applied to the original architectures. We use the Vivado tool and Verilog as the hardware design entry to analyze the area (occupied slices), delay (ns), power (mW) with the clock
frequency of 50 MHz , and throughput ( $\mathrm{Gb} / \mathrm{s}$ ). The different overheads obtained are shown in Table V, and as one can see, the schemes added a worst case area overhead of $31.59 \%$ and a worst case delay overhead of $26.02 \%$ when they were added to the WG-29. Regarding the power, the overheads obtained when we apply the error detections schemes are negligible due to the size of the designs, obtaining a worst case power overhead of $1.28 \%$ when cyclic codes were applied to the original $\mathrm{GF}\left(2^{29}\right)$ multiplier.

To the best of our knowledge, there has not been any prior work done on error detection based on general cyclic codes for finite field multipliers with PB elements. Let us review several case studies on error detection in $\operatorname{GF}\left(2^{m}\right)$ arithmetic hardware for qualitative comparison to ensure that the overheads incurred are reasonable. Fault detection techniques for the key generator of code-based postquantum cryptosystems on FPGA are implemented in [12], obtaining between $\approx 5 \%$ (best case) and $\approx 49 \%$ (worst case) area overheads. In [10], fault detection capability on binary extension fields based on parity is proposed for elliptic curve cryptography (ECC), obtaining a little over $10 \%$ area overhead. In [13], cyclic redundancy checks (CRC), which are a type of cyclic codes, are proposed for the inverse of an element with PB in $\mathrm{GF}\left(2^{163}\right)$, yielding an area overhead of around $26 \%$. In addition, CRC schemes are implemented on the key generator of the Niederreiter cryptosystem in [14], obtaining the efficiency degradations of at most $7.72 \%$. For the sake of comparison, we have also compared our schemes with a $\operatorname{GF}\left(2^{8}\right)$ finite field multiplier using error detection based on normal parity. The latter obtained worse results in terms of area, with an overhead of $2.86 \%$ ( 36 occupied slices), but a lower delay overhead of $13.55 \%$ ( 3.208 ns ). It is also worth mentioning that normal parity error detection does not detect an even amount of faults. These and other comparable research on error detection demonstrate the acceptability of the overheads obtained in this work.

## V. Conclusion

This brief presents error detection techniques for finite field multipliers with PB elements based on cyclic codes. Moreover, we have also added the suggested fault detection schemes to the original finite field multipliers of algorithms, such as AES, WG, and the McEliece cryptosystem in the AMD/Xilinx FPGA family Artix7 device xc7a12tcpg238-3. In addition, we calculate the different overheads added by such schemes and compare them with related works to prove that the overheads from the proposed schemes are acceptable. The schemes added a worst case area overhead of $31.59 \%$ and a worst case delay overhead of $26.02 \%$ when the schemes were added to the WG-29 cryptosystem, which is suitable due to the high error coverage achieved of close to $100 \%$.

## REFERENCES

[1] A. C. Canto, M. Mozaffari-Kermani, and R. Azarderakhsh, "Reliable CRC-based error detection constructions for finite field multipliers with applications in cryptography," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 29, no. 1, pp. 232-236, Jan. 2021.
[2] D. Heinz and T. Poppelmann, "Combined fault and DPA protection for lattice-based cryptography," IEEE Trans. Comput., early access, Aug. 8, 2022, doi: 10.1109/TC.2022.3197073.
[3] V. Arribas, F. Wegener, A. Moradi, and S. Nikova, "Cryptographic fault diagnosis using VerFI," in Proc. IEEE Int. Symp. Hardw. Oriented Secur. Trust (HOST), Dec. 2020, pp. 229-240.
[4] J. Kaur, M. Mozaffari-Kermani, and R. Azarderakhsh, "Hardware constructions for error detection in lightweight authenticated cipher ASCON benchmarked on FPGA," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 69, no. 4, pp. 2276-2280, Apr. 2022.
[5] M. Mozaffari-Kermani and A. Reyhani-Masoleh, "Reliable hardware architectures for the third-round SHA-3 finalist grostl benchmarked on FPGA platform," in Proc. IEEE Int. Symp. Defect Fault Tolerance VLSI Nanotechnol. Syst., Oct. 2011, pp. 325-331.
[6] M. Mozaffari-Kermani and A. Reyhani-Masoleh, "A high-performance fault diagnosis approach for the AES SubBytes utilizing mixed bases," in Proc. Workshop Fault Diagnosis Tolerance Cryptography, Sep. 2011, pp. 80-87.
[7] C.-Y. Lee, P. K. Meher, and J. C. Patra, "Concurrent error detection in bit-serial normal basis multiplication over $\operatorname{GF}\left(2^{m}\right)$ using multiple parity prediction schemes," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 18, no. 8, pp. 1234-1238, Aug. 2010.
[8] A. Cintas-Canto, M. M. Kermani, and R. Azarderakhsh, "Reliable architectures for composite-field-oriented constructions of McEliece post-quantum cryptography on FPGA," IEEE Trans. Comput.Aided Design Integr. Circuits Syst., vol. 40, no. 5, pp. 999-1003, May 2021.
[9] A. Reyhani-Masoleh and M. A. Hasan, "Fault detection architectures for field multiplication using polynomial bases," IEEE Trans. Comput., vol. 55, no. 9, pp. 1089-1103, Sep. 2006.
[10] C. Fei, F. Zhou, N. Wu, F. Ge, J. Wen, and P. Qin, "A scalable bit-parallel word-serial multiplier with fault detection on $\operatorname{GF}\left(2^{m}\right)$," in Proc. IEEE 20th Int. Conf. Commun. Technol. (ICCT), Oct. 2020, pp. 1660-1664.
[11] A. Reyhani-Masoleh and M. A. Hasan, "Error detection in polynomial basis multipliers over binary extension fields," in Proc. CHES, 2002, pp. 515-528.
[12] A. C. Canto, M. M. Kermani, and R. Azarderakhsh, "Reliable constructions for the key generator of code-based post-quantum cryptosystems on FPGA," ACM J. Emerg. Technol. Comput. Syst., Jun. 2022, pp. 1-10.
[13] A. Cintas-Canto, M. Mozaffari-Kermani, and R. Azarderakhsh, "CRCbased error detection constructions for FLT and ITA finite field inversions over $G F\left(2^{m}\right)$," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 29, no. 5, pp. 1033-1037, May 2021.
[14] A. Cintas-Canto, M. Mozaffari-Kermani, R. Azarderakhsh, and K. Gaj, "CRC-oriented error detection architectures of postquantum cryptography niederreiter key generator on FPGA," in Proc. IEEE Nordic Circuits Syst. Conf. (NorCAS), Oct. 2022, pp. 1-7.


[^0]:    Manuscript received 16 August 2022; revised 2 November 2022; accepted 20 November 2022. Date of publication 30 November 2022; date of current version 28 December 2022. This work was supported in part by the Marymount University through the START under Grant 2450100 and in part by the U.S. National Science Foundation (NSF) under Award SaTC-1801488. (Corresponding author: Mehran Mozaffari Kermani.)

    Alvaro Cintas-Canto is with the School of Technology and Innovation, Marymount University, Virginia, VA 22207 USA (e-mail: acintas@ marymount.edu).
    Mehran Mozaffari Kermani is with the Department of Computer Science and Engineering, University of South Florida, Tampa, FL 33620 USA (e-mail: mehran2@usf.edu).
    Reza Azarderakhsh is with the Department of Computer and Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL 33431 USA (e-mail: razarderakhsh@fau.edu).
    Digital Object Identifier 10.1109/TVLSI.2022.3224357

