

### A Hardware Design Methodology to Prevent Microarchitectural Transition Leakages

Mathieu Escouteloup<sup>1</sup>, Vincent Migliore<sup>2</sup>

<sup>1</sup> Bordeaux University, IMS Lab, France

<sup>2</sup> Toulouse University, LAAS-CNRS, France

April 3<sup>rd</sup>, 2025







Cyber-physical and connected objects have been widely adopted to address important societal challenges. They are subject to two trends:

- Increasing complexity: typically following System On Chip (SoC) architectures.
- Increasing attack surface: with hardware-oriented attacks.



- Side-channel attacks (SCA) are gaining momentum since they are an effective way to break cryptographic implementations (along with fault injection).
- This presentation focuses on power-based side-channel attacks, i.e. extracting secrets from energy consumption or electromagnetic emissions.

### Side-Channel Leakage Sources





Several methods and tools have been proposed to verify security and harden software or hardware against SCA, with possible interactions, to increase the security level of a cryptographic implementation.



Masking: a method that tolerates side-channel observation without leaking the secret, based on a formal model of the attacker (*t*-probing, for instance). Challenge: implementing masking with minimal security loss.
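To illustrate why transition leakage threatens masking, here is a minimal sketch. It assumes first-order Boolean masking and a Hamming-distance leakage model; both are illustrative assumptions, and the helper names (`mask`, `transition_leakage`) are hypothetical. If the two shares of a secret successively occupy the same register, the transition recombines them.

```python
import secrets

def hamming_weight(x: int) -> int:
    """Number of bits set in x."""
    return bin(x).count("1")

def mask(secret: int, width: int = 8) -> tuple[int, int]:
    """Split `secret` into two Boolean shares (first-order masking)."""
    m = secrets.randbits(width)   # fresh random mask
    return secret ^ m, m          # share0 ^ share1 == secret

def transition_leakage(old: int, new: int) -> int:
    """Hamming-distance model: bits toggled when `new` overwrites `old`."""
    return hamming_weight(old ^ new)

secret = 0xA7
share0, share1 = mask(secret)

# Overwriting share0 with share1 in the same register toggles exactly the
# bits of share0 ^ share1, i.e. the bits of the unmasked secret.
leak = transition_leakage(share0, share1)
print(leak == hamming_weight(secret))
```

Whatever the random mask, the toggled bits are those of the secret itself, so the observation depends on the unmasked value even though each share alone is uniformly distributed.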









### Datapath Description and Hardening Strategy



1. Microarchitectural Hardening (mutually exclusive)
   - S<sub>1</sub>: Complete Stage Overwriting, Delayed Data Multiplexing
   - S<sub>2</sub>: Microarchitecture Register Duplication
2. Architectural Hardening
   - S<sub>3</sub>: Architectural Register Pre-Writing, Decoupled Memory Operations

# Strategy S1: Complete Stage Overwriting



### Objective

Full pipeline cleaning (including combinatorial logic):

$\begin{array}{ll} \text{Before:} & S_0 \rightarrow S_1 \rightarrow S_2 \\ \text{After:} & S_0 \rightarrow \mathtt{0x0} \rightarrow S_1 \rightarrow \mathtt{0x0} \rightarrow S_2 \end{array}$

Remark: only one buffer is shown, but all data-dependent buffers in a stage are patched, with their 0x0 insertions synchronized.



Stage evolution, cycle by cycle (figure: register value at each cycle):

1. Propagation of $S_0$ bits (+ all outputs from other stage buffers) through the combinatorial logic.
2. Stage buffers and all combinatorial logic cleaning (i.e. set to 0).
3. Propagation of $S_1$ bits (+ all outputs from other stage buffers) through the combinatorial logic.
4. Stage buffers and all combinatorial logic cleaning (i.e. set to 0).
5. Propagation of $S_2$ bits (+ all outputs from other stage buffers) through the combinatorial logic.
6. Stage buffers and all combinatorial logic cleaning (i.e. set to 0).
7. Propagation of $S_3$ bits (+ all outputs from other stage buffers) through the combinatorial logic.
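The effect of Complete Stage Overwriting on register transitions can be sketched with a toy model. It assumes a Hamming-distance leakage model, and the example values standing in for $S_0$, $S_1$, $S_2$ are arbitrary; the function names are hypothetical and do not come from the authors' implementation.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bits that toggle when b overwrites a."""
    return bin(a ^ b).count("1")

def overwrite_with_zero(values):
    """Insert a 0x0 cleaning cycle between consecutive stage values."""
    out = []
    for v in values:
        if out:
            out.append(0x0)
        out.append(v)
    return out

def transitions(sequence):
    """Hamming distance of each register update (toy leakage model)."""
    return [hamming_distance(a, b) for a, b in zip(sequence, sequence[1:])]

stage_values = [0x3C, 0xF0, 0x0F]        # arbitrary stand-ins for S_0, S_1, S_2
protected = overwrite_with_zero(stage_values)

print(transitions(stage_values))   # each entry mixes two consecutive values
print(transitions(protected))      # each entry depends on a single value
```

In the protected sequence every update goes to or from 0x0, so no update combines two data values. The sequence also takes roughly twice as many cycles, which matches the cost of inserting a cleaning cycle after each value.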

### Strategy S1: Complete Stage Overwriting



### Security Property

For a given pipeline stage and all possible instruction sequences manipulating data stored in the Register File, no transition leakage can occur in the **stage** (buffer + logic behind) *by design*.

Remark: since the ID/EX buffer must be patched, there is no reason to use a different strategy for the following stages, because the 2-cycle penalty propagates to them anyway.


#### Strategy S1: *Delayed Data Multiplexing* Standard Pipeline Issue



Data can come from different sources $\rightarrow$ multiplexing is needed, with guaranteed 0x0 insertion.

Targeted mechanisms: register forwarding, load resize buffer, skid buffer (memory controller).




- A *Slot* Finite State Machine is defined to alternate between inputs.
- The ready signal is not shown, but it is required to delay inputs.

Hardware implementation: the input is selected with a *slct* register of $\log_2(N_{inputs})$ bits, and a 1-bit *zero* register forces the output to zero after each processed operation.
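The slot mechanism can be sketched as a small cycle-level model. This is a sketch under assumptions: the round-robin slot order, the per-input queues and the class name `SlotMux` are hypothetical; only the *slct* and *zero* registers and the ready signal come from the description above.

```python
from collections import deque

class SlotMux:
    """Toy cycle-level model of a delayed data multiplexer."""

    def __init__(self, n_inputs: int):
        self.queues = [deque() for _ in range(n_inputs)]  # pending data per input
        self.slct = 0       # input owning the current slot
        self.zero = False   # set for one cycle after each processed operation

    def push(self, index: int, data: int) -> None:
        """Queue data on an input until its slot comes up."""
        self.queues[index].append(data)

    def ready(self, index: int) -> bool:
        """An input is accepted only in its own slot, outside cleaning cycles."""
        return not self.zero and self.slct == index

    def step(self) -> int:
        """Advance one clock cycle and return the multiplexer output."""
        if self.zero:                  # cleaning cycle: force 0x0
            self.zero = False
            return 0x0
        queue = self.queues[self.slct]
        self.slct = (self.slct + 1) % len(self.queues)   # move to the next slot
        if not queue:
            return 0x0                 # idle slot: nothing to forward
        self.zero = True               # schedule the cleaning cycle
        return queue.popleft()

mux = SlotMux(2)
mux.push(0, 0xAA)
mux.push(1, 0x55)
mux.push(0, 0x11)
outputs = [mux.step() for _ in range(6)]
print([hex(v) for v in outputs])
```

Two non-zero values never appear on consecutive cycles at the output, whichever input they come from, which is the guarantee the multiplexer has to provide.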


# Strategy S2: *µarch. Register Duplication*



#### Objective

Stage buffer cleaning:

|         | Register           | Sequence                                                                      |
|---------|--------------------|-------------------------------------------------------------------------------|
| Before: |                    | $S_0 \rightarrow S_1 \rightarrow S_2 \rightarrow S_3$                         |
| After:  | R<sub>above</sub>  | $S_0 \rightarrow \mathtt{0x0} \rightarrow S_2 \rightarrow \mathtt{0x0}$       |
|         | R<sub>below</sub>  | $\mathtt{0x0} \rightarrow S_1 \rightarrow \mathtt{0x0} \rightarrow S_3$       |
|         | output             | $S_0 \rightarrow S_1 \rightarrow S_2 \rightarrow S_3$                         |
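The alternation between the two registers can be sketched as follows. This is a toy model: combining the two registers with a bitwise OR is an assumption made for the sketch (the slides do not say how the output is selected), and the function names are hypothetical.

```python
def duplicate_registers(values):
    """Return the per-cycle contents of (R_above, R_below, output)."""
    above, below, output = [], [], []
    for cycle, v in enumerate(values):
        if cycle % 2 == 0:          # even cycles: R_above holds data, R_below is cleaned
            above.append(v)
            below.append(0x0)
        else:                       # odd cycles: the roles are swapped
            above.append(0x0)
            below.append(v)
        # Assumption: one register is always 0x0, so an OR recovers the data.
        output.append(above[-1] | below[-1])
    return above, below, output

def only_zero_transitions(history):
    """True if every update of the register goes to or from 0x0."""
    return all(a == 0 or b == 0 for a, b in zip(history, history[1:]))

values = [0x3C, 0xF0, 0x0F, 0xA5]   # arbitrary stand-ins for S_0 .. S_3
above, below, output = duplicate_registers(values)

print(only_zero_transitions(above), only_zero_transitions(below))
print(output == values)
```

Each register only ever moves between a data value and 0x0, while the output still delivers one value per cycle. The output itself still goes directly from one value to the next, so this model only says something about the buffers, not about the logic behind them.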


### Strategy S2: *µarch. Register Duplication* Stage Evolution Cycle By Cycle







1. Propagation of $S_0$ bits (+ all outputs from other stage buffers) through the combinatorial logic.
2. Propagation of $S_1$ bits (+ all outputs from other stage buffers) through the combinatorial logic.
3. Propagation of $S_2$ bits (+ all outputs from other stage buffers) through the combinatorial logic.
4. Propagation of $S_3$ bits (+ all outputs from other stage buffers) through the combinatorial logic.
5. Propagation of $S_4$ bits (+ all outputs from other stage buffers) through the combinatorial logic.
6. Propagation of $S_5$ bits (+ all outputs from other stage buffers) through the combinatorial logic.
7. Propagation of $S_6$ bits (+ all outputs from other stage buffers) through the combinatorial logic.

### Strategy S2: *µarch. Register Duplication*



#### **Security Property**

For a given pipeline stage and all possible instruction sequences manipulating data stored in the Register File, no transition leakage can occur in the **buffer** *by design*.

#### Strategy S3.a: *Arch. Register Pre-Writing* General Purpose Register File Hardening



Same strategy as $S_1$ *Complete Stage Overwriting*, but the Write Back stage makes it possible to anticipate which register must be cleaned.

### Strategy S3.b: *Decoupled Memory Operations*



**Trash**: a reserved memory location to perform hardware hardening operations.

**Read**: after each valid read, a second read is performed from the trash to clean the rdata signal.

**Write**: before each write, a preliminary operation overwrites wdata and the memory value; after it, a write is performed to the trash to clean the wdata signal.
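The read and write sequences can be sketched with a small bus model. Several points are assumptions made for the sketch: the trash address `TRASH`, the fact that the trash always holds 0x0, the use of 0x0 as the value written by the preliminary operation, and the class and method names, none of which are specified in the slides.

```python
TRASH = 0xFFC   # hypothetical reserved address, assumed to always hold 0x0

class DecoupledMemory:
    """Toy model of a memory port with trash operations around each access."""

    def __init__(self):
        self.mem = {TRASH: 0x0}
        self.rdata = 0x0
        self.wdata = 0x0
        self.log = []   # (signal, old value, new value) for each bus update

    def _drive(self, signal: str, new: int) -> None:
        """Update a bus signal and record the transition."""
        self.log.append((signal, getattr(self, signal), new))
        setattr(self, signal, new)

    def _raw_read(self, addr: int) -> int:
        self._drive("rdata", self.mem.get(addr, 0x0))
        return self.rdata

    def _raw_write(self, addr: int, data: int) -> None:
        self._drive("wdata", data)
        self.mem[addr] = data

    def read(self, addr: int) -> int:
        value = self._raw_read(addr)
        self._raw_read(TRASH)          # second read cleans rdata
        return value

    def write(self, addr: int, data: int) -> None:
        self._raw_write(addr, 0x0)     # preliminary write (assumed to be 0x0)
        self._raw_write(addr, data)    # actual write
        self._raw_write(TRASH, 0x0)    # trash write cleans wdata

port = DecoupledMemory()
port.write(0x10, 0xAB)
first = port.read(0x10)
port.write(0x10, 0xCD)
second = port.read(0x10)
print(hex(first), hex(second))
print(all(old == 0 or new == 0 for _, old, new in port.log))
```

Every recorded update of rdata and wdata goes to or from 0x0, at the cost of one extra bus operation per read and two per write in this model.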



1. Target architecture: RISCV32IM\_Zicsr
2. Specification in Chisel
3. Validation with microbenchmarks:
   - In simulation with Verilator
   - On a real target with ChipWhisperer

| Conf. | Stages | S1.a         | S1.b         | S2           | S3.a         | S3.b         |
|-------|--------|--------------|--------------|--------------|--------------|--------------|
| C5U   | 5      |              |              |              |              |              |
| C5S1  | 5      | $\checkmark$ | $\checkmark$ |              | $\checkmark$ | $\checkmark$ |
| C5S2  | 5      |              |              | $\checkmark$ | $\checkmark$ | ★            |
| C7U   | 7      |              |              |              |              |              |
| C7S1  | 7      | $\checkmark$ | $\checkmark$ |              | $\checkmark$ | $\checkmark$ |
| C7S2  | 7      |              |              | $\checkmark$ | $\checkmark$ | ★            |

★: partial support, trash operations are not needed to clean signals.


#### LARS Implementation Results Security with Correlation Power Analysis

[Figure: CPA correlation plots for configurations C5U and C5S1. Above: arithmetic and logical operations sequence. Below: store operations sequence. Configuration: *50,000 traces*.]
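As a reminder of what a Correlation Power Analysis measures, the sketch below computes a Pearson correlation between a Hamming-distance hypothesis and synthetic leakage. The data are simulated purely for illustration (seeded pseudo-random bytes with assumed Gaussian noise) and are not the authors' measurements; the leakage model and all names are assumptions.

```python
import random
from math import sqrt

def hw(x: int) -> int:
    """Hamming weight of x."""
    return bin(x).count("1")

def pearson(xs, ys) -> float:
    """Pearson correlation coefficient between two equally long samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

rng = random.Random(0)                       # fixed seed: synthetic, repeatable data
n_traces = 2000
prev = [rng.randrange(256) for _ in range(n_traces)]
cur = [rng.randrange(256) for _ in range(n_traces)]

# Attacker hypothesis: the register goes directly from prev to cur.
model = [hw(p ^ c) for p, c in zip(prev, cur)]

# Unprotected register: leakage follows the prev -> cur transition, plus noise.
unprotected = [hw(p ^ c) + rng.gauss(0, 1) for p, c in zip(prev, cur)]

# Register cleaned with 0x0 in between: leakage follows the 0x0 -> cur transition.
protected = [hw(c) + rng.gauss(0, 1) for c in cur]

rho_unprotected = pearson(model, unprotected)
rho_protected = pearson(model, protected)
print(round(rho_unprotected, 3), round(rho_protected, 3))
```

In this toy setting the transition hypothesis correlates strongly with the unprotected leakage and stays close to zero once a 0x0 value separates the two data values.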




| Conf. | Embench | LUT    | FF     |
|-------|---------|--------|--------|
| C5U   | 1       | 1      | 1      |
| C5S1  | 1.8635  | 1.0533 | 1.0088 |
| C5S2  | 1.0266  | 1.3746 | 1.2418 |
| C7U   | 1       | 1      | 1      |
| C7S1  | 1.4821  | 1.1116 | 1.0215 |
| C7S2  | 1.0002  | 1.2751 | 1.3343 |

Ratios relative to the unprotected cores (C5U and C7U).




### Conclusion

1. Addressing transition leakage at its root cause is possible (and implementable).
2. Design choices between security, area and cycle overhead are required (with possible software countermeasures).
3. Glitches may still occur, since they are a circuit-layer issue rather than a µarchitectural one, and they are not clearly identified after removing other leakage.




Other layers to consider:

- **Application**: Impact of the different hardware strategies?
- **ISA**: Possibility to enable/disable a strategy only when needed.
- **Physical**: Impact of synthesis / place-and-route steps?




## Thanks for your attention. Don't hesitate to ask your questions.