# Combined Masking and Shuffling for Side-Channel Secure Ascon on RISC-V

Linus Mainka, Kostas Papagiannopoulos

03-04-2025




- Ascon was selected by the U.S. NIST as the standard for lightweight cryptography (LWC)
- Extensively evaluated and found to be mathematically secure
- However, side-channel attacks are possible [MBA<sup>+</sup>23, WP23]:
  - Full key recovery through Correlation Power Analysis (CPA) with 8,000 traces
  - CPA using deep learning techniques: 1,000 traces
  - Partial key recovery from a first-order masked implementation



# How can we create a software implementation of Ascon on a 32-bit architecture that should be side-channel secure by a comfortable margin?




- Still retain security even if masking order is halved [BGG<sup>+</sup>14]
- Do not rely on a single countermeasure only
- Investigate multiple approaches
- Analyse security benefit through the Mutual Information (MI) framework [SMY06] using shortcut formulas [ABG<sup>+</sup>22]



### Bitslice Masking and Improved Shuffling: How and When to Mix Them in Software? by Azouaoui et al.:

- Mask first, then shuffle
- Three theoretical approaches for combining masking and shuffling


- Shuffle Tuples
- Shuffle Shares
- Shuffle Everything

- Non-linear (AND) operations: ISW
- Provide Mutual Information (MI) shortcut formulas

### Setting Previous Work

#### **Our contribution:**

- Mask first, then shuffle
- Five Ascon implementations on 32-bit RISC-V for combining masking and shuffling


- Shuffle Tuples
- Shuffle Shares
- Shuffle Everything "Light"
- Unshuffled, but masked implementation
- Levelled implementation
- Non-linear (AND) operations: PINI [CS20]
- Use Mutual Information (MI) shortcut formulas
- Third-order masking

# Background Ascon



- Split a value $x$ into $d + 1$ shares $x_0, \ldots, x_d$ ("$d$-th order masking")
- $x_0, \ldots, x_{d-1}$ are random values $r_0, \ldots, r_{d-1}$
- The last share is $x_d = x \oplus r_0 \oplus \cdots \oplus r_{d-1}$

- Perform each operation on all shares of x
- To obtain the original value, we can recombine all shares
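The masking steps above can be sketched in a few lines. This is a minimal illustration, assuming 32-bit words and Boolean (XOR) sharing; the names `mask`, `unmask`, and `masked_xor` are hypothetical and not taken from the authors' code. Python offers no side-channel guarantees, so the sketch only shows the data flow.

```python
import secrets
from functools import reduce
from operator import xor

WORD_BITS = 32  # assumption: 32-bit words, matching the 32-bit target


def mask(x, d):
    """Split x into d + 1 Boolean shares (d-th order masking)."""
    randoms = [secrets.randbits(WORD_BITS) for _ in range(d)]  # r_0 .. r_{d-1}
    last = reduce(xor, randoms, x)  # x_d = x ^ r_0 ^ ... ^ r_{d-1}
    return randoms + [last]


def unmask(shares):
    """Recombine all shares to obtain the original value."""
    return reduce(xor, shares, 0)


def masked_xor(a, b):
    """A linear operation is performed share by share."""
    return [ai ^ bi for ai, bi in zip(a, b)]


if __name__ == "__main__":
    x, y = 0xDEADBEEF, 0x12345678
    xs, ys = mask(x, 3), mask(y, 3)  # third-order masking: four shares each
    print(hex(unmask(masked_xor(xs, ys))), hex(x ^ y))
```

Linear operations such as XOR work on each share independently; non-linear operations such as AND need a dedicated gadget.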

- Take a sequence of independent operations $[x_0 \circ y_0, \ldots, x_n \circ y_n]$
- Randomise the order in which they are executed according to a permutation $\theta$
- At step $i$, perform the operation $x_{\theta_i} \circ y_{\theta_i}$
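Shuffling as described here can be sketched as follows. The helper names `random_permutation` and `shuffled_apply` are hypothetical, and drawing $\theta$ with a Fisher-Yates shuffle is an assumption; the slides do not say how the permutation is generated.

```python
import secrets


def random_permutation(n):
    """Draw a permutation theta of 0..n-1 with a Fisher-Yates shuffle."""
    theta = list(range(n))
    for i in range(n - 1, 0, -1):
        j = secrets.randbelow(i + 1)
        theta[i], theta[j] = theta[j], theta[i]
    return theta


def shuffled_apply(op, xs, ys):
    """Execute the independent operations op(xs[k], ys[k]) in random order."""
    n = len(xs)
    theta = random_permutation(n)
    out = [None] * n
    for i in range(n):
        k = theta[i]  # at step i, perform operation number theta_i
        out[k] = op(xs[k], ys[k])
    return out, theta


if __name__ == "__main__":
    xs, ys = [1, 2, 3, 4, 5], [7, 7, 7, 7, 7]
    result, theta = shuffled_apply(lambda a, b: a & b, xs, ys)
    print(result, theta)
```

The result is the same whatever the order; only the time at which each operation executes changes from run to run.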

# Background

#### Bit Interleaving
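The slides show bit interleaving only as a figure. As a rough sketch, assuming the layout commonly used by 32-bit Ascon implementations (even-indexed bits of a 64-bit word in one 32-bit word, odd-indexed bits in another), a 64-bit rotation becomes two 32-bit rotations. All function names below are illustrative.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def interleave(x):
    """Split a 64-bit word into (even-bit word, odd-bit word), 32 bits each."""
    even = odd = 0
    for k in range(32):
        even |= ((x >> (2 * k)) & 1) << k
        odd |= ((x >> (2 * k + 1)) & 1) << k
    return even, odd


def deinterleave(even, odd):
    """Inverse of interleave: rebuild the 64-bit word."""
    x = 0
    for k in range(32):
        x |= ((even >> k) & 1) << (2 * k)
        x |= ((odd >> k) & 1) << (2 * k + 1)
    return x


def rotr32(w, n):
    n %= 32
    return ((w >> n) | (w << (32 - n))) & MASK32


def rotr64(x, n):
    return ((x >> n) | (x << (64 - n))) & MASK64


def rotr64_interleaved(even, odd, r):
    """Rotate right by r (0 <= r < 64) using only 32-bit rotations."""
    if r % 2 == 0:
        return rotr32(even, r // 2), rotr32(odd, r // 2)
    # An odd rotation amount swaps the roles of the two halves.
    return rotr32(odd, (r - 1) // 2), rotr32(even, (r + 1) // 2)


if __name__ == "__main__":
    x = 0x0123456789ABCDEF
    e, o = interleave(x)
    print(hex(deinterleave(*rotr64_interleaved(e, o, 19))), hex(rotr64(x, 19)))
```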




# Shuffle Tuples

- Ignore masking when shuffling
- Still shuffle entire operations
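One way to read Shuffle Tuples is sketched below: the order of whole masked operations is randomised, while the shares inside each operation run in a fixed order. The function names and the recorded `schedule` are hypothetical and only serve to make the execution order visible.

```python
import secrets


def random_permutation(n):
    """Fisher-Yates shuffle; returns a random ordering of 0..n-1."""
    order = list(range(n))
    for i in range(n - 1, 0, -1):
        j = secrets.randbelow(i + 1)
        order[i], order[j] = order[j], order[i]
    return order


def shuffle_tuples(op, xs, ys):
    """xs[k] and ys[k] are share lists; op is a share-wise (linear) operation."""
    n = len(xs)
    out = [None] * n
    schedule = []  # (operation index, share index) in execution order
    for k in random_permutation(n):      # shuffle across operations
        shares = []
        for j in range(len(xs[k])):      # shares of one operation stay together
            shares.append(op(xs[k][j], ys[k][j]))
            schedule.append((k, j))
        out[k] = shares
    return out, schedule


if __name__ == "__main__":
    xs = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
    ys = [[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]]
    out, schedule = shuffle_tuples(lambda a, b: a ^ b, xs, ys)
    print(schedule)
```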





# Shuffle Shares

- Instead of shuffling across operations, we shuffle across shares
- We do not shuffle across shares of the same value
- We shuffle across shares with the same index of different values
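The three bullets above can be read as the following sketch: share indices are processed in a fixed order, and for each share index the order across the different values is randomised. Drawing a fresh permutation per share index is an assumption, as are all names used here.

```python
import secrets


def random_permutation(n):
    """Fisher-Yates shuffle; returns a random ordering of 0..n-1."""
    order = list(range(n))
    for i in range(n - 1, 0, -1):
        j = secrets.randbelow(i + 1)
        order[i], order[j] = order[j], order[i]
    return order


def shuffle_shares(op, xs, ys):
    """xs[k] and ys[k] are share lists; op is a share-wise (linear) operation."""
    n, num_shares = len(xs), len(xs[0])
    out = [[None] * num_shares for _ in range(n)]
    schedule = []  # (value index, share index) in execution order
    for j in range(num_shares):          # not shuffled across shares of one value
        for k in random_permutation(n):  # shuffled across values, same share index
            out[k][j] = op(xs[k][j], ys[k][j])
            schedule.append((k, j))
    return out, schedule


if __name__ == "__main__":
    xs = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
    ys = [[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]]
    out, schedule = shuffle_shares(lambda a, b: a ^ b, xs, ys)
    print(schedule)
```

Compared with shuffling whole operations, two shares of the same value are never adjacent in time here, and each individual share sits at a random position among the shares with the same index.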



# Shuffle Everything "Light"

- Adaptation of previous scheme
- Utilising the structure of bit interleaving




# **PINI AND**

Algorithm 1 PINI AND gadget with linear memory requirements

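**Inputs:** $a = [a_0, \ldots, a_d]$, $b = [b_0, \ldots, b_d]$

$$\begin{array}{l}
\textbf{for } i = 0 \textbf{ to } d \textbf{ do} \\
\quad c_i \leftarrow a_i b_i \\
\textbf{end for} \\
\textbf{for } i = 0 \textbf{ to } d \textbf{ do} \\
\quad \textbf{for } j = i + 1 \textbf{ to } d \textbf{ do} \\
\quad\quad r_{ij} \stackrel{\$}{\leftarrow} \mathbb{F}_{2^{32}};\ r_{ji} \leftarrow r_{ij} \\
\quad\quad z_{ij} = (a_i + 1) \cdot r_{ij} + a_i \cdot (b_j + r_{ij}) \\
\quad\quad z_{ji} = (a_j + 1) \cdot r_{ji} + a_j \cdot (b_i + r_{ji}) \\
\quad\quad c_i \leftarrow c_i + z_{ij} \\
\quad\quad c_j \leftarrow c_j + z_{ji} \\
\quad \textbf{end for} \\
\textbf{end for}
\end{array}$$

**Outputs:** $c = [c_0, \ldots, c_d]$ so that $c = a \wedge b$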
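A functional sketch of Algorithm 1 follows, reading it as the PINI multiplication gadget of [CS20] with the inner loop running over $j = i + 1, \ldots, d$. It assumes 32-bit words with addition as XOR, multiplication as bitwise AND, and "$+ 1$" as bitwise complement. The helper names are hypothetical, and Python gives no control over leakage, so this checks correctness only.

```python
import secrets
from functools import reduce
from operator import xor

MASK32 = 0xFFFFFFFF


def mask(x, d):
    """Boolean masking helper: d random shares plus one correcting share."""
    randoms = [secrets.randbits(32) for _ in range(d)]
    return randoms + [reduce(xor, randoms, x)]


def unmask(shares):
    return reduce(xor, shares, 0)


def pini_and(a, b):
    """Masked bitwise AND of two sharings a and b with d + 1 shares each."""
    n = len(a)
    c = [a[i] & b[i] for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            r = secrets.randbits(32)  # r_ij, reused as r_ji
            # z_ij = (a_i + 1) * r_ij + a_i * (b_j + r_ij)
            z_ij = ((a[i] ^ MASK32) & r) ^ (a[i] & (b[j] ^ r))
            # z_ji = (a_j + 1) * r_ji + a_j * (b_i + r_ji)
            z_ji = ((a[j] ^ MASK32) & r) ^ (a[j] & (b[i] ^ r))
            c[i] ^= z_ij
            c[j] ^= z_ji
    return c


if __name__ == "__main__":
    x, y = 0xF0F0A5A5, 0x0FF0FFFF
    c = pini_and(mask(x, 3), mask(y, 3))
    print(hex(unmask(c)), hex(x & y))
```

Each $z_{ij}$ simplifies to $r_{ij} + a_i b_j$, so the random values cancel on recombination and only the cross products $a_i b_j$ remain.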

# Results

#### Performance



$d = 3$ for all masked schemes

### Results Mutual Information





### Results Cycles vs. MI

| Masking order | Unshuffled | Shuffle Tuples | Shuffle Shares | Shuffle EL |
|---------------|------------|----------------|----------------|------------|
| $d = 3$       | 0.0752     | 0.1125         | 0.376          | 0.7521     |

Table: The MI values per scheme such that an adversary needs $10^6$ traces.

| Unshuffled | Shuffle Tuples | Shuffle Shares | Shuffle EL |
|------------|----------------|----------------|------------|
| 2,714      | 3,456          | 8,371          | 8,563      |

Table: The number of cycles needed to compute one round of the permutation.
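To put the two tables side by side, the short script below normalises both to the unshuffled implementation. It assumes the MI table is read as the leakage level each scheme can tolerate while still forcing $10^6$ traces, which is an interpretation of the caption; the variable and function names are illustrative.

```python
# Values copied from the two tables above (d = 3).
mi_tolerance = {
    "Unshuffled": 0.0752,
    "Shuffle Tuples": 0.1125,
    "Shuffle Shares": 0.376,
    "Shuffle EL": 0.7521,
}
cycles_per_round = {
    "Unshuffled": 2714,
    "Shuffle Tuples": 3456,
    "Shuffle Shares": 8371,
    "Shuffle EL": 8563,
}


def relative_to_baseline(values, baseline="Unshuffled"):
    """Express every entry as a multiple of the baseline entry."""
    return {name: value / values[baseline] for name, value in values.items()}


mi_gain = relative_to_baseline(mi_tolerance)
cycle_cost = relative_to_baseline(cycles_per_round)

if __name__ == "__main__":
    for name in cycles_per_round:
        print(f"{name:15s} MI x{mi_gain[name]:.2f}  cycles x{cycle_cost[name]:.2f}")
```

Under this reading, Shuffle Shares tolerates roughly 5 times the MI of the unshuffled scheme for roughly 3.1 times the cycles, and Shuffle EL roughly 10 times the MI for roughly 3.2 times the cycles.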

## Results

Traces vs. Masking Order





# Conclusion

#### Takeaways

- Shuffle Shares and Shuffle EL are better than just increasing $d$ (assuming no shuffling permutation leakage)
- Benefit of Shuffle EL increases as register size goes down
- Implementation is Ascon-specific, the schemes are not

### Caveats and Future Work

- (Micro-)architectural effects will likely reduce the practical security
- Requires significant randomness
- No physical evaluation

Code:

https://uva-hva.gitlab.host/l.mainka/side-channel-secure-ascon

# References



Lightweight but not easy: Side-channel analysis of the Ascon authenticated cipher on a 32-bit microcontroller. Cryptology ePrint Archive, Paper 2023/1598, 2023.
