

Institut für Technische Informatik

The **BRISKI** RISC-V Barrel processor for ASIC and **FPGA** Implementation - one Core to rule them both

**Novel Computing Technologies**: Dr.-Ing. Riadh Ben Abdelhamid and Prof. Dr.-Ing. Dirk Koch



- A highly optimized, fast and compact RISC-V barrel processor core:
  - Area: typically < 800 LUTs
  - Speed: operates at 650+ MHz
  - **CPI = 1** (with **16 harts** and a 10-stage pipeline) $\rightarrow$ 650+ MIPS
    - $\rightarrow$ > 0.81 MIPS/LUT

- ISA: full RV32I user-mode, CSRRS and LR.W/SC.W (atomic instructions for synchronization primitives)
- **Memory:** 4 KB/core shared I-RAM and D-RAM (multiple cores can additionally share URAM)
- **Publicly available** at: https://github.com/riadhbenabdelhamid/BRISKI







## **A Barrel Processor approach to RISC-V Implementation**

- **10-stage deep physical pipeline** that needs neither branch prediction nor register forwarding.
- PC storage: 16 Program Counters are stored using distributed memory (LUTRAM).
- **RegisterFile storage:**
  - One BRAM tile (two RAMB18) stores **16 different register files** (full capacity).
  - Accessing a specific register file requires concatenating the corresponding thread index to the register address.

## **BRISKI** CoreTop wrapper interface

- A typical, minimal interface for communicating with the outside world or for integration into a manycore design. It contains:
  - **Memory-Mapped Interface:** translates loads/stores to/from signals.
  - **Activation Multiplexers:** disable the outputs of the core interface when the input grant signal is not set (e.g., by an arbiter).
  - **Memory Map Decoder:** selects/enables the desired memories/interfaces.

AREA AND PERFORMANCE ATTRIBUTES OF BRISKI ON A VU9P FPGA



|                                | with bshift               | no bshift                               |
|--------------------------------|---------------------------|-----------------------------------------|
| Year                           | 2024                      | 2024                                    |
| FPGA                           | VU9P                      | VU9P                                    |
| LUT                            | 789                       | 525                                     |
| Flip-flop (FF)                 | 855                       | 801                                     |
| BRAM                           | 2 (RAMB18)                | 2 (RAMB18)                              |
| ISA                            | RV32I + LR.W/SC.W + CSRRS | RV32I + LR.W/SC.W + CSRRS, minus bshift |
| Fmax (MHz)                     | 650                       | 675                                     |
| CPI                            | 1                         | 1                                       |
| MIPS (= Fmax / CPI)            | 650                       | 675                                     |
| **Compute density** (MIPS/LUT) | 0.82                      | 1.28                                    |
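The CPI of 1 in the table relies on the barrel organisation: with 16 harts interleaved over a 10-stage pipeline, a hart's previous instruction has already left the pipeline before its next one is issued. The following Python sketch is a behavioural illustration only, not the BRISKI RTL. It assumes strict round-robin hart selection, and the register-file address layout (thread index as the upper address bits) is a hypothetical choice made for the example; the function names are likewise invented here.

```python
from collections import Counter, deque

NUM_HARTS = 16       # hardware threads per core (from the poster)
PIPELINE_DEPTH = 10  # pipeline stages (from the poster)
REG_INDEX_BITS = 5   # RV32I has 32 integer registers -> 5 address bits


def simulate_barrel(cycles, num_harts=NUM_HARTS, depth=PIPELINE_DEPTH):
    """Issue one instruction per cycle in strict round-robin hart order.

    Returns (retired, worst), where `worst` is the largest number of
    instructions any single hart ever had in the pipeline at once.
    """
    pipeline = deque([None] * depth, maxlen=depth)  # index 0 = first stage
    retired = 0
    worst = 0
    for cycle in range(cycles):
        if pipeline[-1] is not None:  # instruction leaving the last stage
            retired += 1
        # maxlen makes appendleft drop the entry in the last stage
        pipeline.appendleft(cycle % num_harts)
        per_hart = Counter(h for h in pipeline if h is not None)
        worst = max(worst, max(per_hart.values()))
    return retired, worst


def regfile_address(hart, reg):
    """Assumed BRAM address layout: {thread index, register index}."""
    assert 0 <= hart < NUM_HARTS and 0 <= reg < (1 << REG_INDEX_BITS)
    return (hart << REG_INDEX_BITS) | reg


if __name__ == "__main__":
    retired, worst = simulate_barrel(PIPELINE_DEPTH + 1000)
    print(retired, worst)                        # 1000 1
    print(simulate_barrel(100, num_harts=8)[1])  # 2
    print(regfile_address(3, 10))                # 106
```

After the 10-cycle fill, one instruction retires per cycle, and no hart ever has more than one instruction in flight, so in this model there is nothing to forward or predict within a hart. Rerunning with only 8 harts shows two in-flight instructions per hart, which is the case where hazards could reappear.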

## The **BRISKI** RISC-V Barrel Processor on ASIC

- Tiny: densely packed in a multi-project die, occupying about one quarter of a 0.6 mm by 0.9 mm slot, using the SkyWater 130 nm PDK.
- Clocked at **40 MHz** (80% of the maximum clock speed allowed by the relatively slow SkyWater 130 nm PDK).

## **SPARKLE**: a kilo-core design based on **BRISKI**

- 1,024 BRISKI cores
- **16,384** hardware threads
- < 800K LUTs on VU9P
- **100%** of DSPs usable for accelerators
- **Fast** operation and high throughput: 400 MHz $\rightarrow$ 400 GIPS
- High compute density: ~0.5 MIPS/LUT
- Includes a PCIe host interface
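The SPARKLE figures can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only. It assumes each core sustains CPI = 1 as in the single-core results, that "800K" means 800,000, and it uses the stated LUT upper bound, so the resulting density is a lower bound; the function name is invented for this example.

```python
def sparkle_aggregate(cores=1024, harts_per_core=16, clock_mhz=400.0,
                      cpi=1.0, luts=800_000):
    """Return (hardware threads, aggregate GIPS, MIPS per LUT)."""
    threads = cores * harts_per_core
    total_mips = cores * clock_mhz / cpi  # each core retires clock/CPI MIPS
    return threads, total_mips / 1000.0, total_mips / luts


threads, gips, density = sparkle_aggregate()
print(threads, round(gips, 1), round(density, 3))  # 16384 409.6 0.512
```

Under these assumptions, 1,024 cores at 400 MHz give roughly 410 GIPS, which the poster rounds to 400 GIPS, and at least 0.51 MIPS/LUT, in line with the quoted ~0.5 MIPS/LUT.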

