# Sovereignty – Independence – Innovation

7 years of HW/SW codesign with RISC-V at CEA

Thomas Dombek, <u>thomas.dombek@cea.fr</u>
Head of CEA's LIST/DSCIN Digital Systems and Integrated Circuits division

Tuesday May 13th, 2025, RISC-V Summit Europe, Paris

### Agenda

1. Design activities at CEA
2. RISC-V, the obvious choice
3. RISC-V related achievements
4. Perspectives

### Design activities at CEA

From ultra-low-power (ULP) to high-performance computing (HPC)

### CEA at a glance

- 21,000 employees
- €6 billion budget
- **700** industrial partners
- 650 patents/year
- Government & academic research, 1<sup>st</sup> global: 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2022, 2023, 2024, 2025
- Research areas: smart digital systems, micro & nanotechnologies, fundamental research

### CEA at a glance

- >2500 staff members
- 11,700 sq. m of cleanroom space
- 100-200-300 mm wafers
- >350 industrial partners
- >600 publications per year
- >3050 patents in portfolio
- >75 startups created

### Lab-to-Fab

Integrated Services from CEA: https://www.youtube.com/watch?v=B6ygesaVMm4

- Expertise in designing state-of-the-art hardware architectures, systems-on-chip, ASICs, and chiplets.
- Efficient development of reliable, secure, and low-power solutions tailored to your needs.
- Utilization of advanced design exploration tools and state-of-the-art design flows to turn your idea into a ready-to-manufacture circuit.
- Access to our cleanrooms for prototype manufacturing with extensive testing and verification at every stage of the design process.
### 10+ years of experience in chip design

Figure labels: TRUST-WORTHY; timeline starting in 2011.

- **MAG-3D**: 3D Network-on-Chip
- **HUBEO**: Photonic NoC interposer
- **INTACT**: 6 chiplets & 96 processors
- **CRYOCMOS**: Control for quantum computing
- **EPAC**: HPC Variable Precision Accelerator
- **STARAC**: Chiplet-based Optical Network on Chip
- **LOCOMOTIV**: Adaptive Voltage & Frequency Scaling
- **FRISBEE**: ULP FDSOI
- **RETINE**: Ultra-fast smart imager
- **Cyber-VT**, **WARRIOR**: RISC-V IoT IC; Test Vehicle for IoT with wake-up security enhancement
- **NVM**: Non-volatile-memory subsystem for microcontrollers
- **VASCO 2**: ASIC vehicle for component security
- **REPTILE**: Analogue neuron
- **SPIDER**: Neuromorphic DSP
- **SPIRIT**: Spiking NN with eNVM
- **SAMURAI**: IoT IC with NN accelerator
- **Compute-SRAM**: In-memory computing
- **ESPERANTO**: RNN with 50k synapses
- **NeuroCorgi**: Ultra-low-power AI

### INTACT - heterogeneity, modularity and reuse of 3D-Design

Making 3D design heterogeneity, modularity and reuse real, with further cost, time-to-market (TTM) and yield improvements.

Proven with our 96-core compute demonstrator: 6 chiplets stacked on an active interposer.

### System Architecture Design

[Figure: system architecture diagram showing a chiplet (16 cores) organized in clusters. Publication labels: JSSCC'2020, Symposium'2016, 3DIC'2015, JSVLSI'2015.]

### 3D-System architecture, smart chiplet design and IPs with key performance assets:

- **Scalability:** Cache-Coherency IP up to 512 cores with 3 levels of caches
- **High throughput at ultra-low power inter-layer connectivity (3 Tb/s/mm², 0.59 pJ/bit):** 3D-Plugs inter-layer communication IP
- **Energy efficiency (up to 81%):** Power management integrated in interposer
- **Ultra-low latency (0.6 ns/mm):** Asynchronous NoC IP

Enable heterogeneity, modularity and reuse of 3D-Design

#### Heterogeneous 3D partitioning with:

- 28nm FDSOI chiplets (x6)
  - Low-power compute fabric
  - Wide voltage range (0.5 V to 1.3 V)
  - Body biasing for logic boost & leakage control
- 65nm active interposer
  - Power unit (switched-capacitor DC-DC converter)
  - Interconnect (Network-on-Chip)
  - Test, clocking, thermal sensors, etc.

### **Technology**

- **μ-bumps**: Ø 10 μm, pitch 20 μm
- **TSV**: Ø 10 μm, height 100 μm

# RISC-V, the obvious choice

Open-HW as a key to success

Enhancing technology access to the real world. (Image source: generated via DALL-E.)
(Aligned with the European HiPEAC roadmap vision)

### **Design Space Challenges**

### RISC-V & Open HW approach

# RISC-V related achievements

From ULP to HPC

### CEA RISC-V projects on a wide spectrum

Spectrum covered: from **HPC** to **ULP**.

Figure labels: Scalability, Flexibility, Speed, Sovereignty, Efficiency, Security, Cost, Innovation.

- **HPDcache**: High-Performance CORE-V L1 Data Cache
- Non-volatile-memory subsystem for microcontrollers
- HPC Variable Precision Accelerator
- **SAMURAI**: IoT IC with NN accelerator
- **VASCO**: ASIC vehicle for component security
- Contributions to cores: CV32E40P, CVA6
- **Core building blocks**: GPIO, UART, SPI, interrupt controller, low-latency OBI interconnect
- **Suite of verification utilities**: UVM agents (AXI, SPI); models of memory, flow control, clock & reset, watchdog; performance monitors

### High-Performance L1 Data Cache for RISC-V Cores (HPDcache): high throughput, flexible solution

- 3x bandwidth increase
- Becoming the standard CVA6 cache: used in future products
- Industrial-grade verification with UVM testbench (also open-sourced)
- Sources: https://github.com/openhwgroup/cv-hpdcache and https://github.com/openhwgroup/cv-hpdcache-verif

Ref.: César Fuguet. HPDcache: Open-Source High-Performance L1 Data Cache for RISC-V Cores. In Proc. of the 20th International Conference on Computing Frontiers (CF '23). DOI: 10.1145/3587135.3591413

[Figure: block diagram with requesters, arbiter (1 request/cycle) and the HPDcache core interface.]

- Support of multiple independent requesters: CORE-V core, tightly-coupled accelerators.
- **Set-associative cache** with configurable number of sets and ways.
- Support of standard load, store, CMOs and atomic operations of the RISC-V ISA.
- Pipelined micro-architecture for high throughput and clock frequency.
- Allows out-of-order execution of memory operations to avoid unnecessary stalls (in compliance with the RISC-V RVWMO consistency model).
- Programmable hardware memory prefetcher with multiple engines for strided memory accesses.
- Write-through cache: implements a write buffer supporting write coalescing and multiple in-flight requests (we plan to support write-back).
- Supports a high (configurable) number of miss requests to the memory.
- Adapter for the AMBA AXI5 interface on the NoC/memory side.

[Figure labels: Hardware Memory Prefetcher, Engines, CSRs.]

### Successful Integrations in CVA6

- Embedded (32 bits) configuration
- Application (64 bits) configuration
- VRP/VXP [3] accelerator configuration

Results obtained with RAM fixed access latency of 100 clock cycles on the **application configuration** [2].

VRP/VXP = CVA6 RISC-V core with ISA extension:

- Additional register bank
- New L1 D/I caches (incl. prefetchers) and LSUs
- Additional functional unit and instructions

Negligible area overhead: +5.92% compared to CVA6-WTDcache.

### VRP/VXP: RISC-V Accelerator for Variable eXtended precision computing

- Motivation: software emulation (e.g. MPFR) is too slow.
- Goal: be application-agnostic and limited by memory bandwidth instead of arithmetic.

⇒ Variable extended Precision Floating Point Unit (VPFPU) integration in a modified RISC-V CVA6 processor

TRISTAN'24 (Graz, Austria), Sept. 11th 2024

#### Main CVA6 modifications:

- 32 logical/64 physical 540-bit registers
- Register renaming
- OoO execution
- Linked to HPDcache
- 7 VPFPU functional units
- Working iteratively on 64/128b chunks ⇒ performance depends on precision

[...] 128 outstanding read misses) together with a hardware prefetcher provides a 5x throughput improvement

### **VASCO Test Vehicle for Secure IPs**

### CORE security features and critical IP validated on silicon:

[Figure: VASCO chip block diagram. Legible labels: CPU (RISC-V), TRNG, interconnect, functional peripherals.]

#### Secure processor with 3 protections

- Pipeline (CV32b demo on ASIC FD-SOI)
- Cache (FPGA 64b demo)
- Memory encryption (FPGA 64b demo)

#### **Advanced Cryptography**

- Optimized and secure post-quantum cryptography (PQC) implementations
- FD-SOI oriented secure crypto-accelerator

[Figure label: RAM & PUF.]

### **Near memory computing**

- C-SRAM for secure and crypto applications
- Secure SRAM: fast and frugal erase function
- SRAM-PUF

FD-SOI oriented design for primitives and countermeasures

#### **Innovative TRNG**

- FD-SOI oriented TRNG architecture
- Entropy sources modeling and characterization
- Ref.: L. Benea et al., "On the Characterization of Jitter in Ring Oscillators using Allan variance for True Random Number Generator Applications", DSD 2022 (figure labels: 22FDX, Allan variance, 1 GS/s)

#### VASCO timeline

- VASCO#0: 2018
- VASCO#1: 2020
- VASCO#2: 2022
- VASCO#3: Q4 2024. Updates on: secure processor, AI accelerator, PQC accelerator, RNG
- VASCO#3.1: 2026. To define with our partners

#### VASCO: one-stop shop for designing & characterizing innovative cyber-security IP on ASIC

#### **Architecture**

- Feasibility analysis
- Specifications
- Enhancement
- Benchmarking

#### **Prototyping**

- MPW shuttle
- Assembly & test

#### **Design**

- Specific IP development & integration (crypto-accelerator, TRNG, PUF...)
#### Characterization

- Hardware security tests
- Performance tests

## **Perspectives**

Collaboration and innovation

## Computing innovation through open collaborations

CEA is committed to supporting fast innovation and a sovereign, open European ecosystem.

Figure labels: **ISA Specification**; Software and tools; Platinum Member; Member of the Board of Directors; μ-processor design.

- Fast and Efficient ML-based Power Modeling of Integrated Circuits
- Formal models combination for safety properties verification
- Modeling of extra-functional properties of Automotive High Performance RISC-V core

### **Perspectives**

High Efficiency, Secure OpenSource Systems: Core + Memory + Interconnect

From 32- & 64- to future 128-bit architectures

[Roadmap figure, reconstructed from its labels. Layers: Chiplet & System Interconnect; Memory Hierarchy & Caches; Computing Cores. Timeline marks: 2018, 2020, 2023, 2026, 2030.]

- **INTACT - FD28+65**: MIPS 32-bit, OpenSource TSAR L1/L2/L3 Memory **Architecture**
- PULP-based 32-bit Low Power Efficient Platform: used extensively for all IoT & cybersecurity circuits
- **UCIe, ODSA**: Die-2-Die link for chiplet communication
- Multiple independent requesters, out-of-order execution
- Host CPU for HPC: GF22FDX, ARIANE-CVA6 64-bit (including VXP accelerator)
- UCSB **Open PITON**
- Scalable Memory & Interconnect hierarchy, 3D chiplet partitioning
- RISC-V 128 bit: Global Address Space (GAS) to reach thousands of nodes, + Simulation, Compiler, etc. (ANR MAPLURINUM)
- Towards a generic **OpenSource Computing platform** for heterogeneous architectures:
  - **HPC** and new compute models targeting NCP
  - Safe and secure systems
  - Silicon evaluation & reference platform

# Thanks for your attention! Any questions?

### CEA at the RISC-V Summit Europe 2025

### Booth #32

## On display at our booth: VXP & VASCO 2

#### Talks:

- "Sovereignty, independence, innovation: 7 years of HW/SW codesign with RISC-V at CEA" by Thomas Dombek (CEA). Keynote on Tue 13 at 10:00, in Gaston Berger (S2).
- "VASCO: ASIC Test Platform for Cybersecurity on FD-SOI" by Stefano Di Matteo (CEA). Demo pres on Tue 13 at 15:35, in Louis Armand East (S3).
- "RISC-V based GPGPU on FPGA: A Competitive Approach for Scientific Computing?" by Éric Guthmuller (CEA). Talk on Tue 13 at 17:00, in Gaston Berger (S2).

#### **Posters:**

- "Implementing out-of-order issue in CVA6 for efficient support of long variable latency instructions" by Éric Guthmuller (CEA). Poster on Tue 13, at island 2.1 on S2.
- "CIAMH: Confidentiality, Integrity, and Authentication across the Memory Hierarchy" by Karim Ait Lahssaine (CEA). Poster on Wed 14, at island 1.1 on S1.
- "Pre-silicon Security Analysis of RISC-V Processors against Fault Injection Attacks" by Damien Couroussé (CEA). Poster on Wed 14, at island 1.3 on S1.
- "Comprehensive Lockstep Verification for NaxRiscv SoC Integrating RISC-V DV, RVLS, and Questa/UVM" by Billal Ighilahriz (CEA). Poster on Wed 14, at island 2.1 on S2.
- "RISC-V based GPGPU on FPGA: A Competitive Approach for Scientific Computing?" by Éric Guthmuller (CEA). Poster on Wed 14, at island 3.1 on S3.
- "RISC-V-based Acceleration Strategies for Post-Quantum Cryptography" by Stefano Di Matteo (CEA). Poster on Wed 14, at island 3.1 on S3.
- "TYRCA: A RISC-V Tightly-Coupled Accelerator for Code-Based Cryptography" by Stefano Di Matteo (CEA). Poster on Wed 14, at island 3.1 on S3.
- "Towards Efficient Modeling and Validation of Scalable Chiplet-Based Platforms" by Fatma Jebali, Ayoub Mouhagir (CEA). Poster on Thu 15, at island 2.3 on S2.

### **Thomas DOMBEK**

thomas.dombek@cea.fr

RISC-V Summit 2025, May 13<sup>th</sup> 2025