Demos

Notes for academic demo presenters

TBD.

Accepted demos

Check this page regularly for the schedule and location of demos!

Demos will soon be scheduled across the three days of the core conference
(Tuesday 9 to Thursday 11).


REPTILES: Repeated tiles of Sargantana

Sub. #78MPFT.

Lluc Alvarez, Arnau Bigas Soldevila and Serik Perez Gomez.

Abstract: This demo introduces Reptiles - Repeated Tiles of Sargantana, an open-source RISC-V multicore architecture designed to support research in HPC systems. Reptiles builds upon the OpenPiton manycore framework by integrating multiple Sargantana RISC-V cores and enhancing the memory hierarchy and interconnection network to improve scalability and performance. The goal is to provide an accessible and flexible platform for researchers to develop, experiment with, and optimize HPC workloads using open hardware.

Reptiles replicates Sargantana tiles within OpenPiton’s architecture and introduces several architectural improvements. These include a configurable network-on-chip width (from 64 up to 704 bits), flexible cache block sizes, adjustable numbers of miss status holding registers (MSHRs), improved cache sizes and associativities, parallel SRAM access in the L2 and the last-level cache, and a configurable number of memory controllers. The system also integrates the High-Performance Data Cache (HPDcache) as an L1 data cache and enhances the Sargantana core with broader support for RISC-V extensions, particularly the RISC-V Vector Extension (RVV 1.0). Additional improvements include debugging support, performance counter access in Linux, and enhanced RTL simulation features such as checkpointing.

In this demo we show a fully functional FPGA prototype of Reptiles with four Sargantana cores booting Linux and running OpenMP benchmarks such as the NAS Parallel Benchmarks, interactive UART console games, and graphical applications by performing X11 forwarding over SSH. Overall, Reptiles demonstrates that open-source RISC-V multicore systems can effectively support scalable HPC research and experimentation.


RISC-V POWERED QUANTUM SENSOR

Sub. #AVMPST.

agata.kusnina.

Abstract: This demo presents a RISC-V powered quantum sensor designed for ultra-precise magnetic field measurements, even at room temperature, using nitrogen-vacancy (NV) center defects in diamond. Quantum magnetometers have a wide range of applications, including localization, microscopy, and system control. By integrating a RISC-V processor into the system, the aim is to achieve the world’s most efficient sensor readout and unlock the potential of quantum sensing for widespread adoption. For the demonstrator, a generic event-based architecture was developed in which the RISC-V processor plays a vital role in coordinating the hardware and provides a foundation for future miniaturization of the sensor electronics and readout ASIC design. The developed prototype enables pulsed optically detected magnetic resonance (ODMR) measurements, which provide significantly higher precision and improved experimental control than continuous-wave (CW) techniques. The goal is to showcase the RISC-V powered quantum sensor in an integrated setup from EDI (Institute of Electronics and Computer Science, Latvia) that incorporates analogue electronics for generating and sampling microwaves, digital electronics for pre-processing and control, and application-level software for users. Even if hardware issues arise, the live demonstrator will showcase the complete RISC-V-powered quantum sensing system using the sensor platform based on the PolarFire SoC Video Kit. An open-source RISC-V processor also grants more freedom for a future ASIC implementation of the measurement system.


Showcasing the ARCANE In-Cache computing IP into a RISC-V Linux system

Sub. #EGU3RV.

Vincenzo Petrolo.

Abstract: The increasing computational demands characteristic of contemporary deep learning models, particularly those associated with computer vision tasks employing Vision Transformers, present considerable constraints for energy-limited smart devices and edge computing platforms. To address this challenge, we demonstrate a RISC-V SoC that incorporates ARCANE, a 512KiB compute-capable Last-Level Cache, which enables In-Cache Computing (ICC). This capability is crucial for substantially mitigating the energy and latency overheads linked to data movement between the central processing unit (CPU) and main memory—a primary architectural bottleneck. To validate the system’s operational maturity, we deploy models such as the 22-million parameter DINOv2-S and the lightweight MobileNetV2 utilizing the TVM framework. This deployment serves to demonstrate the platform’s capacity to efficiently execute both state-of-the-art, computationally intensive computer vision workloads and standard image classification tasks within a unified environment. The system, instantiated on a ZCU104 FPGA featuring 1GiB of DDR4 memory, operates at a clock frequency of 80MHz and furnishes a Linux operating environment complete with a dedicated suite of user applications. These applications provide quantitative evidence of the significant performance advantages conferred by ARCANE’s near-memory computing paradigm when compared against CPU-only execution. By integrating a custom tensor ISA that remains transparent and lock-less to the application programmer, ARCANE establishes itself as a valuable and pioneering contribution to the RISC-V ecosystem, representing one of the first In-Cache Computing IP cores integrated into a Linux operating environment.


Integrated Development Environment Features for Unified Database Specification Development

Sub. #F7UW8J.

Madeline Seifert, Isabel Godoy, Ajit Dingankar, Brayden Mendoza, Lughnasa Miller and Nina Luo.

Abstract: The RISC-V Unified Database (UDB) serves as a machine-readable “source of truth” for written RISC-V specifications. To make these specifications easier to create, Qualcomm collaborated with a team of Harvey Mudd College students to develop an Integrated Development Environment (IDE) toolkit that supports architects writing RISC-V specifications. The team has worked to develop many of the features one would consider standard when programming in a modern IDE, including syntax highlighting, autocompletion, and cross-referencing. This groundwork also lays the foundation for other tool developers in the RISC-V ecosystem to use the information contained in the UDB more efficiently.


OSOC Mambo Robot: RISC-V processor chip showcase using open-source IP, EDA, and PDK

Sub. #FZ3AYJ.

Xiaoke Su.

Abstract: The Mambo XiaoXin Robot uses the StarrySky C2-Pico open-source development board as its core controller, paired with an ASR-PRO voice recognition module, forming a compact robotic system that integrates motion control, voice interaction, and intelligent response. The StarrySky C2-Pico board is equipped with the RetroSoC chip independently developed by the ECOS (EDA, Chip, One Student One Chip, System) team. This chip is fabricated using the ICSprout 55 nm open-source PDK process flow and represents a technological achievement that combines the open instruction set RISC-V, open-source EDA, open-source IP, and open-source PDK. Its functionality and performance are benchmarked against the low- to mid-end products of ST’s F1 series. Internally, the chip integrates the classic lightweight open-source RISC-V processor core PicoRV32, fully implementing the RV32IMC instruction set architecture, with a maximum clock frequency of up to 72 MHz. The chip includes 128 KB of on-chip SRAM, while the board further expands storage with 8 MB PSRAM and 16 MB SPI Flash, forming a multi-level memory system. In addition, the chip integrates a rich set of open-source peripheral drivers, including UART, SPI, I2C, PS/2, PWM, GPIO, timers, and more, meeting diverse embedded development requirements.


Running ILP32 on RVA(22/23)S64: AI Glasses Product Demo

Sub. #GQUDKS.

GUO Ren.

Abstract: Historically, many architectures have attempted to run ILP32 software on 64-bit ISAs, such as x86-X32, mips-N32, and arm64-ILP32. However, only arm64-ILP32 achieved commercial success, on Apple Watch OS, in a closed-source manner.

Today, we present the commercial deployment of RV64-ILP32 based on Allwinner v861 AI Glasses chips (Dual-Core RISC-V XuanTie C907). This demo showcases AI Glasses running the ILP32 Linux kernel on RVA22S64. Compared to traditional RV32, performance improves significantly: iperf throughput reaches 1.5×, and lmbench shows 1.1–1.2× gains across most tests. Furthermore, another demo runs LP64 applications on an RV64-ILP32 Linux kernel within a 2GB address space for the first time, highlighting this ABI’s compatibility, flexibility, and potential. This achievement marks a milestone in bringing 64-bit RISC-V architectural benefits to resource-constrained embedded AI devices while maintaining ILP32 memory efficiency based on an open-source software stack.

This demo illustrates ILP32 on RVA(22/23)S64. As a next step, we are calling for sponsors for ILP32 on RVA(22/23)U64!


“One Student One Chip”: Student Board Power-Up Demo Video

Sub. #HXQWPZ.

Xiaoke Su.

Abstract: This video documents the unboxing and functional verification process of the StarrySky development board by participants of the “One Student One Chip (OSOC)” Program IV, following the successful tape-out and chip delivery. Featuring a fully customized RISC-V processor core independently developed by the trainees, this self-designed board demonstrates remarkable technical achievements through successful execution of classic games like Mario and rendering of the university’s emblem. This helped students strengthen capabilities in hardware–software co-design of computer systems, and cultivated their abilities to understand, build, debug, and optimize complex systems. The student in this video, who is called Tao Zhou, is currently a core technical contributor in the XiangShan frontend team, responsible for the development and performance optimization of the ICache and BPU.


RISC-V Edge Inference for Real-Time Eye-Movement Control on GAPses Smart Glasses

Sub. #LMUL9F.

Sebastian Frey and Andrea Helga Bernardi.

Abstract: This live demonstration showcases GAPses, an ultra-low-power smart-glasses platform based on an ultra-low power RISC-V multicore processor (GAP9), enabling always-on, real-time, energy-efficient edge processing of electrooculography (EOG) and electroencephalography. GAPses performs on-device signal processing and machine-learning inference, converting raw biosignals into events without cloud compute or continuous high-bandwidth streaming, enabling energy-scalable and privacy-preserving operation. In the demo, dry electrodes integrated into the glasses frame capture horizontal/vertical EOG, and an on-device lightweight CNN running on GAP9 classifies saccadic eye movements from these EOG signals in real time. The resulting eye-movement events are transmitted via BLE to a laptop running a visualization application, which displays the CNN outputs alongside filtered EOG traces. The classification stream drives multiple interactive scenarios, including grid control, a Tetris game, and live class-probability visualization. During the demo session, we will run the complete pipeline live: a team member will wear the glasses and perform a sequence of saccades to trigger on-device CNN inference. The GUI updates in real time with predicted classes and EOG traces, allowing attendees to observe latency, robustness, and privacy benefits of RISC-V-based embedded biosignal inference in a practical wearable form factor. Overall, the demo highlights GAPses as an open, fully wearable research platform and illustrates how parallel RISC-V compute enables always-on neural interfaces by executing sensing, inference, and event-level decisions locally without cloud dependence or continuous high-bandwidth streaming.


LIBERO: A Flexible, Lightweight GDB-based Visualization Tool for RISC-V Vector Extensions

Sub. #Q97WYM.

Jakob Schäffeler, Nima Baradaran Hassanzadeh, Carsten Trinitis and Kun Qin.

Abstract: The RISC-V Vector (RVV) extension introduces powerful yet complex semantics for data-parallel execution, including dynamically sized vectors, per-lane masking, and flexible element widths and groupings. While these features offer high performance and portability, they also complicate debugging, as existing tools, such as GDB, do not present RVV registers in a configuration-aware manner. Consequently, raw and verbose register dumps must be manually interpreted relative to the current vector register state. With register widths of up to 65,536 bits this quickly becomes impractical, making it difficult to understand effects of individual instructions and spot values of interest efficiently.

This demo presents LIBERO, a lightweight visualization tool integrated directly into GDB through its Python API. LIBERO augments GDB’s Text User Interface (TUI) with a custom register view that continuously displays vector contents alongside the relevant configuration state during program execution. LIBERO allows users to select which vector registers to display and automatically renders them based on the width specified in the status register. By embedding these capabilities into GDB, LIBERO enables developers to reason about RVV code more efficiently while preserving the familiar GDB workflow.
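The configuration-aware rendering described above can be illustrated with a minimal sketch. It is not LIBERO's code and does not use the GDB Python API; it only assumes the RVV 1.0 `vtype` layout (`vlmul` in bits 2:0, `vsew` in bits 5:3, `vta` in bit 6, `vma` in bit 7) and shows how raw register bytes might be split into elements of the selected width. The function names `decode_vtype`, `split_elements`, and `render` are hypothetical.

```python
from typing import Dict, List


def decode_vtype(vtype: int) -> Dict[str, int]:
    """Split an RVV 1.0 vtype value into its fields."""
    vsew = (vtype >> 3) & 0b111            # bits [5:3]; values above 3 are reserved
    return {
        "sew": 8 << vsew,                  # selected element width in bits
        "vlmul": vtype & 0b111,            # bits [2:0], register grouping
        "vta": (vtype >> 6) & 1,           # tail-agnostic flag
        "vma": (vtype >> 7) & 1,           # mask-agnostic flag
    }


def split_elements(raw: bytes, sew: int, vl: int) -> List[int]:
    """Interpret little-endian register bytes as up to `vl` unsigned elements."""
    step = sew // 8
    count = min(vl, len(raw) // step)
    return [
        int.from_bytes(raw[i * step:(i + 1) * step], "little")
        for i in range(count)
    ]


def render(name: str, raw: bytes, vtype: int, vl: int) -> str:
    """Format one vector register according to the current configuration."""
    cfg = decode_vtype(vtype)
    digits = cfg["sew"] // 4               # hex digits per element
    cells = " ".join(f"{e:0{digits}x}" for e in split_elements(raw, cfg["sew"], vl))
    return f"{name} (SEW={cfg['sew']}, vl={vl}): {cells}"
```

In a real GDB extension, the raw bytes, `vtype`, and `vl` would be read from the debugged target; here they are plain function arguments so the sketch runs on its own.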


RISC-V Edge Processing for Real-Time Unobtrusive Driver State Monitoring on the Automotive SoC

Sub. #QBPTRZ.

Massimo.

Abstract: This live demonstration showcases the integration of Carfield, a heterogeneous automotive RISC-V SoC for mixed-criticality edge intelligence applications, with SHIELD, a non-intrusive, multimodal smart steering wheel. SHIELD enables robust, redundant acquisition of physiological signals to monitor the driver’s state continuously. During the demo, dry electrodes embedded within the steering wheel synchronously measure electrocardiography (ECG), electrodermal activity (EDA), photoplethysmography (PPG), and body temperature from both hands. Raw signals are transmitted via the automotive CAN-FD protocol directly to the Carfield SoC, while simultaneously streaming to a PC GUI via Bluetooth Low Energy (BLE) or WiFi. The RISC-V core processes the incoming CAN-FD data stream in real time. It performs digital signal filtering and employs gold-standard algorithms, including the Pan-Tompkins algorithm for ECG and PPG peak detection, to analyze heart rate (HR) and heart rate variability (HRV) in both the time and frequency domains. In the live session (see Figure 1), a team member will use the smart steering wheel during a dynamic driving simulation using BeamNG.tech. Attendees will observe the GUI updating in real time, displaying the physiological waveforms alongside the HR and HRV metrics computed by Carfield. Overall, this demo illustrates how heterogeneous, open-source RISC-V architectures can efficiently handle vital sensor data acquisition and complex biosignal processing at the edge in a real-time automotive context, paving the road for non-intrusive, real-time driver monitoring systems in next-generation vehicles.
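As a rough illustration of the time-domain step mentioned above, the sketch below computes HR and two common HRV metrics (SDNN and RMSSD) from a list of detected peak times. It assumes peak detection has already been done; it does not implement Pan-Tompkins, and it is not the code running on Carfield. All function names are hypothetical.

```python
import math
from typing import List


def rr_intervals(peak_times_s: List[float]) -> List[float]:
    """Intervals in seconds between consecutive detected peaks."""
    return [b - a for a, b in zip(peak_times_s, peak_times_s[1:])]


def heart_rate_bpm(rr: List[float]) -> float:
    """Mean heart rate in beats per minute."""
    return 60.0 / (sum(rr) / len(rr))


def sdnn_ms(rr: List[float]) -> float:
    """Sample standard deviation of the intervals, in milliseconds."""
    mean = sum(rr) / len(rr)
    variance = sum((x - mean) ** 2 for x in rr) / (len(rr) - 1)
    return 1000.0 * math.sqrt(variance)


def rmssd_ms(rr: List[float]) -> float:
    """Root mean square of successive interval differences, in milliseconds."""
    diffs = [b - a for a, b in zip(rr, rr[1:])]
    return 1000.0 * math.sqrt(sum(d * d for d in diffs) / len(diffs))
```

Frequency-domain HRV analysis would additionally require resampling the interval series and estimating its spectrum, which this sketch leaves out.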


On-Device Context-Informed Incremental Learning for Myoelectric Control on RISC-V-based Wearable Platform

Sub. #QEAHWX.

Margherita Rossi and Mattia Orlandi.

Abstract: This live demonstration showcases our custom surface electromyography (sEMG) armband, enabling 16-channel monopolar acquisition. It features the RISC-V-based GAPWatch platform, which integrates two ADS1298 ADCs, an ESP32 radio module, GAP9 (a programmable multi-core RISC-V processor), and an STM32U5 microcontroller acting as a system gateway. The armband is used to control a cursor in a 2D reach-and-hold task through EMG gestures. The system runs a context-informed incremental learning pipeline directly on GAP9. EMG signals are acquired, filtered, and fed to a tiny CNN, which predicts one of four gestures mapped to cursor directions (e.g., index finger contraction for LEFT, middle finger contraction for UP, etc.). Predictions are transmitted via BLE to a computer running the GUI with the task. The GUI updates the cursor position and derives a pseudolabel from the task context. If the predicted movement brings the cursor closer to the target, the pseudolabel acts as a reward signal; otherwise, it provides corrective feedback. This pseudolabel is returned to the device, where the CNN is updated via stochastic gradient descent (SGD). A replay mechanism is also implemented to stabilize training. EMG processing, inference, and SGD are all executed on GAP9. During the demo, a participant will perform the task starting from an untrained model. As the task progresses, attendees can observe real-time on-device adaptation. The demonstration highlights how parallel RISC-V processing enables fully embedded, adaptive HMIs without reliance on the cloud or external PCs for recalibration.
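The pseudolabel rule described above can be sketched as follows. This is an illustrative reading of the abstract, not the authors' implementation: the unit-step cursor model, the RIGHT and DOWN directions, and the choice of the distance-minimizing direction as the corrective label are all assumptions.

```python
import math
from typing import Tuple

Point = Tuple[float, float]

# Unit steps per gesture class; RIGHT and DOWN are assumed for illustration.
DIRECTIONS = {"LEFT": (-1, 0), "RIGHT": (1, 0), "UP": (0, 1), "DOWN": (0, -1)}


def distance_after(cursor: Point, target: Point, direction: str) -> float:
    """Distance to the target after moving one step in `direction`."""
    dx, dy = DIRECTIONS[direction]
    return math.hypot(target[0] - (cursor[0] + dx), target[1] - (cursor[1] + dy))


def pseudolabel(cursor: Point, target: Point, predicted: str) -> str:
    """Derive a training label from the task context."""
    current = math.hypot(target[0] - cursor[0], target[1] - cursor[1])
    if distance_after(cursor, target, predicted) < current:
        return predicted  # prediction moved closer: reinforce it
    # Otherwise give corrective feedback: the step that gets closest (assumed rule).
    return min(DIRECTIONS, key=lambda d: distance_after(cursor, target, d))
```

In the described system, the label produced this way would be sent back to the device and used for one SGD update of the CNN.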


Hardware Acceleration Island for Safety-Critical Applications based on RISC-V

Sub. #QPRVWP.

Luis Waucquez.

Abstract: The complexity of modern electronics systems and their behavior in harsh environments, demanding performance, fault-tolerance capabilities, and energy efficiency, proves the need to design and implement systems adaptable to applications with mixed-criticality requirements. The Extensible Reliable Offloading Solution (EROS) has been developed as a HW-based accelerator template capable of addressing these requirements. It is compatible with several RISC-V cores from the OpenHW Foundation and eases the integration of both MM accelerators and ISA extensions using CV-X-IF coprocessors. The platform offers a safety wrapper, allowing the selected core to be configured at design time and runtime in different operational modes, from single core execution to fault-tolerant operational modes such as TCLS, DCLS, and staggered. It also provides methods for error detection and recovery. The EROS solution has been implemented as a safety accelerator island in the X-HEEP system, a RISC-V microcontroller platform conceived for ultra-low power scenarios, creating the resulting X-EROS system. This demo evaluates X-EROS, which has been taped out in TSMC 65nm LP technology. The platform is evaluated through performance analysis results obtained from the execution of an AES-256-CBC algorithm. In conjunction, a controlled error injection is performed to prove the functional detection and recovery capabilities. The overall system power consumption is measured to show the different power profiles under different modes of operation, demonstrating the capacity of the platform to adapt itself not only to fault-tolerance requirements but also to low-power requirements.
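To make the lockstep modes concrete, here is a generic software sketch of the checks they rely on: a bitwise two-out-of-three majority vote for triple-core lockstep (TCLS) and a plain comparison for dual-core lockstep (DCLS). It is a conceptual model only; the abstract does not describe how the EROS safety wrapper implements these checks in hardware, and the function names are hypothetical.

```python
from typing import List, Tuple


def tcls_vote(a: int, b: int, c: int) -> Tuple[int, List[int]]:
    """Bitwise 2-of-3 majority vote over three core outputs.

    Returns the voted value and the indices of cores that disagree with it.
    """
    voted = (a & b) | (a & c) | (b & c)
    faulty = [i for i, value in enumerate((a, b, c)) if value != voted]
    return voted, faulty


def dcls_check(a: int, b: int) -> bool:
    """Dual lockstep can detect a mismatch but cannot tell which core is wrong."""
    return a == b
```

The difference in return values reflects the usual trade-off: with three copies a single fault can be masked and located, while with two copies it can only be detected and must be handled by a recovery mechanism.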


End-to-End On-Device Transformer Training on Ultra-Low Power RISC-V MCU

Sub. #SRK9TJ.

RunW and Victor Jung.

Abstract: This demo showcases complete end-to-end Transformer training locally on the GAP9 RISC-V MCU. On-device training is crucial for applications that operate in dynamically changing environments. One example is biosignal DNNs in wearable devices, where cross-subject transfer and long-term temporal drift degrade performance. RISC-V MCUs are already widely used for edge DNN deployment. However, most existing work focuses either on inference only, or on fine-tuning a small portion of the network.

We extended the Deeploy compiler to generate training code. Deeploy generates bare-metal C code from an ONNX graph and is tailored for efficient inference. To support training, we added critical kernels such as optimizers and in-place gradient accumulators. We also extended the ONNX runtime training API to generate graphs optimized for edge deployment. This extension is released at https://github.com/pulp-platform/ONNX4Deeploy. To reduce the memory footprint of batching required for stable training, we implement gradient accumulation. The demo video showcases the full workflow, from training graph optimization to code generation and on-board execution. The video is available at https://drive.google.com/file/d/16BMiHn0jyMvScFJD7AGTwHpA4Rc0aMnC/view?usp=drive_link and will be uploaded to the Pulp Platform YouTube channel.
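Gradient accumulation, as mentioned above, can be shown with a small self-contained sketch. It uses a one-parameter linear model with a squared-error loss, chosen purely for illustration; it is unrelated to the code Deeploy generates. The point is that summing per-sample gradients over small micro-batches and normalizing once gives the same update as a full batch, while only one micro-batch needs to be held in memory at a time.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (input x, target y)


def sample_grad(w: float, x: float, y: float) -> float:
    """Gradient of (w*x - y)**2 with respect to w."""
    return 2.0 * (w * x - y) * x


def full_batch_grad(w: float, batch: List[Sample]) -> float:
    """Mean gradient computed over the whole batch at once."""
    return sum(sample_grad(w, x, y) for x, y in batch) / len(batch)


def accumulated_grad(w: float, batch: List[Sample], micro_size: int) -> float:
    """Same mean gradient, built up one micro-batch at a time."""
    acc = 0.0
    for start in range(0, len(batch), micro_size):
        for x, y in batch[start:start + micro_size]:
            acc += sample_grad(w, x, y)  # in-place accumulation, no weight update yet
    return acc / len(batch)


def sgd_step(w: float, grad: float, lr: float) -> float:
    """Plain SGD update applied once per accumulated batch."""
    return w - lr * grad
```

For example, with the data `[(1, 2), (2, 4), (3, 6), (4, 8)]` and `w = 0`, both functions return the same gradient regardless of the micro-batch size.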


Accelerating Matrix Operations with a Custom RISC‑V SIMD/Vector Extension and Automated LLVM Support

Sub. #UC3AZA.

Catalin Ciobanu.

Abstract: The development of our tightly coupled SIMD/Vector accelerator for matrix operations requires extending the RISC-V instruction set. Special compiler support is required for this extension. Our methodology starts from a Sail description of the ISA extension and generates the compiler target description data.

The accelerator’s main features are: 32 software-defined 2D registers, dedicated hardware for matrix operations, and a dedicated memory interface. The accelerator employs the CoreV-eXtension-Interface (CV-X-IF) and could be connected to multiple RISC-V cores that feature this interface.

The custom instructions extend the RISC-V ISA and follow its encoding. The custom instructions are of three types: matrix register definitions, matrix operations, and memory operations.

The instructions are described in Sail and are tested in the generated simulator. adl_tool transforms the Sail architecture description into the compiler model artifacts needed to build a functional prototype compiler for the given specification. Additionally, it provides automatically generated tests to validate the correctness of the instruction encodings.

The compiler was generated from the description model and tested with the accelerator implemented in hardware. The experimental results suggest that for matrix multiplication we obtained speed-ups up to 1413x compared to an ARM A72 core.


ML-KEM on a 22 nm ASIC: Protected, Unprotected, and Hardware-Accelerated Implementations

Sub. #YQDVJU.

Stefano Di Matteo and Emanuele Valea.

Abstract: Post-Quantum Cryptography is becoming a key building block for future secure systems, as quantum computers threaten widely deployed public-key cryptographic algorithms. In response, the NIST standardization process has selected new quantum-resistant schemes, among which ML-KEM plays a central role for key establishment. Deploying these algorithms efficiently on embedded processors is therefore a critical step toward practical adoption, particularly because embedded systems face strict constraints in terms of computational resources, memory footprint, and energy consumption. At the same time, they are more exposed to physical threats, making resistance to side-channel attacks a key requirement. These constraints make RISC-V especially attractive: its open instruction set and extensibility allow experimentation with software optimizations as well as hardware acceleration for PQC. To explore these aspects, CEA has developed VASCO3, a 22 nm ASIC chip designed to experimentally evaluate PQC implementations and side-channel countermeasures directly on silicon. The chip integrates a RISC-V–based System-on-Chip (SoC) together with several ML-KEM hardware accelerators, enabling the study of different hardware/software partitioning strategies around an embedded RISC-V CPU. In this demonstration, we present a comprehensive exploration of ML-KEM. We first showcase a pure software implementation running on the RISC-V, then progressively introduce hardware acceleration and a fully dedicated ML-KEM accelerator. We also demonstrate protected implementations based on first-order masking, including a masked software version and a masked hardware-assisted design.
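The idea behind first-order masking can be sketched generically: a sensitive value is split into two random shares, and linear operations are carried out on each share separately so that the unmasked value never appears as an intermediate. The sketch below uses arithmetic masking modulo the ML-KEM prime q = 3329. It is not the VASCO3 implementation, which the abstract does not detail, and the function names are hypothetical.

```python
import secrets
from typing import Tuple

Q = 3329  # ML-KEM modulus

Shares = Tuple[int, int]


def mask(value: int) -> Shares:
    """Split `value` into two shares that sum to it modulo Q."""
    r = secrets.randbelow(Q)
    return ((value - r) % Q, r)


def unmask(shares: Shares) -> int:
    """Recombine the shares; only done at the very end of a computation."""
    return (shares[0] + shares[1]) % Q


def masked_add(a: Shares, b: Shares) -> Shares:
    """Add two masked values share by share."""
    return ((a[0] + b[0]) % Q, (a[1] + b[1]) % Q)


def masked_scale(a: Shares, k: int) -> Shares:
    """Multiply a masked value by a public constant share by share."""
    return ((a[0] * k) % Q, (a[1] * k) % Q)
```

Only linear operations are this simple. Non-linear steps, such as comparisons or conversions between arithmetic and Boolean sharings, need dedicated masked gadgets that this sketch does not cover.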