

# Leveraging TVM to optimize AI models for custom HW Accelerators and RISC-V extended instructions

Alexander Belke\*, Gilles Miet\*\*

Robert Bosch Mobility Electronics, Engineering Integrated Circuits \*GmbH (Germany), \*\*SAS (France)

## Introduction

The development flow of HW manufacturers targets products that are:

- Efficient in terms of time to market
- Flexible, for easy upgrades
- Low cost, with a silicon area as small as possible
- Low power, through optimal choice and usage of processing resources

We chose and customized Apache TVM as a HW/SW co-design framework that lets us generate and test the software regardless of HW availability.



#### **TVM overview**

TVM imports, optimizes, compiles, and runs a Deep Learning network model defined at a high level. It is mainly composed of:

- A Relay importer for Deep Learning models
- A graph optimizer working on the Relay language (an Abstract Syntax Tree)
- An operator-internal scheduler in the TE (Tensor Expression) language
- A low-level operator optimizer in the TIR (Tensor IR) language



#### TVM with default BYOC

#### TVM with extended BYOC

## **RISC-V** customized instruction integration into TVM

After Relay graph partitioning, TVM creates the TE schedule for the operators offloaded to the custom RISC-V compute library.
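The partitioning step can be pictured with a minimal sketch in plain Python (it does not use the TVM API). Each operator of a linear graph is marked as supported or not by the compute library, and consecutive supported operators are merged into one offloaded region. The `SUPPORTED_OPS` set and the example graph are hypothetical and only serve as an illustration.

```python
from typing import List, Tuple

# Hypothetical set of operators the custom compute library can execute.
SUPPORTED_OPS = {"qnn.conv2d", "qnn.dense", "nn.relu"}


def partition(ops: List[str]) -> List[Tuple[str, List[str]]]:
    """Group consecutive operators by target ("offload" or "host")."""
    regions: List[Tuple[str, List[str]]] = []
    for op in ops:
        target = "offload" if op in SUPPORTED_OPS else "host"
        if regions and regions[-1][0] == target:
            regions[-1][1].append(op)  # extend the current region
        else:
            regions.append((target, [op]))  # start a new region
    return regions


# Hypothetical linear graph, listed in execution order.
graph = ["strided_slice", "reshape", "qnn.conv2d", "nn.relu", "nn.softmax"]
regions = partition(graph)
for target, region_ops in regions:
    print(target, region_ops)
```

Real Relay graphs are not linear, so an actual partitioner also has to track data dependencies between regions; the sketch only shows the grouping idea.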



#### **TVM customization**

#### **Processing simulation model**

We created C++ components that simulate the HWA and the RISC-V extended instructions. They are compiled and run either within TVM or outside TVM in standalone environments.
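To illustrate what such a functional model might look like, here is a small sketch, written in Python for brevity rather than C++. The instruction `dot4.acc` is a purely hypothetical example (a four-lane int8 dot product with 32-bit accumulation) and is not taken from the actual instruction set extension.

```python
def to_int8(byte: int) -> int:
    """Interpret an unsigned byte (0..255) as a signed int8."""
    return byte - 256 if byte >= 128 else byte


def to_int32(value: int) -> int:
    """Wrap a Python integer to signed 32-bit, as a register would."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def pack_int8x4(lanes):
    """Pack four signed int8 values into one 32-bit word (lane 0 in the low byte)."""
    word = 0
    for i, lane in enumerate(lanes):
        word |= (lane & 0xFF) << (8 * i)
    return word


def sim_dot4_acc(rs1: int, rs2: int, acc: int) -> int:
    """Functional model of the hypothetical 'dot4.acc' instruction."""
    total = acc
    for i in range(4):
        a = to_int8((rs1 >> (8 * i)) & 0xFF)
        b = to_int8((rs2 >> (8 * i)) & 0xFF)
        total += a * b
    return to_int32(total)


a = pack_int8x4([1, -2, 3, -4])
b = pack_int8x4([5, 6, -7, 8])
result = sim_dot4_acc(a, b, acc=10)
print(result)
```

Because the model is a plain function of its register inputs, the same code can be called from a compiler test flow or from a standalone test bench.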

### Hybrid methodology for int8 Quantization

We use QNN operators for the Relay backend part: added Relay passes transform a standard float32 NN Relay graph into a quantized version. QNN Relay expressions are inserted, and operators are replaced with their quantized variants.
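The arithmetic behind such a float32 to int8 conversion can be sketched as follows. This is the common affine (scale and zero-point) scheme in plain Python, shown as an assumption about the general approach; it is not the Relay passes described above, and the helper names are hypothetical.

```python
def choose_qparams(x_min, x_max, q_min=-128, q_max=127):
    """Derive scale and zero point so that [x_min, x_max] maps onto int8."""
    # Extend the range to include 0.0 so that zero is exactly representable.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    scale = (x_max - x_min) / (q_max - q_min)
    if scale == 0.0:
        scale = 1.0  # degenerate all-zero tensor
    zero_point = int(round(q_min - x_min / scale))
    zero_point = max(q_min, min(q_max, zero_point))
    return scale, zero_point


def quantize(x, scale, zero_point, q_min=-128, q_max=127):
    """float -> int8: q = clamp(round(x / scale) + zero_point)."""
    q = int(round(x / scale)) + zero_point
    return max(q_min, min(q_max, q))


def dequantize(q, scale, zero_point):
    """int8 -> float: x is approximately scale * (q - zero_point)."""
    return scale * (q - zero_point)


values = [-1.0, -0.25, 0.0, 0.5, 1.0]
scale, zero_point = choose_qparams(min(values), max(values))
quantized = [quantize(v, scale, zero_point) for v in values]
restored = [dequantize(q, scale, zero_point) for q in quantized]
for v, q, r in zip(values, quantized, restored):
    print(f"{v:+.4f} -> {q:4d} -> {r:+.4f}")
```

The round trip loses at most about one quantization step per value, which is the error budget a quantized operator variant has to tolerate.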

## **BYOC concept extended**

With Bring Your Own Codegen (BYOC), programming of custom HW components is left to a proprietary toolchain, and TVM optimization of the related code stops at the Relay level.

We modified BYOC to represent offloaded operations in the TVM TE and TIR languages. This allows TVM to define and optimize schedules for the custom hardware, for example by using Autotune to optimize tiling.
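As an illustration of what tuning a tiling factor means, the sketch below implements a tiled matrix multiplication in plain Python and picks the tile size with the lowest estimated data-transfer cost. The cost model, the local buffer capacity, and the candidate tile sizes are hypothetical stand-ins; an actual tuning flow would measure candidates on the hardware or on a simulator instead.

```python
def matmul_ref(a, b):
    """Reference n x n matrix multiplication."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matmul_tiled(a, b, tile):
    """Same computation, with all three loops split into tile-sized blocks."""
    n = len(a)
    c = [[0] * n for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, n, tile):
            for k0 in range(0, n, tile):
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, n)):
                        acc = c[i][j]
                        for k in range(k0, min(k0 + tile, n)):
                            acc += a[i][k] * b[k][j]
                        c[i][j] = acc
    return c


def transfer_cost(n, tile, buffer_elems):
    """Hypothetical cost: elements loaded into local memory, or None if too big."""
    if 3 * tile * tile > buffer_elems:  # tiles of A, B and C must fit locally
        return None
    blocks = -(-n // tile)  # ceiling division
    return blocks ** 3 * 2 * tile * tile  # one A tile and one B tile per step


def tune(n, candidates, buffer_elems):
    """Return (tile, cost) for the cheapest candidate that fits, or None."""
    best = None
    for tile in candidates:
        cost = transfer_cost(n, tile, buffer_elems)
        if cost is not None and (best is None or cost < best[1]):
            best = (tile, cost)
    return best


n = 8
a = [[i + j for j in range(n)] for i in range(n)]
b = [[i - j for j in range(n)] for i in range(n)]
best = tune(n, candidates=[1, 2, 4, 8], buffer_elems=64)
print("best (tile, cost):", best)
```

Every tile size computes the same result; only the cost differs, which is why the tiling factor can be searched automatically.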



```c
tvmgen_default_fused_strided_slice_reshape(/*...*/);
tvmgen_default_tristan_compute_lib_main_0(/*...*/);

// with:

tvmgen_default_tristan_compute_lib_main_0(/*...*/) {
  // ...
}
```





#### Conclusion

We leverage TVM to integrate a DL NN model into a heterogeneous system (a RISC-V CPU with custom extended instructions and an HWA). This lets us validate, prepare, analyze, and optimize the FW without needing the HW to be available.



TRISTAN Project has received funding from the Chips Joint Undertaking (Chips-JU) under the grant agreement nr. 101095947. Chips-JU receives support from the European Union's Horizon Europe's research and innovation programme and Austria, Belgium, Bulgaria, Croatia, Cyprus, Czechia, Germany, Denmark, Estonia, Greece, Spain, Finland, France, Hungary, Ireland, Israel, Iceland, Italy, Lithuania, Luxembourg, Latvia, Malta, Netherlands, Norway, Poland, Portugal, Romania, Sweden, Slovenia, Slovakia and Turkey.



AoT codegen