## Implementation of Branch Treatment Strategies in the Ripes RISC-V Simulator

Universidad de Castilla-La Mancha (UCLM)

Silvia González-Rodríguez, Francisco Alfaro-Cortés, Rafael Rodríguez-Sánchez, Jesús Escudero-Sahuquillo

Departamento de Sistemas Informáticos, Escuela Superior de Ingeniería Informática de Albacete, Universidad de Castilla-La Mancha, Spain

Contact: silvia.gonzalez17@alu.uclm.es

### Before

Ripes currently offers only one branch strategy for its pipelined processors: predict-not-taken, with branches resolved in the EX stage, which results in two\* delay slots.

This was insufficient for our Computer Organisation course curriculum, where we want to explore how different branch strategies affect program execution performance.



\* Three, in the case of the two-way superscalar model.

Pipeline diagram of the original five-stage processor (predict-not-taken, two slots):

| Instruction |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| sw x7 0 x19 | ID | EX | MEM | WB |  |  |  |  |  |  | IF | ID | EX | MEM | WB |
| addi x9 x9 4 | IF | ID | EX | MEM | WB |  |  |  |  |  |  | IF | ID | EX | MEM |
| addi x18 x18 4 |  | IF | ID | EX | MEM | WB |  |  |  |  |  |  | IF | ID | EX |
| addi x19 x19 4 |  |  | IF | ID | EX | MEM | WB |  |  |  |  |  |  | IF | ID |
| bne x20 x9 -32 \<loop\> |  |  |  | IF | ID | EX | MEM | WB |  |  |  |  |  |  | IF |
| addi x17 x0 10 |  |  |  |  | IF | ID |  |  |  |  |  |  |  |  |  |
| addi x22 x22 1 |  |  |  |  |  | IF |  |  |  |  |  |  |  |  |  |
| addi x22 x22 -1 |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
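The number of slots follows from the stage in which the branch is resolved: every instruction fetched while the branch travels from IF to that stage occupies one slot. The C++ sketch below illustrates this, assuming the classic IF/ID/EX/MEM stage order; `Stage`, `branchSlots` and `targetFetchCycle` are hypothetical names, not code from Ripes or the fork.

```cpp
// Classic five-stage order; each enumerator's value is the stage index.
enum class Stage { IF = 0, ID = 1, EX = 2, MEM = 3 };

// Hypothetical helper: a branch fetched in cycle t occupies stage s in cycle
// t + s, so s further instructions are fetched before its outcome is known,
// one per slot.
constexpr int branchSlots(Stage resolveStage) {
    return static_cast<int>(resolveStage);
}

// With n slots, the first instruction at the target of a taken branch is
// fetched n + 1 cycles after the branch itself.
constexpr int targetFetchCycle(int branchFetchCycle, int slots) {
    return branchFetchCycle + slots + 1;
}

// Original Ripes five-stage processor: resolved in EX, hence two slots.
static_assert(branchSlots(Stage::EX) == 2, "EX resolution gives two slots");
```

For example, in the three-slot diagrams `bne x20 x9 -32` is fetched in cycle 44 and the loop restarts with `lw x6 0 x9` in cycle 48, which is what `targetFetchCycle(44, 3)` gives.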

### After

Thanks to Ripes' modular processor models, we were able to implement the new branch treatment strategies as five additional processor models. Each one is based on the five-stage pipelined processor with hazard detection and forwarding, with its circuitry modified to achieve the desired behaviour.
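On a taken branch, the two strategies differ only in what happens to the instructions already fetched into the slots. The following C++ sketch models that control decision in software; it is an illustration rather than the fork's actual circuitry, and `BranchStrategy`, `BranchControl` and `resolveBranch` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

enum class BranchStrategy { PredictNotTaken, Delayed };

// Control outputs produced when a branch reaches the stage that resolves it.
struct BranchControl {
    bool redirectPc = false;          // load the branch target into the PC
    std::array<bool, 3> flushSlot{};  // flushSlot[i]: squash the (i+1)-th instruction after the branch
};

// Hypothetical model of the resolve-stage decision.
BranchControl resolveBranch(BranchStrategy strategy, int slots, bool taken) {
    assert(slots >= 1 && slots <= 3);  // the models offer one to three slots
    BranchControl ctrl;
    ctrl.redirectPc = taken;
    // Predict-not-taken fetched the slot instructions from the fall-through
    // path, so a taken branch squashes them; delayed branches let them complete.
    if (taken && strategy == BranchStrategy::PredictNotTaken) {
        for (std::size_t i = 0; i < static_cast<std::size_t>(slots); ++i) {
            ctrl.flushSlot[i] = true;
        }
    }
    return ctrl;
}
```

A not-taken branch needs no action under either strategy, since the sequentially fetched instructions are the correct ones.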


Adding these models made the processor selection dialog awkward to use, so it was redesigned to let the user select a processor based on its individual characteristics. To enable this, Ripes' processor registry was leveraged to add the necessary metadata to the processor models.
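Selecting a processor by its characteristics amounts to a query over per-model metadata. A minimal C++17 sketch of that idea, assuming a flat list of models; `ProcessorTraits` and `findProcessor` are hypothetical and do not reflect the actual types in Ripes' processor registry.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

enum class BranchStrategy { PredictNotTaken, Delayed };

// Hypothetical per-model metadata record.
struct ProcessorTraits {
    std::string name;
    int stages;
    bool forwarding;
    bool hazardDetection;
    BranchStrategy strategy;
    int branchSlots;
};

// Returns the model matching every characteristic picked in the dialog,
// or std::nullopt if no such model is registered.
std::optional<ProcessorTraits> findProcessor(const std::vector<ProcessorTraits>& registry,
                                             int stages, bool forwarding, bool hazardDetection,
                                             BranchStrategy strategy, int slots) {
    assert(slots >= 1 && slots <= 3);  // one to three slots in the five-stage models
    for (const ProcessorTraits& p : registry) {
        if (p.stages == stages && p.forwarding == forwarding &&
            p.hazardDetection == hazardDetection && p.strategy == strategy &&
            p.branchSlots == slots) {
            return p;
        }
    }
    return std::nullopt;
}
```

Each option in the dialog then becomes one filter field, so adding a model adds a registry entry rather than another line in a single long list.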

The redesigned *Configure Processor* dialog, set up here for a five-stage processor with 1-slot delayed branches:

| Option | Selection |
|---|---|
| ISA | RISC-V, 32-bit; extensions M (enabled) and C (disabled) |
| Datapath | Five-stage, with forwarding and hazard detection enabled |
| Branches | Delayed branch, 1 slot |
| Layout | Extended |
| Register initialization | x2 (sp), x3 (gp) |

Description shown in the dialog: "A 5-stage in-order processor with hazard detection/elimination and forwarding, with delayed branches solved at the EX stage. NOTE: this branch strategy may result in programs executing incorrectly, unless resolved by rearranging the instructions surrounding branches, or inserting nop's after them to fill the delay slots."
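The dialog's note mentions padding delay slots with `nop`s when the surrounding instructions cannot be rearranged. A minimal C++ sketch of that fallback over assembly source lines, under two assumptions: branch targets are written as labels (so the assembler recomputes offsets after the insertion), and only the RV32I conditional branches `beq`, `bne`, `blt`, `bge`, `bltu` and `bgeu` have delay slots. Whether the fork also delays `jal`/`jalr` is not stated, so jumps are left untouched. `mnemonicOf` and `padDelaySlots` are hypothetical names.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Returns the mnemonic of an assembly line, skipping any "label:" prefix.
std::string mnemonicOf(const std::string& line) {
    std::istringstream in(line);
    std::string token;
    while (in >> token) {
        if (token.back() != ':') return token;
    }
    return "";
}

// Inserts `slots` nops after every conditional branch, so that no instruction
// from after the branch runs in a delay slot by accident (assumption: only
// conditional branches are delayed).
std::vector<std::string> padDelaySlots(const std::vector<std::string>& program, int slots) {
    assert(slots >= 1 && slots <= 3);
    static const std::set<std::string> kBranches = {"beq", "bne", "blt", "bge", "bltu", "bgeu"};
    std::vector<std::string> out;
    for (const std::string& line : program) {
        out.push_back(line);
        if (kBranches.count(mnemonicOf(line)) != 0) {
            for (int i = 0; i < slots; ++i) out.push_back("nop");
        }
    }
    return out;
}
```

Padding always preserves correctness but spends every slot on a `nop`; rearranging the instructions surrounding the branch, the other option the note mentions, can put those cycles to use.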

For comparison, the original *Select Processor* dialog, with the new models added, lists every processor as a separate entry, once under RISC-V 32-bit and again under 64-bit. The 32-bit entries are:

- Single-cycle processor
- 5-stage processor w/o forwarding or hazard detection
- 5-stage processor w/o hazard detection
- 5-stage processor w/o forwarding unit
- 5-stage processor
- 5-stage processor (1-slot predict-not-taken)
- 5-stage processor (1-slot delayed branch)
- 5-stage processor (2-slot delayed branch)
- 5-stage processor (3-slot predict-not-taken)
- 5-stage processor (3-slot delayed branch)
- 6-stage dual-issue processor



Predict-not-taken, one slot:

| Instruction | 47 | 48 | 49 | 50 | 51 | 52 | 53 | 54 | 55 | 56 | 57 | 58 | 59 | 60 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| lw x21 0 x21 |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| lw x6 0 x9 |  |  |  |  |  | IF | ID | EX | MEM | WB |  |  |  |  |
| lw x7 0 x18 | WB |  |  |  |  |  | IF | ID | EX | MEM | WB |  |  |  |
| add x6 x21 x6 | MEM | WB |  |  |  |  |  | IF | ID | EX | MEM | WB |  |  |
| add x7 x6 x7 | EX | MEM | WB |  |  |  |  |  | IF | ID | EX | MEM | WB |  |
| sw x7 0 x19 | ID | EX | MEM | WB |  |  |  |  |  | IF | ID | EX | MEM | WB |
| addi x9 x9 4 | IF | ID | EX | MEM | WB |  |  |  |  |  | IF | ID | EX | MEM |
| addi x18 x18 4 |  | IF | ID | EX | MEM | WB |  |  |  |  |  | IF | ID | EX |
| addi x19 x19 4 |  |  | IF | ID | EX | MEM | WB |  |  |  |  |  | IF | ID |
| bne x20 x9 -32 \<loop\> |  |  |  | IF | ID | EX | MEM | WB |  |  |  |  |  | IF |
| addi x17 x0 10 |  |  |  |  | IF |  |  |  |  |  |  |  |  |  |
| addi x22 x22 1 |  |  |  |  |  |  |  |  |  |  |  |  |  |  |

### Results

Predict-not-taken, three slots:

| Instruction | 41 | 42 | 43 | 44 | 45 | 46 | 47 | 48 | 49 | 50 | 51 | 52 | 53 | 54 | 55 | 56 | 57 | 58 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| lw x21 0 x21 |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| lw x6 0 x9 |  |  |  |  |  |  |  | IF | ID | EX | MEM | WB |  |  |  |  |  |  |
| lw x7 0 x18 | WB |  |  |  |  |  |  |  | IF | ID | EX | MEM | WB |  |  |  |  |  |
| add x6 x21 x6 | MEM | WB |  |  |  |  |  |  |  | IF | ID | EX | MEM | WB |  |  |  |  |
| add x7 x6 x7 | EX | MEM | WB |  |  |  |  |  |  |  | IF | ID | EX | MEM | WB |  |  |  |
| sw x7 0 x19 | ID | EX | MEM | WB |  |  |  |  |  |  |  | IF | ID | EX | MEM | WB |  |  |
| addi x9 x9 4 | IF | ID | EX | MEM | WB |  |  |  |  |  |  |  | IF | ID | EX | MEM | WB |  |
| addi x18 x18 4 |  | IF | ID | EX | MEM | WB |  |  |  |  |  |  |  | IF | ID | EX | MEM | WB |
| addi x19 x19 4 |  |  | IF | ID | EX | MEM | WB |  |  |  |  |  |  |  | IF | ID | EX | MEM |
| bne x20 x9 -32 \<loop\> |  |  |  | IF | ID | EX | MEM | WB |  |  |  |  |  |  |  | IF | ID | EX |
| addi x17 x0 10 |  |  |  |  | IF | ID | EX |  |  |  |  |  |  |  |  |  | IF | ID |
| addi x22 x22 1 |  |  |  |  |  | IF | ID |  |  |  |  |  |  |  |  |  |  | IF |
| addi x22 x22 -1 |  |  |  |  |  |  | IF |  |  |  |  |  |  |  |  |  |  |  |

The five-stage pipelined processor in our Ripes fork can now be configured with one, two, or three delay slots, using either predict-not-taken or delayed branches; the pipeline diagrams show the effect of each configuration on the execution of the same loop.
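In all six configurations, each slot adds one fetch cycle per iteration of the nine-instruction loop: `sw x7 0 x19`, for example, is fetched 10, 11 and 12 cycles apart with one, two and three slots. What the strategy changes is whether the slot instructions are flushed or completed. A C++ sketch of that bookkeeping, where `IterationCost` and `iterationCost` are hypothetical names:

```cpp
enum class BranchStrategy { PredictNotTaken, Delayed };

struct IterationCost {
    int cycles;         // fetch cycles from one iteration's start to the next
    int slotInstrsRun;  // instructions after the branch that complete per iteration
};

// loopLength counts the loop body including the branch (nine in the diagrams).
// Every slot adds one fetch cycle; only delayed branches let the slot
// instructions complete instead of flushing them.
constexpr IterationCost iterationCost(int loopLength, BranchStrategy strategy, int slots) {
    return IterationCost{loopLength + slots,
                         strategy == BranchStrategy::Delayed ? slots : 0};
}
```

Under delayed branches, the completed slot instructions in these diagrams are `addi x17 x0 10` and then the `addi x22` instructions, which belong after the loop; this is why the configuration dialog warns that programs may execute incorrectly unless the code around the branch is rearranged or padded with `nop`s.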

← Predict-not-taken

Delayed branches →

Delayed branch, one slot:

| Instruction | 47 | 48 | 49 | 50 | 51 | 52 | 53 | 54 | 55 | 56 | 57 | 58 | 59 | 60 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| lw x21 0 x21 |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| lw x6 0 x9 |  |  |  |  |  | IF | ID | EX | MEM | WB |  |  |  |  |
| lw x7 0 x18 | WB |  |  |  |  |  | IF | ID | EX | MEM | WB |  |  |  |
| add x6 x21 x6 | MEM | WB |  |  |  |  |  | IF | ID | EX | MEM | WB |  |  |
| add x7 x6 x7 | EX | MEM | WB |  |  |  |  |  | IF | ID | EX | MEM | WB |  |
| sw x7 0 x19 | ID | EX | MEM | WB |  |  |  |  |  | IF | ID | EX | MEM | WB |
| addi x9 x9 4 | IF | ID | EX | MEM | WB |  |  |  |  |  | IF | ID | EX | MEM |
| addi x18 x18 4 |  | IF | ID | EX | MEM | WB |  |  |  |  |  | IF | ID | EX |
| addi x19 x19 4 |  |  | IF | ID | EX | MEM | WB |  |  |  |  |  | IF | ID |
| bne x20 x9 -32 \<loop\> |  |  |  | IF | ID | EX | MEM | WB |  |  |  |  |  | IF |
| addi x17 x0 10 |  |  |  |  | IF | ID | EX | MEM | WB |  |  |  |  |  |
| addi x22 x22 1 |  |  |  |  |  |  |  |  |  |  |  |  |  |  |

### Try it

This project is publicly available on GitHub.

#### Acknowledgements

This work has been funded by the Junta de Comunidades de Castilla-La Mancha under project SBPLY/21/180225/000103, and by the Spanish Ministry of Science, Innovation and Universities under projects PID2021-1236270B-C52 and TED2021-130233B-C31. It is also funded by Grant Cátedra PERTE Chip University of Castilla-La Mancha (TSI-069100-2023-0014), funded by the Ministry of Digital Transformation (Cátedras PERTE Chip Programme) through the "European Union NextGenerationEU/PRTR".

Delayed branch, two slots:

| Instruction | 50 | 51 | 52 | 53 | 54 | 55 | 56 | 57 | 58 | 59 | 60 | 61 | 62 | 63 | 64 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| lw x21 0 x21 |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| lw x6 0 x9 |  |  |  |  |  |  | IF | ID | EX | MEM | WB |  |  |  |  |
| lw x7 0 x18 | WB |  |  |  |  |  |  | IF | ID | EX | MEM | WB |  |  |  |
| add x6 x21 x6 | MEM | WB |  |  |  |  |  |  | IF | ID | EX | MEM | WB |  |  |
| add x7 x6 x7 | EX | MEM | WB |  |  |  |  |  |  | IF | ID | EX | MEM | WB |  |
| sw x7 0 x19 | ID | EX | MEM | WB |  |  |  |  |  |  | IF | ID | EX | MEM | WB |
| addi x9 x9 4 | IF | ID | EX | MEM | WB |  |  |  |  |  |  | IF | ID | EX | MEM |
| addi x18 x18 4 |  | IF | ID | EX | MEM | WB |  |  |  |  |  |  | IF | ID | EX |
| addi x19 x19 4 |  |  | IF | ID | EX | MEM | WB |  |  |  |  |  |  | IF | ID |
| bne x20 x9 -32 \<loop\> |  |  |  | IF | ID | EX | MEM | WB |  |  |  |  |  |  | IF |
| addi x17 x0 10 |  |  |  |  | IF | ID | EX | MEM | WB |  |  |  |  |  |  |
| addi x22 x22 1 |  |  |  |  |  | IF | ID | EX | MEM | WB |  |  |  |  |  |

Delayed branch, three slots:

| Instruction | 41 | 42 | 43 | 44 | 45 | 46 | 47 | 48 | 49 | 50 | 51 | 52 | 53 | 54 | 55 | 56 | 57 | 58 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| lw x21 0 x21 |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| lw x6 0 x9 |  |  |  |  |  |  |  | IF | ID | EX | MEM | WB |  |  |  |  |  |  |
| lw x7 0 x18 | WB |  |  |  |  |  |  |  | IF | ID | EX | MEM | WB |  |  |  |  |  |
| add x6 x21 x6 | MEM | WB |  |  |  |  |  |  |  | IF | ID | EX | MEM | WB |  |  |  |  |
| add x7 x6 x7 | EX | MEM | WB |  |  |  |  |  |  |  | IF | ID | EX | MEM | WB |  |  |  |
| sw x7 0 x19 | ID | EX | MEM | WB |  |  |  |  |  |  |  | IF | ID | EX | MEM | WB |  |  |
| addi x9 x9 4 | IF | ID | EX | MEM | WB |  |  |  |  |  |  |  | IF | ID | EX | MEM | WB |  |
| addi x18 x18 4 |  | IF | ID | EX | MEM | WB |  |  |  |  |  |  |  | IF | ID | EX | MEM | WB |
| addi x19 x19 4 |  |  | IF | ID | EX | MEM | WB |  |  |  |  |  |  |  | IF | ID | EX | MEM |
| bne x20 x9 -32 \<loop\> |  |  |  | IF | ID | EX | MEM | WB |  |  |  |  |  |  |  | IF | ID | EX |
| addi x17 x0 10 |  |  |  |  | IF | ID | EX | MEM | WB |  |  |  |  |  |  |  | IF | ID |
| addi x22 x22 1 |  |  |  |  |  | IF | ID | EX | MEM | WB |  |  |  |  |  |  |  | IF |
| addi x22 x22 -1 |  |  |  |  |  |  | IF | ID | EX | MEM | WB |  |  |  |  |  |  |  |