
Commit 58f9787

paulsc96 authored and colluca committed
sw: Add SARIS kernels (#124)
* hw: Keep IO fixed regardless of configuration
* target/snitch_cluster: Add Occamy-like config with SSSRs
* sw: Add SARIS kernels
* sw/saris: Fix license headers
* sw/saris: Fix python lint
* lint: Do not C++ lint SARIS sources
* sw/saris: Remove stub LLVM from makefile
* sw/saris: Add README.md
* sw/saris: Initialize putchar buffer, fix F extension skip
* sw/saris: Switch to, adapt default config, add bib placeholders
1 parent 648e141 commit 58f9787

21 files changed: +3017 -18 lines changed


.github/workflows/lint.yml

Lines changed: 1 addition & 0 deletions
@@ -129,6 +129,7 @@ jobs:
 - uses: actions/checkout@v3
 - uses: DoozyX/clang-format-lint-action@v0.16.2
   with:
+    exclude: './sw/saris'
     clangFormatVersion: 10
 
 ######################

README.md

Lines changed: 18 additions & 0 deletions
@@ -161,3 +161,21 @@ If you use the Snitch cluster or its extensions in your work, you can cite us:
 ```
 
 </p>
+
+<details>
+<summary><b>SARIS: Accelerating Stencil Computations on Energy-Efficient RISC-V Compute Clusters with Indirect Stream Registers</b></summary>
+<p>
+
+```
+@misc{scheffler2024saris,
+title={SARIS: Accelerating Stencil Computations on Energy-Efficient
+RISC-V Compute Clusters with Indirect Stream Registers},
+author={Paul Scheffler and Luca Colagrande and Luca Benini},
+year={2024},
+eprint={},
+archivePrefix={arXiv},
+primaryClass={cs.MS}
+}
+```
+
+</p>

docs/publications.md

Lines changed: 18 additions & 0 deletions
@@ -118,4 +118,22 @@ If you use the Snitch cluster or its extensions in your work, you can cite us:
 
 </p>
 
+<details>
+<summary><b>SARIS: Accelerating Stencil Computations on Energy-Efficient RISC-V Compute Clusters with Indirect Stream Registers</b></summary>
+<p>
+
+```
+@misc{scheffler2024saris,
+title={SARIS: Accelerating Stencil Computations on Energy-Efficient
+RISC-V Compute Clusters with Indirect Stream Registers},
+author={Paul Scheffler and Luca Colagrande and Luca Benini},
+year={2024},
+eprint={},
+archivePrefix={arXiv},
+primaryClass={cs.MS}
+}
+```
+
+</p>
+
 <!--end-publications-->

hw/snitch_cluster/src/snitch_cluster_wrapper.sv.tpl

Lines changed: 6 additions & 10 deletions
@@ -210,28 +210,24 @@ ${ssr_cfg(core, "'{{{indirection:d}, {isect_master:d}, {isect_master_idx:d}, {is
 ${ssr_cfg(core, '{reg_idx}', '/*None*/ 0', ',')}\
 };
 
+// Forward potentially optional configuration parameters
+localparam logic [9:0] CfgBaseHartId = (${to_sv_hex(cfg['cluster_base_hartid'], 10)});
+localparam addr_t CfgClusterBaseAddr = (${to_sv_hex(cfg['cluster_base_addr'], cfg['addr_width'])});
+
 endpackage
 // verilog_lint: waive-stop package-filename
 
 module ${cfg['name']}_wrapper (
 input logic clk_i,
 input logic rst_ni,
-% if cfg['enable_debug']:
 input logic [${cfg['pkg_name']}::NrCores-1:0] debug_req_i,
-% endif
 input logic [${cfg['pkg_name']}::NrCores-1:0] meip_i,
 input logic [${cfg['pkg_name']}::NrCores-1:0] mtip_i,
 input logic [${cfg['pkg_name']}::NrCores-1:0] msip_i,
-% if cfg['cluster_base_expose']:
 input logic [9:0] hart_base_id_i,
 input logic [${cfg['addr_width']-1}:0] cluster_base_addr_i,
-% endif
-% if cfg['timing']['iso_crossings']:
 input logic clk_d2_bypass_i,
-% endif
-% if cfg['sram_cfg_expose']:
 input ${cfg['pkg_name']}::sram_cfgs_t sram_cfgs_i,
-%endif
 input ${cfg['pkg_name']}::narrow_in_req_t narrow_in_req_i,
 output ${cfg['pkg_name']}::narrow_in_resp_t narrow_in_resp_o,
 output ${cfg['pkg_name']}::narrow_out_req_t narrow_out_req_o,
@@ -354,8 +350,8 @@ module ${cfg['name']}_wrapper (
 .hart_base_id_i,
 .cluster_base_addr_i,
 % else:
-.hart_base_id_i (${to_sv_hex(cfg['cluster_base_hartid'], 10)}),
-.cluster_base_addr_i (${to_sv_hex(cfg['cluster_base_addr'], cfg['addr_width'])}),
+.hart_base_id_i (snitch_cluster_pkg::CfgBaseHartId),
+.cluster_base_addr_i (snitch_cluster_pkg::CfgClusterBaseAddr),
 % endif
 % if cfg['timing']['iso_crossings']:
 .clk_d2_bypass_i,

sw/saris/.gitignore

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+bin
+dump
+gen

sw/saris/Makefile

Lines changed: 130 additions & 0 deletions
@@ -0,0 +1,130 @@
+# Copyright 2023 ETH Zurich and University of Bologna.
+# Licensed under the Apache License, Version 2.0, see LICENSE for details.
+# SPDX-License-Identifier: Apache-2.0
+
+# Paul Scheffler <paulsc@iis.ee.ethz.ch>
+# Luca Colagrande <colluca@iis.ee.ethz.ch>
+
+all:
+
+###############
+# Environment #
+###############
+
+# NOTE: the LLVM_BINROOT environment variable must point to a specific revision of PULP RISCV
+# LLVM 15 (see README.md). After compilation, you can set LLVM_BINROOT in your environment, this
+# makefile, or pass it on invocation of `make`.
+ifndef LLVM_BINROOT
+$(error LLVM_BINROOT is not set; please compile the SARIS version of LLVM 15 (see README.md) and set LLVM_BINROOT to its binary location.)
+endif
+
+PYTHON3 ?= python3
+
+SARISDIR ?= .
+GENDIR ?= $(SARISDIR)/gen
+UTILDIR ?= $(SARISDIR)/util
+BINDIR ?= $(SARISDIR)/bin
+DUMPDIR ?= $(SARISDIR)/dump
+RTDIR ?= $(SARISDIR)/runtime
+
+# We depend on the printf submodule
+PRINTFDIR ?= $(SARISDIR)/../deps/printf
+
+############################
+# Compiler (LLVM 15) Setup #
+############################
+
+RISCV_MARCH ?= \
+rv32imafd_zfh_xfrep_xssr_xdma_xfalthalf_xfquarter_xfaltquarter_xfvecsingle_xfvechalf_$\
+xfvecalthalf_xfvecquarter_xfvecaltquarter_xfauxhalf_xfauxalthalf_xfauxquarter_xfauxaltquarter_$\
+xfauxvecsingle_xfauxvechalf_xfauxvecalthalf_xfauxvecquarter_xfauxvecaltquarter_xfexpauxvechalf_$\
+xfexpauxvecalthalf_xfexpauxvecquarter_xfexpauxvecaltquarter
+
+RISCV_MABI ?= ilp32d
+
+RISCV_CC ?= $(LLVM_BINROOT)/clang
+RISCV_CXX ?= $(LLVM_BINROOT)/clang++
+RISCV_OBJDUMP ?= $(LLVM_BINROOT)/llvm-objdump
+RISCV_STRIP ?= $(LLVM_BINROOT)/llvm-strip
+
+RISCV_STACK ?= 2048
+RISCV_FLAGS ?= -mcpu=snitch -march=$(RISCV_MARCH) -Ofast -flto -mabi=$(RISCV_MABI) \
+-Wframe-larger-than=$(RISCV_STACK) -nostdlib -mcmodel=medany -I$(RTDIR) \
+-I$(SARISDIR)/stencils -I$(PRINTFDIR) -ffreestanding -fno-builtin \
+-ffunction-sections
+
+RISCV_CFLAGS ?= $(RISCV_FLAGS)
+# Loop unrolling optimization
+RISCV_CFLAGS += -mllvm --allow-unroll-and-jam
+RISCV_CFLAGS += -mllvm --unroll-allow-partial
+RISCV_CFLAGS += -mllvm --unroll-runtime
+# Tree height reduction options
+RISCV_CFLAGS += -mllvm --enable-fp-thr
+RISCV_CFLAGS += -mllvm --thr-max-depth=5
+RISCV_CFLAGS += -mllvm --thr-se-leaves
+RISCV_CFLAGS += -mllvm --thr-fuse-bias
+RISCV_CFLAGS += -mllvm --thr-se-factor=2
+RISCV_CFLAGS += -mllvm --thr-re-factor=1
+# Machine scheduler and PostRA options
+RISCV_CFLAGS += -mllvm --post-RA-scheduler
+RISCV_CFLAGS += -mllvm --enable-misched
+RISCV_CFLAGS += -mllvm --enable-post-misched
+RISCV_CFLAGS += -mllvm --misched-postra
+
+RISCV_CCFLAGS ?= $(RISCV_CFLAGS) -std=gnu11
+RISCV_CXXFLAGS ?= $(RISCV_CFLAGS) -std=gnu++14
+RISCV_LDFLAGS ?= -fuse-ld=$(LLVM_BINROOT)/ld.lld -flto -static -lm $(RISCV_FLAGS) \
+-Wl,--fatal-warnings -Wl,-z,stack-size=$(RISCV_STACK)
+RISCV_DMPFLAGS ?= --mcpu=snitch
+
+############################
+# SARIS Program Build Flow #
+############################
+
+.SECONDEXPANSION:
+.DELETE_ON_ERROR:
+
+# Extracting word nr. $(1) from $(2)-separated list $(3)
+pw = $(word $(1), $(subst $(2), ,$(3)))
+
+$(GENDIR) $(BINDIR) $(DUMPDIR):
+	mkdir -p $@
+
+$(BINDIR)/crt0.o: $(SARISDIR)/runtime/crt0.S | $(BINDIR)
+	$(RISCV_CC) $(RISCV_CCFLAGS) -c $< -o $@
+
+$(BINDIR)/istc.%.c.o: $(GENDIR)/$$(call pw,1,.,$$*).cpp | $(BINDIR)
+	$(RISCV_CXX) $(RISCV_CXXFLAGS) -c $< -o $@
+
+.PRECIOUS: $(BINDIR)/%.elf
+$(BINDIR)/istc.%.elf: $(BINDIR)/istc.%.c.o $(BINDIR)/crt0.o $(RTDIR)/link.ld | $(BINDIR)
+	$(RISCV_CC) $(RISCV_LDFLAGS) -o $@ $< $(BINDIR)/crt0.o -T$(RTDIR)/link.ld
+	$(RISCV_STRIP) $@ -g -S -d --strip-debug -R .comment -R .riscv.attributes
+
+.PRECIOUS: $(DUMPDIR)/%.dump
+$(DUMPDIR)/%.dump: $(BINDIR)/%.elf | $(DUMPDIR)
+	@$(RISCV_OBJDUMP) $(RISCV_DMPFLAGS) -j .text -d $< >$@
+	@$(RISCV_OBJDUMP) $(RISCV_DMPFLAGS) -j .misc -s $< | tail -n +3 >>$@
+	@$(RISCV_OBJDUMP) $(RISCV_DMPFLAGS) -j .tcdm -s $< | tail -n +3 >>$@
+	@$(RISCV_OBJDUMP) $(RISCV_DMPFLAGS) -j .tcdmc -s $< | tail -n +3 >>$@
+
+# Phony for program and dump build
+prog.%: $(BINDIR)/%.elf $(DUMPDIR)/%.dump
+	@echo -e '\x1b[44;33;1mBUILT: $*\x1b[0m'
+
+clean:
+	rm -rf $(BINDIR) $(DUMPDIR) $(GENDIR)
+
+############################
+# SARIS Program Generation #
+############################
+
+.PRECIOUS: $(GENDIR)/%.cpp
+$(GENDIR)/%.cpp: $(UTILDIR)/evalgen.py $(SARISDIR)/eval.json $(UTILDIR)/eval.cpp.tpl | $(GENDIR)
+	$(PYTHON3) $^ $* > $@
+
+EVAL_NAMES ?= $(shell jq -r 'keys | join(" ")' $(SARISDIR)/eval.json)
+ISTC_PROGS += $(patsubst %,istc.%,$(EVAL_NAMES))
+
+# Default: compile all SARIS programs in eval.json
+all: $(addprefix prog.,$(ISTC_PROGS))
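
The target naming in the Makefile above can be hard to follow because it combines the `pw` helper, secondary expansion, and the `jq` key listing. The following Python sketch is not part of the commit; it only mirrors how a key of `eval.json` would map to the generated source, object, ELF, and dump paths. The function names `pw` and `build_plan`, the directory defaults, and the empty per-key value in the example are assumptions for illustration; the example key is taken from the ELF name used in the README of this commit.

```python
import json


def pw(n: int, sep: str, text: str) -> str:
    """Mimic $(word n,$(subst sep, ,text)): 1-indexed, empty if out of range."""
    words = text.replace(sep, " ").split()
    return words[n - 1] if 1 <= n <= len(words) else ""


def build_plan(eval_cfg: dict, gendir: str = "gen", bindir: str = "bin",
               dumpdir: str = "dump") -> list:
    """List the files each `prog.istc.<name>` target would produce."""
    plan = []
    # jq's `keys` returns the object keys in sorted order.
    for name in sorted(eval_cfg):
        prog = f"istc.{name}"
        plan.append({
            "target": f"prog.{prog}",
            # The object rule takes the first '.'-separated word of the stem.
            "source": f"{gendir}/{pw(1, '.', name)}.cpp",
            "object": f"{bindir}/{prog}.c.o",
            "elf": f"{bindir}/{prog}.elf",
            "dump": f"{dumpdir}/{prog}.dump",
        })
    return plan


if __name__ == "__main__":
    # Hypothetical eval.json content: the real per-key values are not shown here.
    example = json.loads('{"pb_jacobi_2d_ml_issr": {}}')
    for step in build_plan(example):
        print(step)
```

Paths are given relative to `SARISDIR`, which defaults to the current directory in the Makefile.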

sw/saris/README.md

Lines changed: 57 additions & 0 deletions
@@ -0,0 +1,57 @@
+# SARIS Stencil Kernels
+
+This directory contains the baseline and SSSR-accelerated Snitch cluster stencil kernels used in the evaluation section of the paper _"SARIS: Accelerating Stencil Computations on Energy-Efficient RISC-V Compute Clusters with Indirect Stream Registers"_. In our paper, we describe how indirect stream register architectures such as SSSRs can significantly accelerate stencil codes.
+
+If you use our code or compare against our work, please cite us:
+
+```
+@misc{scheffler2024saris,
+title={SARIS: Accelerating Stencil Computations on Energy-Efficient
+RISC-V Compute Clusters with Indirect Stream Registers},
+author={Paul Scheffler and Luca Colagrande and Luca Benini},
+year={2024},
+eprint={},
+archivePrefix={arXiv},
+primaryClass={cs.MS}
+}
+```
+
+> [!IMPORTANT]
+> - Unlike other software in this repository, compiling this code requires a **custom version of the LLVM 15 toolchain** with some extensions and improvements. The source code for this LLVM fork can be found [here](https://github.com/pulp-platform/llvm-project/tree/15.0.0-saris-0.1.0).
+> - The generated example programs are only intended to be used **in RTL simulation of a default, SSSR-extended cluster**, using the cluster configuration `cfg/default.hjson`.
+
+## Directory Structure
+
+* `stencils/`: Baseline (`istc.par.hpp`) and SARIS-accelerated (`istc.issr.hpp`) stencil codes.
+* `runtime/`: Additional runtime code and linking configuration needed for compilation.
+* `util/`: Evaluation program generator supporting different grid sizes and kernel calls.
+* `eval.json`: Configuration for test program generator.
+
+## Compile Evaluation Programs
+
+Before you can compile test problems, you need the [SARIS LLVM 15 toolchain](https://github.com/pulp-platform/llvm-project/tree/15.0.0-saris-0.1.0) along with `newlib` and `compiler-rt`. The required build steps are outlined [here](https://github.com/pulp-platform/llvm-toolchain-cd/blob/main/README.md).
+
+Then, you can build the test programs specified in `eval.json` by running:
+
+```
+make LLVM_BINROOT=<llvm_install_path>/bin all
+```
+
+By default, `eval.json` specifies RV32G and SSSR-accelerated test programs for all included stencils as specified in our paper. Binaries are generated in `bin/` and disassembled program dumps in `dump/`.
+
+
+## Run Evaluation Programs
+
+Evaluation programs can only be run in RTL simulation of a Snitch cluster using the default, SSSR-enhanced configuration `cfg/default.hjson`. For example, when building a QuestaSim RTL simulation setup from `target/snitch_cluster`:
+
+```
+make CFG_OVERRIDE=cfg/default.hjson bin/snitch_cluster.vsim
+```
+
+Then, the built evaluation programs can be run on this simulation setup as usual, for example:
+
+```
+bin/snitch_cluster.vsim ../../sw/saris/bin/istc.pb_jacobi_2d_ml_issr.elf
+```
+
+Performance metrics can be analyzed using the annotating Snitch tracer (`make traces`). In the default evaluation programs, the section of interest is section 2.
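
The build step described in this README fails early if `LLVM_BINROOT` is unset, but not if the directory lacks one of the tools the Makefile refers to (`clang`, `clang++`, `llvm-objdump`, `llvm-strip`, `ld.lld`). A minimal pre-flight check could look like the sketch below. It is not part of the commit, and the names `REQUIRED_TOOLS` and `missing_tools` are hypothetical.

```python
import os
from pathlib import Path

# Tool names taken from the RISCV_CC, RISCV_CXX, RISCV_OBJDUMP, RISCV_STRIP
# and -fuse-ld settings in sw/saris/Makefile.
REQUIRED_TOOLS = ("clang", "clang++", "llvm-objdump", "llvm-strip", "ld.lld")


def missing_tools(binroot: str) -> list:
    """Return the required tools that are not executable files under binroot."""
    root = Path(binroot)
    return [tool for tool in REQUIRED_TOOLS
            if not os.access(root / tool, os.X_OK)]


if __name__ == "__main__":
    binroot = os.environ.get("LLVM_BINROOT", "")
    # Without LLVM_BINROOT, report every tool as missing instead of probing ".".
    missing = missing_tools(binroot) if binroot else list(REQUIRED_TOOLS)
    print("missing tools:", ", ".join(missing) if missing else "none")
```

Running it before `make LLVM_BINROOT=<llvm_install_path>/bin all` gives a clearer message than a failing compiler invocation would.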
