
sw: Add SARIS kernels #124

Merged: 10 commits, Apr 8, 2024
1 change: 1 addition & 0 deletions .github/workflows/lint.yml
@@ -129,6 +129,7 @@ jobs:
- uses: actions/checkout@v3
- uses: DoozyX/clang-format-lint-action@v0.16.2
with:
exclude: './sw/saris'
clangFormatVersion: 10

######################
18 changes: 18 additions & 0 deletions README.md
@@ -161,3 +161,21 @@ If you use the Snitch cluster or its extensions in your work, you can cite us:
```

</p>

<details>
<summary><b>SARIS: Accelerating Stencil Computations on Energy-Efficient RISC-V Compute Clusters with Indirect Stream Registers</b></summary>
<p>

```
@misc{scheffler2024saris,
title={SARIS: Accelerating Stencil Computations on Energy-Efficient
RISC-V Compute Clusters with Indirect Stream Registers},
author={Paul Scheffler and Luca Colagrande and Luca Benini},
year={2024},
eprint={},
archivePrefix={arXiv},
primaryClass={cs.MS}
}
```

</p>
18 changes: 18 additions & 0 deletions docs/publications.md
@@ -118,4 +118,22 @@ If you use the Snitch cluster or its extensions in your work, you can cite us:

</p>

<details>
<summary><b>SARIS: Accelerating Stencil Computations on Energy-Efficient RISC-V Compute Clusters with Indirect Stream Registers</b></summary>
<p>

```
@misc{scheffler2024saris,
title={SARIS: Accelerating Stencil Computations on Energy-Efficient
RISC-V Compute Clusters with Indirect Stream Registers},
author={Paul Scheffler and Luca Colagrande and Luca Benini},
year={2024},
eprint={},
archivePrefix={arXiv},
primaryClass={cs.MS}
}
```

</p>

<!--end-publications-->
16 changes: 6 additions & 10 deletions hw/snitch_cluster/src/snitch_cluster_wrapper.sv.tpl
@@ -210,28 +210,24 @@ ${ssr_cfg(core, "'{{{indirection:d}, {isect_master:d}, {isect_master_idx:d}, {is
${ssr_cfg(core, '{reg_idx}', '/*None*/ 0', ',')}\
};

// Forward potentially optional configuration parameters
localparam logic [9:0] CfgBaseHartId = (${to_sv_hex(cfg['cluster_base_hartid'], 10)});
localparam addr_t CfgClusterBaseAddr = (${to_sv_hex(cfg['cluster_base_addr'], cfg['addr_width'])});

endpackage
// verilog_lint: waive-stop package-filename

module ${cfg['name']}_wrapper (
input logic clk_i,
input logic rst_ni,
% if cfg['enable_debug']:
input logic [${cfg['pkg_name']}::NrCores-1:0] debug_req_i,
% endif
input logic [${cfg['pkg_name']}::NrCores-1:0] meip_i,
input logic [${cfg['pkg_name']}::NrCores-1:0] mtip_i,
input logic [${cfg['pkg_name']}::NrCores-1:0] msip_i,
% if cfg['cluster_base_expose']:
input logic [9:0] hart_base_id_i,
input logic [${cfg['addr_width']-1}:0] cluster_base_addr_i,
% endif
% if cfg['timing']['iso_crossings']:
input logic clk_d2_bypass_i,
% endif
% if cfg['sram_cfg_expose']:
input ${cfg['pkg_name']}::sram_cfgs_t sram_cfgs_i,
%endif
input ${cfg['pkg_name']}::narrow_in_req_t narrow_in_req_i,
output ${cfg['pkg_name']}::narrow_in_resp_t narrow_in_resp_o,
output ${cfg['pkg_name']}::narrow_out_req_t narrow_out_req_o,
@@ -354,8 +350,8 @@ module ${cfg['name']}_wrapper (
.hart_base_id_i,
.cluster_base_addr_i,
% else:
.hart_base_id_i (${to_sv_hex(cfg['cluster_base_hartid'], 10)}),
.cluster_base_addr_i (${to_sv_hex(cfg['cluster_base_addr'], cfg['addr_width'])}),
.hart_base_id_i (snitch_cluster_pkg::CfgBaseHartId),
.cluster_base_addr_i (snitch_cluster_pkg::CfgClusterBaseAddr),
% endif
% if cfg['timing']['iso_crossings']:
.clk_d2_bypass_i,
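In this change, the template renders the base hart ID and the cluster base address once, as `CfgBaseHartId` and `CfgClusterBaseAddr` in the package. When `cluster_base_expose` is false, the wrapper references those localparams instead of repeating the `to_sv_hex(...)` calls.

The Python sketch below approximates that rendering step. It makes these assumptions:

- `to_sv_hex` is re-implemented here, and its output format (`<width>'h<hex digits>`) is assumed.
- `render_cfg_params` is a hypothetical helper.
- The example configuration values are made up.

Neither function is the actual clustergen code.

```python
from typing import Optional


def to_sv_hex(x: int, length: Optional[int] = None) -> str:
    """Hypothetical helper: format x as a SystemVerilog hex literal.

    For example, x=255 with length=10 gives "10'hff"; without a width, "'hff".
    """
    width = "" if length is None else str(length)
    return f"{width}'h{x:x}"


def render_cfg_params(cfg: dict) -> list:
    """Render the two package localparams added by this diff (hypothetical renderer)."""
    hartid = to_sv_hex(cfg["cluster_base_hartid"], 10)
    addr = to_sv_hex(cfg["cluster_base_addr"], cfg["addr_width"])
    return [
        f"localparam logic [9:0] CfgBaseHartId = ({hartid});",
        f"localparam addr_t CfgClusterBaseAddr = ({addr});",
    ]


if __name__ == "__main__":
    # Example values only; not taken from any real cluster configuration.
    example_cfg = {"cluster_base_hartid": 0, "cluster_base_addr": 0x10000000, "addr_width": 48}
    for line in render_cfg_params(example_cfg):
        print(line)
```

With the literals defined once in the package, the non-exposed wrapper branch connects `.hart_base_id_i (snitch_cluster_pkg::CfgBaseHartId)` rather than a second copy of the same expression.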
3 changes: 3 additions & 0 deletions sw/saris/.gitignore
@@ -0,0 +1,3 @@
bin
dump
gen
130 changes: 130 additions & 0 deletions sw/saris/Makefile
@@ -0,0 +1,130 @@
# Copyright 2023 ETH Zurich and University of Bologna.
# Licensed under the Apache License, Version 2.0, see LICENSE for details.
# SPDX-License-Identifier: Apache-2.0

# Paul Scheffler <paulsc@iis.ee.ethz.ch>
# Luca Colagrande <colluca@iis.ee.ethz.ch>

all:

###############
# Environment #
###############

# NOTE: the LLVM_BINROOT environment variable must point to a specific revision of PULP RISCV
# LLVM 15 (see README.md). After compilation, you can set LLVM_BINROOT in your environment, this
# makefile, or pass it on invocation of `make`.
ifndef LLVM_BINROOT
$(error LLVM_BINROOT is not set; please compile the SARIS version of LLVM 15 (see README.md) and set LLVM_BINROOT to its binary location.)
endif

PYTHON3 ?= python3

SARISDIR ?= .
GENDIR ?= $(SARISDIR)/gen
UTILDIR ?= $(SARISDIR)/util
BINDIR ?= $(SARISDIR)/bin
DUMPDIR ?= $(SARISDIR)/dump
RTDIR ?= $(SARISDIR)/runtime

# We depend on the printf submodule
PRINTFDIR ?= $(SARISDIR)/../deps/printf

############################
# Compiler (LLVM 15) Setup #
############################

RISCV_MARCH ?= \
rv32imafd_zfh_xfrep_xssr_xdma_xfalthalf_xfquarter_xfaltquarter_xfvecsingle_xfvechalf_$\
xfvecalthalf_xfvecquarter_xfvecaltquarter_xfauxhalf_xfauxalthalf_xfauxquarter_xfauxaltquarter_$\
xfauxvecsingle_xfauxvechalf_xfauxvecalthalf_xfauxvecquarter_xfauxvecaltquarter_xfexpauxvechalf_$\
xfexpauxvecalthalf_xfexpauxvecquarter_xfexpauxvecaltquarter

RISCV_MABI ?= ilp32d

RISCV_CC ?= $(LLVM_BINROOT)/clang
RISCV_CXX ?= $(LLVM_BINROOT)/clang++
RISCV_OBJDUMP ?= $(LLVM_BINROOT)/llvm-objdump
RISCV_STRIP ?= $(LLVM_BINROOT)/llvm-strip

RISCV_STACK ?= 2048
RISCV_FLAGS ?= -mcpu=snitch -march=$(RISCV_MARCH) -Ofast -flto -mabi=$(RISCV_MABI) \
-Wframe-larger-than=$(RISCV_STACK) -nostdlib -mcmodel=medany -I$(RTDIR) \
-I$(SARISDIR)/stencils -I$(PRINTFDIR) -ffreestanding -fno-builtin \
-ffunction-sections

RISCV_CFLAGS ?= $(RISCV_FLAGS)
# Loop unrolling optimization
RISCV_CFLAGS += -mllvm --allow-unroll-and-jam
RISCV_CFLAGS += -mllvm --unroll-allow-partial
RISCV_CFLAGS += -mllvm --unroll-runtime
# Tree height reduction options
RISCV_CFLAGS += -mllvm --enable-fp-thr
RISCV_CFLAGS += -mllvm --thr-max-depth=5
RISCV_CFLAGS += -mllvm --thr-se-leaves
RISCV_CFLAGS += -mllvm --thr-fuse-bias
RISCV_CFLAGS += -mllvm --thr-se-factor=2
RISCV_CFLAGS += -mllvm --thr-re-factor=1
# Machine scheduler and PostRA options
RISCV_CFLAGS += -mllvm --post-RA-scheduler
RISCV_CFLAGS += -mllvm --enable-misched
RISCV_CFLAGS += -mllvm --enable-post-misched
RISCV_CFLAGS += -mllvm --misched-postra

RISCV_CCFLAGS ?= $(RISCV_CFLAGS) -std=gnu11
RISCV_CXXFLAGS ?= $(RISCV_CFLAGS) -std=gnu++14
RISCV_LDFLAGS ?= -fuse-ld=$(LLVM_BINROOT)/ld.lld -flto -static -lm $(RISCV_FLAGS) \
-Wl,--fatal-warnings -Wl,-z,stack-size=$(RISCV_STACK)
RISCV_DMPFLAGS ?= --mcpu=snitch

############################
# SARIS Program Build Flow #
############################

.SECONDEXPANSION:
.DELETE_ON_ERROR:

# Extracting word nr. $(1) from $(2)-separated list $(3)
pw = $(word $(1), $(subst $(2), ,$(3)))

$(GENDIR) $(BINDIR) $(DUMPDIR):
	mkdir -p $@

$(BINDIR)/crt0.o: $(SARISDIR)/runtime/crt0.S | $(BINDIR)
	$(RISCV_CC) $(RISCV_CCFLAGS) -c $< -o $@

$(BINDIR)/istc.%.c.o: $(GENDIR)/$$(call pw,1,.,$$*).cpp | $(BINDIR)
	$(RISCV_CXX) $(RISCV_CXXFLAGS) -c $< -o $@

.PRECIOUS: $(BINDIR)/%.elf
$(BINDIR)/istc.%.elf: $(BINDIR)/istc.%.c.o $(BINDIR)/crt0.o $(RTDIR)/link.ld | $(BINDIR)
	$(RISCV_CC) $(RISCV_LDFLAGS) -o $@ $< $(BINDIR)/crt0.o -T$(RTDIR)/link.ld
	$(RISCV_STRIP) $@ -g -S -d --strip-debug -R .comment -R .riscv.attributes

.PRECIOUS: $(DUMPDIR)/%.dump
$(DUMPDIR)/%.dump: $(BINDIR)/%.elf | $(DUMPDIR)
	@$(RISCV_OBJDUMP) $(RISCV_DMPFLAGS) -j .text -d $< >$@
	@$(RISCV_OBJDUMP) $(RISCV_DMPFLAGS) -j .misc -s $< | tail -n +3 >>$@
	@$(RISCV_OBJDUMP) $(RISCV_DMPFLAGS) -j .tcdm -s $< | tail -n +3 >>$@
	@$(RISCV_OBJDUMP) $(RISCV_DMPFLAGS) -j .tcdmc -s $< | tail -n +3 >>$@

# Phony for program and dump build
prog.%: $(BINDIR)/%.elf $(DUMPDIR)/%.dump
	@echo -e '\x1b[44;33;1mBUILT: $*\x1b[0m'

clean:
	rm -rf $(BINDIR) $(DUMPDIR) $(GENDIR)

############################
# SARIS Program Generation #
############################

.PRECIOUS: $(GENDIR)/%.cpp
$(GENDIR)/%.cpp: $(UTILDIR)/evalgen.py $(SARISDIR)/eval.json $(UTILDIR)/eval.cpp.tpl | $(GENDIR)
	$(PYTHON3) $^ $* > $@

EVAL_NAMES ?= $(shell jq -r 'keys | join(" ")' $(SARISDIR)/eval.json)
ISTC_PROGS += $(patsubst %,istc.%,$(EVAL_NAMES))

# Default: compile all SARIS programs in eval.json
all: $(addprefix prog.,$(ISTC_PROGS))
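The object rule relies on `.SECONDEXPANSION`. The references `$$*` and `$$(call pw,...)` are expanded only after Make has bound the stem. The prerequisite therefore becomes `$(GENDIR)/<first dot-separated word of the stem>.cpp`.

The Python sketch below emulates that lookup under the default `SARISDIR = .` layout. `pw` mirrors the Makefile helper, while `object_source` is a hypothetical name used only for illustration.

```python
def pw(n: int, sep: str, text: str) -> str:
    """Mimic the Makefile helper `pw`: $(word n,$(subst sep, ,text)).

    GNU Make's $(word) is 1-based and yields an empty string past the last word.
    """
    if n < 1:
        raise ValueError("$(word) index must be at least 1")
    words = text.replace(sep, " ").split()
    return words[n - 1] if n <= len(words) else ""


def object_source(target: str, bindir: str = "./bin", gendir: str = "./gen") -> str:
    """Return the prerequisite that `$(BINDIR)/istc.%.c.o` would select for target."""
    prefix, suffix = f"{bindir}/istc.", ".c.o"
    if not (target.startswith(prefix) and target.endswith(suffix)):
        raise ValueError(f"{target!r} does not match {prefix}%{suffix}")
    stem = target[len(prefix):-len(suffix)]  # the value Make binds to `$*`
    return f"{gendir}/{pw(1, '.', stem)}.cpp"


if __name__ == "__main__":
    print(object_source("./bin/istc.pb_jacobi_2d_ml_issr.c.o"))
```

For a stem such as `pb_jacobi_2d_ml_issr`, the generated source name equals the stem. Anything after a further dot in the stem would be ignored when selecting the `.cpp` file.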
57 changes: 57 additions & 0 deletions sw/saris/README.md
@@ -0,0 +1,57 @@
# SARIS Stencil Kernels

This directory contains the baseline and SSSR-accelerated Snitch cluster stencil kernels used in the evaluation section of the paper _"SARIS: Accelerating Stencil Computations on Energy-Efficient RISC-V Compute Clusters with Indirect Stream Registers"_. In the paper, we describe how indirect stream register architectures such as SSSRs can significantly accelerate stencil codes.
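The Python sketch below illustrates why indirection helps stencils. It computes a 5-point Jacobi-style stencil in two ways:

- The direct version reads neighbors through explicit 2D indexing.
- The indirect version walks a precomputed index array that yields each operand in order.

The indirect form is roughly the access pattern an indirect stream register can take over in hardware, leaving the core to consume operands only. The 5-point shape, the 0.2 weight, and all function names are assumptions for illustration. This is not the code in `stencils/istc.par.hpp` or `stencils/istc.issr.hpp`.

```python
from typing import Iterator, List

Grid = List[List[float]]


def five_point_offsets(width: int) -> List[int]:
    """Flat-index offsets of center, north, south, west, east in a row-major grid."""
    return [0, -width, width, -1, 1]


def jacobi_direct(grid: Grid) -> Grid:
    """Baseline: each interior point reads its neighbors through explicit 2D indexing."""
    h, w = len(grid), len(grid[0])
    out = [row[:] for row in grid]  # boundary points are copied unchanged
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            out[i][j] = 0.2 * (grid[i][j] + grid[i - 1][j] + grid[i + 1][j]
                               + grid[i][j - 1] + grid[i][j + 1])
    return out


def jacobi_indirect(grid: Grid) -> Grid:
    """Same stencil, but operand loads are driven by a precomputed index array."""
    h, w = len(grid), len(grid[0])
    flat = [x for row in grid for x in row]
    offsets = five_point_offsets(w)
    interior = [i * w + j for i in range(1, h - 1) for j in range(1, w - 1)]
    # Index stream: for every interior point, the flat indices of its five operands.
    index_stream = [p + o for p in interior for o in offsets]
    # The consumer only pops operand values; it never computes addresses itself.
    operands: Iterator[float] = (flat[k] for k in index_stream)
    out = flat[:]
    for p in interior:
        acc = 0.0
        for _ in offsets:
            acc += next(operands)
        out[p] = 0.2 * acc
    return [out[r * w:(r + 1) * w] for r in range(h)]


if __name__ == "__main__":
    g = [[float(3 * r + c * c) for c in range(5)] for r in range(4)]
    print(jacobi_direct(g) == jacobi_indirect(g))
```

Both functions produce the same grid. The indirect version replaces per-point address arithmetic with an index array that depends only on the grid shape, so the same array could be reused across sweeps.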

If you use our code or compare against our work, please cite us:

```
@misc{scheffler2024saris,
title={SARIS: Accelerating Stencil Computations on Energy-Efficient
RISC-V Compute Clusters with Indirect Stream Registers},
author={Paul Scheffler and Luca Colagrande and Luca Benini},
year={2024},
eprint={},
archivePrefix={arXiv},
primaryClass={cs.MS}
}
```

> [!IMPORTANT]
> - Unlike other software in this repository, compiling this code requires a **custom version of the LLVM 15 toolchain** with some extensions and improvements. The source code for this LLVM fork can be found [here](https://github.com/pulp-platform/llvm-project/tree/15.0.0-saris-0.1.0).
> - The generated example programs are only intended to be used **in RTL simulation of a default, SSSR-extended cluster**, using the cluster configuration `cfg/default.hjson`.

## Directory Structure

* `stencils/`: Baseline (`istc.par.hpp`) and SARIS-accelerated (`istc.issr.hpp`) stencil codes.
* `runtime/`: Additional runtime code and linking configuration needed for compilation.
* `util/`: Evaluation program generator supporting different grid sizes and kernel calls.
* `eval.json`: Configuration for the test program generator.

## Compile Evaluation Programs

Before you can compile the test programs, you need the [SARIS LLVM 15 toolchain](https://github.com/pulp-platform/llvm-project/tree/15.0.0-saris-0.1.0) along with `newlib` and `compiler-rt`. The required build steps are outlined [here](https://github.com/pulp-platform/llvm-toolchain-cd/blob/main/README.md).

Then, you can build the test programs specified in `eval.json` by running:

```
make LLVM_BINROOT=<llvm_install_path>/bin all
```

By default, `eval.json` defines baseline RV32G and SSSR-accelerated test programs for all included stencils, matching the evaluation in our paper. Binaries are generated in `bin/`, and disassembled program dumps in `dump/`.
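The `all` target builds one `prog.istc.<name>` for each top-level key of `eval.json`. `EVAL_NAMES` obtains those keys via `jq -r 'keys | join(" ")'`.

The Python sketch below mirrors that naming for the default `SARISDIR = .` layout. The function name and the second key in the example `eval.json` are hypothetical. The entry values are left empty because only the keys affect target names.

```python
import json
from typing import Dict, Tuple


def saris_targets(eval_json_text: str, bindir: str = "./bin",
                  dumpdir: str = "./dump") -> Dict[str, Tuple[str, str]]:
    """Map each phony `prog.*` target to the ELF and dump it produces.

    Mirrors EVAL_NAMES ?= $(shell jq -r 'keys | join(" ")' eval.json);
    jq's `keys` returns the top-level keys sorted, so sorted() is used here too.
    """
    names = sorted(json.loads(eval_json_text))
    targets = {}
    for name in names:
        prog = f"istc.{name}"
        targets[f"prog.{prog}"] = (f"{bindir}/{prog}.elf", f"{dumpdir}/{prog}.dump")
    return targets


if __name__ == "__main__":
    # Hypothetical eval.json: only the key names matter for target naming here.
    example = '{"pb_jacobi_2d_ml_issr": {}, "my_stencil_par": {}}'
    for phony, outputs in saris_targets(example).items():
        print(phony, "->", *outputs)
```

Passing `EVAL_NAMES` on the command line should restrict the build to a subset. For example, `make LLVM_BINROOT=<llvm_install_path>/bin EVAL_NAMES=pb_jacobi_2d_ml_issr all` builds one program. You can also request a single phony target directly: `make LLVM_BINROOT=<llvm_install_path>/bin prog.istc.pb_jacobi_2d_ml_issr`.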


## Run Evaluation Programs

Evaluation programs can only be run in RTL simulation of a Snitch cluster using the default, SSSR-enhanced configuration `cfg/default.hjson`. For example, to build a QuestaSim RTL simulation setup, run the following from `target/snitch_cluster`:

```
make CFG_OVERRIDE=cfg/default.hjson bin/snitch_cluster.vsim
```

Then, the built evaluation programs can be run on this simulation setup as usual, for example:

```
bin/snitch_cluster.vsim ../../sw/saris/bin/istc.pb_jacobi_2d_ml_issr.elf
```

Performance metrics can be analyzed using the annotating Snitch tracer (`make traces`). In the default evaluation programs, the section of interest is section 2.