sw: Add SARIS kernels #124
Merged
10 commits
7bdd312
hw: Keep IO fixed regardless of configuration
paulsc96 ecdc465
target/snitch_cluster: Add Occamy-like config with SSSRs
paulsc96 9629cb9
sw: Add SARIS kernels
paulsc96 e73ef43
sw/saris: Fix license headers
paulsc96 d41cd4e
sw/saris: Fix python lint
paulsc96 a62d6e4
lint: Do not C++ lint SARIS sources
paulsc96 31aa679
sw/saris: Remove stub LLVM from makefile
paulsc96 2050a2a
sw/saris: Add README.md
paulsc96 ab4fe30
sw/saris: Initialize putchar buffer, fix F extension skip
paulsc96 ea40640
sw/saris: Switch to, adapt default config, add bib placeholders
paulsc96
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
bin | ||
dump | ||
gen |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,130 @@ | ||
# Copyright 2023 ETH Zurich and University of Bologna. | ||
# Licensed under the Apache License, Version 2.0, see LICENSE for details. | ||
# SPDX-License-Identifier: Apache-2.0 | ||
|
||
# Paul Scheffler <paulsc@iis.ee.ethz.ch> | ||
# Luca Colagrande <colluca@iis.ee.ethz.ch> | ||
|
||
all: | ||
|
||
############### | ||
# Environment # | ||
############### | ||
|
||
# NOTE: the LLVM_BINROOT environment variable must point to a specific revision of PULP RISCV | ||
# LLVM 15 (see README.md). After compilation, you can set LLVM_BINROOT in your environment or in | ||
# this makefile, or pass it on invocation of `make`. | ||
ifndef LLVM_BINROOT | ||
$(error LLVM_BINROOT is not set; please compile the SARIS version of LLVM 15 (see README.md) and set LLVM_BINROOT to its binary location.) | ||
endif | ||
|
||
PYTHON3 ?= python3 | ||
|
||
SARISDIR ?= . | ||
GENDIR ?= $(SARISDIR)/gen | ||
UTILDIR ?= $(SARISDIR)/util | ||
BINDIR ?= $(SARISDIR)/bin | ||
DUMPDIR ?= $(SARISDIR)/dump | ||
RTDIR ?= $(SARISDIR)/runtime | ||
|
||
# We depend on the printf submodule | ||
PRINTFDIR ?= $(SARISDIR)/../deps/printf | ||
|
||
############################ | ||
# Compiler (LLVM 15) Setup # | ||
############################ | ||
|
||
RISCV_MARCH ?= \ | ||
rv32imafd_zfh_xfrep_xssr_xdma_xfalthalf_xfquarter_xfaltquarter_xfvecsingle_xfvechalf_$\ | ||
xfvecalthalf_xfvecquarter_xfvecaltquarter_xfauxhalf_xfauxalthalf_xfauxquarter_xfauxaltquarter_$\ | ||
xfauxvecsingle_xfauxvechalf_xfauxvecalthalf_xfauxvecquarter_xfauxvecaltquarter_xfexpauxvechalf_$\ | ||
xfexpauxvecalthalf_xfexpauxvecquarter_xfexpauxvecaltquarter | ||
|
||
RISCV_MABI ?= ilp32d | ||
|
||
RISCV_CC ?= $(LLVM_BINROOT)/clang | ||
RISCV_CXX ?= $(LLVM_BINROOT)/clang++ | ||
RISCV_OBJDUMP ?= $(LLVM_BINROOT)/llvm-objdump | ||
RISCV_STRIP ?= $(LLVM_BINROOT)/llvm-strip | ||
|
||
RISCV_STACK ?= 2048 | ||
RISCV_FLAGS ?= -mcpu=snitch -march=$(RISCV_MARCH) -Ofast -flto -mabi=$(RISCV_MABI) \ | ||
-Wframe-larger-than=$(RISCV_STACK) -nostdlib -mcmodel=medany -I$(RTDIR) \ | ||
-I$(SARISDIR)/stencils -I$(PRINTFDIR) -ffreestanding -fno-builtin \ | ||
-ffunction-sections | ||
|
||
RISCV_CFLAGS ?= $(RISCV_FLAGS) | ||
# Loop unrolling optimization | ||
RISCV_CFLAGS += -mllvm --allow-unroll-and-jam | ||
RISCV_CFLAGS += -mllvm --unroll-allow-partial | ||
RISCV_CFLAGS += -mllvm --unroll-runtime | ||
# Tree height reduction options | ||
RISCV_CFLAGS += -mllvm --enable-fp-thr | ||
RISCV_CFLAGS += -mllvm --thr-max-depth=5 | ||
RISCV_CFLAGS += -mllvm --thr-se-leaves | ||
RISCV_CFLAGS += -mllvm --thr-fuse-bias | ||
RISCV_CFLAGS += -mllvm --thr-se-factor=2 | ||
RISCV_CFLAGS += -mllvm --thr-re-factor=1 | ||
# Machine scheduler and PostRA options | ||
RISCV_CFLAGS += -mllvm --post-RA-scheduler | ||
RISCV_CFLAGS += -mllvm --enable-misched | ||
RISCV_CFLAGS += -mllvm --enable-post-misched | ||
RISCV_CFLAGS += -mllvm --misched-postra | ||
|
||
RISCV_CCFLAGS ?= $(RISCV_CFLAGS) -std=gnu11 | ||
RISCV_CXXFLAGS ?= $(RISCV_CFLAGS) -std=gnu++14 | ||
RISCV_LDFLAGS ?= -fuse-ld=$(LLVM_BINROOT)/ld.lld -flto -static -lm $(RISCV_FLAGS) \ | ||
-Wl,--fatal-warnings -Wl,-z,stack-size=$(RISCV_STACK) | ||
RISCV_DMPFLAGS ?= --mcpu=snitch | ||
|
||
############################ | ||
# SARIS Program Build Flow # | ||
############################ | ||
|
||
.SECONDEXPANSION: | ||
.DELETE_ON_ERROR: | ||
|
||
# Extracting word nr. $(1) from $(2)-separated list $(3) | ||
pw = $(word $(1), $(subst $(2), ,$(3))) | ||
|
||
$(GENDIR) $(BINDIR) $(DUMPDIR): | ||
mkdir -p $@ | ||
|
||
$(BINDIR)/crt0.o: $(SARISDIR)/runtime/crt0.S | $(BINDIR) | ||
$(RISCV_CC) $(RISCV_CCFLAGS) -c $< -o $@ | ||
|
||
$(BINDIR)/istc.%.c.o: $(GENDIR)/$$(call pw,1,.,$$*).cpp | $(BINDIR) | ||
$(RISCV_CXX) $(RISCV_CXXFLAGS) -c $< -o $@ | ||
|
||
.PRECIOUS: $(BINDIR)/%.elf | ||
$(BINDIR)/istc.%.elf: $(BINDIR)/istc.%.c.o $(BINDIR)/crt0.o $(RTDIR)/link.ld | $(BINDIR) | ||
$(RISCV_CC) $(RISCV_LDFLAGS) -o $@ $< $(BINDIR)/crt0.o -T$(RTDIR)/link.ld | ||
$(RISCV_STRIP) $@ -g -S -d --strip-debug -R .comment -R .riscv.attributes | ||
|
||
.PRECIOUS: $(DUMPDIR)/%.dump | ||
$(DUMPDIR)/%.dump: $(BINDIR)/%.elf | $(DUMPDIR) | ||
@$(RISCV_OBJDUMP) $(RISCV_DMPFLAGS) -j .text -d $< >$@ | ||
@$(RISCV_OBJDUMP) $(RISCV_DMPFLAGS) -j .misc -s $< | tail -n +3 >>$@ | ||
@$(RISCV_OBJDUMP) $(RISCV_DMPFLAGS) -j .tcdm -s $< | tail -n +3 >>$@ | ||
@$(RISCV_OBJDUMP) $(RISCV_DMPFLAGS) -j .tcdmc -s $< | tail -n +3 >>$@ | ||
|
||
# Phony for program and dump build | ||
prog.%: $(BINDIR)/%.elf $(DUMPDIR)/%.dump | ||
@echo -e '\x1b[44;33;1mBUILT: $*\x1b[0m' | ||
|
||
clean: | ||
rm -rf $(BINDIR) $(DUMPDIR) $(GENDIR) | ||
|
||
############################ | ||
# SARIS Program Generation # | ||
############################ | ||
|
||
.PRECIOUS: $(GENDIR)/%.cpp | ||
$(GENDIR)/%.cpp: $(UTILDIR)/evalgen.py $(SARISDIR)/eval.json $(UTILDIR)/eval.cpp.tpl | $(GENDIR) | ||
$(PYTHON3) $^ $* > $@ | ||
|
||
EVAL_NAMES ?= $(shell jq -r 'keys | join(" ")' $(SARISDIR)/eval.json) | ||
ISTC_PROGS += $(patsubst %,istc.%,$(EVAL_NAMES)) | ||
|
||
# Default: compile all SARIS programs in eval.json | ||
all: $(addprefix prog.,$(ISTC_PROGS)) |
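The pattern rules above chain one generated source per `eval.json` key into an object file, an ELF, and a dump. As a rough illustration, the following shell sketch mimics the `pw` helper and lists the files that one key would produce. It assumes the default `GENDIR`, `BINDIR`, and `DUMPDIR` (paths shown relative to `SARISDIR`), and it assumes that `pb_jacobi_2d_ml_issr` is a key in `eval.json`, which the README's run example suggests but this diff does not show. The function name `artifacts` is hypothetical and not part of the makefile.

```shell
#!/bin/sh
# Shell stand-in for the makefile helper `pw`: print field number $1 of the
# $2-separated list $3. Unlike make's $(word ...), `cut` does not collapse
# empty fields, so the two only agree for lists without repeated separators.
pw() {
    printf '%s\n' "$3" | cut -d "$2" -f "$1"
}

# Print the files the pattern rules would produce for one eval.json key ($1),
# assuming the default GENDIR, BINDIR and DUMPDIR relative to SARISDIR.
# (`artifacts` is a hypothetical name used only for this illustration.)
artifacts() {
    src="gen/$(pw 1 . "$1").cpp"
    printf '%s\n' "$src" "bin/istc.$1.c.o" "bin/istc.$1.elf" "dump/istc.$1.dump"
}

# Assumed example key; any name listed in eval.json would work the same way.
artifacts pb_jacobi_2d_ml_issr
```

Running `make prog.istc.<name>` requests the last two of these files, and make builds the first two as intermediate prerequisites.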
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,57 @@ | ||
# SARIS Stencil Kernels | ||
|
||
This directory contains the baseline- and SSSR-accelerated Snitch cluster stencil kernels used in the evaluation section of the paper _"SARIS: Accelerating Stencil Computations on Energy-Efficient RISC-V Compute Clusters with Indirect Stream Registers"_. In our paper, we describe how indirect stream register architectures such as SSSRs can significantly accelerate stencil codes. | ||
|
||
If you use our code or compare against our work, please cite us: | ||
|
||
``` | ||
@misc{scheffler2024saris, | ||
title={SARIS: Accelerating Stencil Computations on Energy-Efficient | ||
RISC-V Compute Clusters with Indirect Stream Registers}, | ||
author={Paul Scheffler and Luca Colagrande and Luca Benini}, | ||
year={2024}, | ||
eprint={}, | ||
archivePrefix={arXiv}, | ||
primaryClass={cs.MS} | ||
} | ||
``` | ||
|
||
> [!IMPORTANT] | ||
> - Unlike other software in this repository, compiling this code requires a **custom version of the LLVM 15 toolchain** with some extensions and improvements. The source code for this LLVM fork can be found [here](https://github.com/pulp-platform/llvm-project/tree/15.0.0-saris-0.1.0). | ||
> - The generated example programs are only intended to be used **in RTL simulation of a default, SSSR-extended cluster**, using the cluster configuration `cfg/default.hjson`. | ||
|
||
## Directory Structure | ||
|
||
* `stencils/`: Baseline (`istc.par.hpp`) and SARIS-accelerated (`istc.issr.hpp`) stencil codes. | ||
* `runtime/`: Additional runtime code and linking configuration needed for compilation. | ||
* `util/`: Evaluation program generator supporting different grid sizes and kernel calls. | ||
* `eval.json`: Configuration for test program generator. | ||
|
||
## Compile Evaluation Programs | ||
|
||
Before you can compile test problems, you need the [SARIS LLVM 15 toolchain](https://github.com/pulp-platform/llvm-project/tree/15.0.0-saris-0.1.0) along with `newlib` and `compiler-rt`. The required build steps are outlined [here](https://github.com/pulp-platform/llvm-toolchain-cd/blob/main/README.md). | ||
|
||
Then, you can build the test programs specified in `eval.json` by running: | ||
|
||
``` | ||
make LLVM_BINROOT=<llvm_install_path>/bin all | ||
``` | ||
|
||
By default, `eval.json` specifies RV32G and SSSR-accelerated test programs for all included stencils as specified in our paper. Binaries are generated in `bin/` and disassembled program dumps in `dump/`. | ||
|
||
|
||
## Run Evaluation Programs | ||
|
||
Evaluation programs can only be run in RTL simulation of a Snitch cluster using the default, SSSR-enhanced configuration `cfg/default.hjson`. For example, when building a QuestaSim RTL simulation setup from `target/snitch_cluster`: | ||
|
||
``` | ||
make CFG_OVERRIDE=cfg/default.hjson bin/snitch_cluster.vsim | ||
``` | ||
|
||
Then, the built evaluation programs can be run on this simulation setup as usual, for example: | ||
|
||
``` | ||
bin/snitch_cluster.vsim ../../sw/saris/bin/istc.pb_jacobi_2d_ml_issr.elf | ||
``` | ||
|
||
Performance metrics can be analyzed using the annotating Snitch tracer (`make traces`). In the default evaluation programs, the section of interest is section 2. |
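To run more than one evaluation program, the single invocation shown in the README can be repeated per ELF. The sketch below only prints one simulator command line per built program, so the list can be inspected before anything is launched. It assumes it is run from `target/snitch_cluster`, that the ELFs are in `../../sw/saris/bin`, and that the simulator was built as `bin/snitch_cluster.vsim`. The function `list_runs` and the variables `ELF_DIR` and `SIM` are hypothetical names, not part of the repository.

```shell
#!/bin/sh
# Print one simulator command line per built evaluation program.
# $1: directory holding the istc.*.elf files
# $2: simulator binary
# (`list_runs` is a hypothetical helper used only for this illustration.)
list_runs() {
    for elf in "$1"/istc.*.elf; do
        # Skip the unexpanded glob pattern when no ELF has been built yet
        [ -e "$elf" ] || continue
        printf '%s %s\n' "$2" "$elf"
    done
}

# ELF_DIR and SIM are assumed override variables; the defaults follow the
# paths used in the README when run from target/snitch_cluster.
list_runs "${ELF_DIR:-../../sw/saris/bin}" "${SIM:-bin/snitch_cluster.vsim}"
```

Piping the output to `sh` would execute the runs one after another; whether consecutive runs overwrite each other's trace output depends on the simulation setup and is not covered here.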