
Foundations for BPF device integration: DSL/IR/compiler part #107

Open
daphne-eu opened this issue Aug 24, 2021 · 2 comments
@daphne-eu (Owner)

In GitLab by @pdamme on Aug 24, 2021, 14:41

We want to support IO kernels for BPF devices (computational storage). To support asynchronous IO in this context, we need to carry information about an open file between kernel calls, so explicit open/close operations are required. For now, this is a kind of sideways access; later, the compiler should support it automatically. In particular, the workflow looks as follows:

For POSIX:

  • open(filename) -> File
  • readCsv(File) -> Matrix
  • close(File) -> void

For BPF:

  • open(dev) -> Target
  • open(Target, filename) -> Descriptor
  • readCsv(Descriptor) -> Matrix
  • close(Target) -> void

The device in the open call is essentially a string.
There will be different kinds of descriptors, depending on the underlying hardware and techniques. In that sense, File could be considered a variant of a Descriptor, or at least the two could share a common base class, so that an IO kernel (e.g., readCsv) always takes the same kind of input; a sketch of this idea follows below.
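
A minimal C++ sketch of the common-base-class idea, purely for illustration: the type names (Descriptor, File, Target, BpfDescriptor, Matrix) are hypothetical placeholders and not the actual DAPHNE kernel interface.

```cpp
#include <memory>
#include <string>

// Common base for anything a read kernel can consume.
struct Descriptor {
    virtual ~Descriptor() = default;
};

// POSIX: open(filename) -> File
struct File : Descriptor {
    int fd = -1;            // underlying POSIX file descriptor
    std::string filename;
};

// BPF / computational storage:
//   open(dev) -> Target, open(Target, filename) -> BpfDescriptor
struct Target {
    std::string dev;        // the device is essentially a string
};
struct BpfDescriptor : Descriptor {
    Target target;
    std::string filename;
};

// A single readCsv kernel signature covers both cases.
struct Matrix;              // stands in for DAPHNE's matrix types
std::unique_ptr<Matrix> readCsv(const Descriptor &desc);
```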

Thanks @niclashedam for bringing up this topic and analyzing the requirements.


This issue is about the DaphneDSL/DaphneIR/compiler integration. We need additional built-in functions as well as IR operations and types, and these must be supported by the Daphne compiler.
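
On the IR side, a handle such as File could be represented as an opaque DaphneIR type. The following is only a sketch assuming MLIR's C++ type API; the class name FileType and the namespace are hypothetical, not existing DAPHNE code.

```cpp
#include "mlir/IR/Types.h"

namespace mlir::daphne {

// Hypothetical opaque handle type, produced by an open op and consumed by
// read/close ops; it carries no compile-time parameters.
class FileType : public Type::TypeBase<FileType, Type, TypeStorage> {
public:
    using Base::Base;
};

} // namespace mlir::daphne

// The dialect would register it in its initialize():
//   addTypes<mlir::daphne::FileType>();
```

Built-ins such as open, readCsv, and close would then map to IR operations whose operands/results use this type.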

This issue is closely related to #108 on the run-time part.

@daphne-eu (Owner, Author)

In GitLab by @pdamme on Aug 24, 2021, 15:45

mentioned in commit 2337047

@daphne-eu (Owner, Author)

In GitLab by @bonnet-p on Aug 27, 2021, 10:33

Dear Patrick, all,
A few comments on this issue:
(1) There is a silent assumption of an underlying file system in the first paragraph and throughout the text. This is neither general nor necessary. We should be careful not to conflate I/Os and files; these are two different levels of abstraction. The first question is what kind of abstraction is needed by the run-time. Do we need a universal storage abstraction between tables/matrices (dense and sparse) and NVMe? My initial take would be no.
(2) I think we can assume that NVMe will provide command sets for offloading BPF programs (CS command set) and reading/writing data (IO command set) on namespaces of different kinds, e.g., LBA, ZNS or KV (all equipped with administration command sets for identification and async operations). The question for us is whether this should be exposed to the runtime or whether some complexity should be abstracted. What we do not want to do is to introduce concepts or mix abstraction levels. In particular, files and BPF programs have nothing to do with each other.
(3) An additional issue is how to deal with storage tiering. Should various storage tiers be exposed to the run-time or should data placement/movement be managed separately from the run-time? This is a fundamental design decision.
(4) I think we can find (a) generic data structures to (i) name devices/namespaces and (ii) carry namespace context across asynchronous calls, and (b) a generic open/async-call/close interface that works across file systems and raw devices, with or without a computational storage processor (see the sketch after this comment). But do we need separate kernels for the three operations in (b)? This is something I do not understand.
Best,
Philippe.
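
A purely illustrative C++ sketch of the generic interface described in point (4): naming a device/namespace, carrying its context across asynchronous calls, and a uniform open/async-call/close surface. All names are invented for illustration and are not part of DAPHNE.

```cpp
#include <sys/types.h>   // ssize_t
#include <cstddef>
#include <functional>
#include <memory>
#include <string>

// Opaque context for an open device/namespace (POSIX file, NVMe LBA/ZNS/KV
// namespace, computational-storage target, ...); concrete backends subclass it.
struct IoContext {
    virtual ~IoContext() = default;
};

// Completion callback for asynchronous calls (bytes transferred or a
// negative error code).
using IoCallback = std::function<void(ssize_t bytesOrError)>;

// Uniform open/async-call/close interface; concrete backends hide whether a
// file system, a raw device, or a computational storage processor sits below.
struct IoBackend {
    virtual ~IoBackend() = default;

    // `name` identifies a device or namespace (e.g., a path or an NVMe
    // namespace identifier).
    virtual std::unique_ptr<IoContext> open(const std::string &name) = 0;

    // Asynchronous read; `cb` fires on completion.
    virtual void readAsync(IoContext &ctx, void *buf, std::size_t len,
                           std::size_t offset, IoCallback cb) = 0;

    virtual void close(std::unique_ptr<IoContext> ctx) = 0;
};
```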
