Lace 2.0 #10
base: master
Conversation
Interesting. I actually thought about this the first time I saw Lace, but I quickly realized it is by far not as bad with macros as it looked at first sight. One way or another, I would certainly welcome running the benchmarks.

So instead I will play the devil's advocate and propose to stay with macros 😉. The sole reasoning of slightly decreasing the use of macros feels kind of niche to me if there are no real issues with them in practice. I for myself would be more interested e.g. in power saving by spinning threads up and down at runtime, as discussed in #9 😉.

To support the macros side of the argument, feel free to take a look at https://github.com/Hirrolot/interface99 and https://github.com/Hirrolot/datatype99 for modern macro libraries for C.
Regarding #9, if no tasks are running, then you can already suspend the workers. The hard part is if we would want to remove workers on the fly while they are doing work, which means that work would need to be executed by another worker, and cleanly disposing of the current work will be a challenge; I don't know if there is an efficient way to do this. Also, I'm not sure if Lace is the right framework for this. Lace is based on busy-waiting workers, which is not suitable for mobile platforms; the intended application is large computations performed on multi-core machines.

Regarding the linked repositories, I like those READMEs and maybe I should improve the README.md of my own projects a bit.

Regarding the Lace 2.0 API, so far I find it cleaner.
Yes, the macros feel painful at first, but Lace is a pretty low-level library anyway and it will mostly be used by people who want to optimize for performance. I don't know whether changing the API helps that much: maybe just adding this to the documentation would help people understand what the "C-style" semantics of this macro are. The points made in #9 align quite well with the discussion we are having in the linked Storm thread.
@trolando Happy birthday, btw! :-)
The following should be taken with quite some reservations, considering I've only been reading Sylvan's source code, not actually writing anything using Lace. I would not say that the new API is drastically better or drastically better to read: slightly, but not a lot.

As Sebastian Junges also said: maybe one cannot make Lace much better at the level of abstraction it is at. A lot can probably be done merely by providing more documentation, for example expanding the fibonacci example into a larger Markdown example, with some drawings, introducing one concept at a time.

In many cases, the pattern is always "spawn n-1 recursive calls and run the n'th on the same thread". I wonder whether it is possible to provide a higher-level abstraction of this pattern, i.e. something of the form:

```
TASK_1(int, fib, int, n)
{
    int a, b = fib(n-1, n-2);
    return a+b;
}
```

Finally, you may have a better idea of whether this is a good idea after looking at how this affects Sylvan, i.e. both its code clarity and its performance?
Improving the documentation is a good idea. I've updated the README.md on the master branch. The original reason for working on a new API is that the macros can be confusing for novice developers working on Sylvan or LTSmin for their student projects.
The `par` construct that was just added to the Flix language might also be relevant for an intuitive design: https://github.com/flix/flix/blob/master/main/src/library/RedBlackTree.flix#L423
Lace 2.0 is a redesign of the API, work in progress. The problem to be solved is that in its current (pre-2.0) state, Lace is quite intrusive for a developer as it uses macros to replace normal function definitions. While some amount of intrusion is inevitable, one might say it is excessive in the current form.
Example:
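Roughly, a pre-2.0 task such as fibonacci is written entirely through macros; the sketch below follows the existing TASK_1/SPAWN/CALL/SYNC convention, though minor details may differ from the examples shipped with Lace.

```c
/* Pre-2.0 style: the whole task definition goes through macros. */
TASK_1(int, fib, int, n)        /* task "fib": returns int, takes one int       */
{
    if (n < 2) return n;
    SPAWN(fib, n - 1);          /* push a stealable subtask on the deque        */
    int b = CALL(fib, n - 2);   /* run the second subtask on this worker        */
    int a = SYNC(fib);          /* wait for (or steal back) the spawned subtask */
    return a + b;
}
```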
In a package such as Sylvan, this means that large parts of the source code consist of these kinds of macros. This is clearly difficult to read at first.
The reason for these macros is that they allow automatic injection of pointers to the Lace worker struct and the task queue head as parameters of Lace tasks. This shaves off a tiny bit of work, because in particular the queue head does not need to be stored in memory, except possibly on the program stack. This efficiency gain is most visible in very fine-grained parallelism with tiny tasks, such as fibonacci and n-queens.
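To make the hidden-parameter injection concrete, the following sketch shows the general shape of what the macros in the example above compile down to; the identifiers are placeholders, not the actual names generated by lace.h.

```c
/* Placeholder types and helpers standing in for what the TASK macros generate;
 * the real names and signatures in lace.h differ. */
typedef struct WorkerP WorkerP;                       /* per-worker state       */
typedef struct Task { unsigned char pad[64]; } Task;  /* one deque slot (dummy) */

void fib_spawn(WorkerP *w, Task *head, int n);
int  fib_call (WorkerP *w, Task *head, int n);
int  fib_sync (WorkerP *w, Task *head);

/* A TASK_1(int, fib, int, n) { ... } body is compiled into a function that
 * receives the worker and the deque head as hidden extra parameters, so the
 * head lives in a register or on the stack instead of being re-read from the
 * worker struct on every SPAWN/CALL/SYNC. */
int fib_work(WorkerP *worker, Task *dq_head, int n)
{
    if (n < 2) return n;
    fib_spawn(worker, dq_head++, n - 1);        /* SPAWN(fib, n-1): push, advance head */
    int b = fib_call(worker, dq_head, n - 2);   /* CALL(fib, n-2): run directly        */
    int a = fib_sync(worker, --dq_head);        /* SYNC(fib): retract head, sync       */
    return a + b;
}
```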
An easier API would be this:
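A sketch of what this could look like for fib follows; the concrete 2.0 syntax is still work in progress, so the spelling below is illustrative only, and the helper declarations are assumptions based on the description further down.

```c
/* Assumed generated helpers (see the TASK macro discussed further below). */
void fib_SPAWN(int n);
int  fib_SYNC(void);

/* Illustrative 2.0-style sketch, not final syntax: the task body is an
 * ordinary C function. */
int fib(int n)
{
    if (n < 2) return n;
    fib_SPAWN(n - 1);       /* looks up the thread-local worker struct       */
    int b = fib(n - 2);     /* plain recursive call; no CALL macro needed    */
    int a = fib_SYNC();     /* goes through the thread-local worker again    */
    return a + b;
}
```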
In this case, the Lace worker struct is a thread-local variable, which needs to be obtained each time `SPAWN` and `SYNC` are used, and the task queue head is stored inside the worker struct. So this requires a write operation to the worker struct each time `SPAWN` and `SYNC` are used, which was avoided before.

We still need some way of defining the helper functions for work stealing. This is now a `TASK` macro that simply defines a number of relatively small `static inline` functions, such as `fib_SPAWN`, `fib_SYNC`, and something like `fib_RUN` that ensures that the `fib` function is executed by a Lace worker. One can also just call `fib` directly, which required a `CALL` macro before.
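For concreteness, the helpers generated by such a `TASK` macro could have roughly the following shape; the names come from the description above, but the signatures are illustrative and not final.

```c
/* Illustrative declarations of the helpers a new-style TASK macro could
 * generate for fib. */
static inline void fib_SPAWN(int n); /* push a stealable fib(n) task via the
                                        thread-local worker                   */
static inline int  fib_SYNC(void);   /* wait for, or steal back, the most
                                        recently spawned fib task             */
static inline int  fib_RUN(int n);   /* ensure fib(n) is executed by a Lace
                                        worker; usable from non-worker code   */
```

Code outside the Lace workers would then call something like `fib_RUN(40)`, while code already running as a task can simply call `fib(40)` directly.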
What does this mean for performance?

I ran a number of benchmarks on my desktop computer running Linux (Manjaro). See the following.

From these benchmarks, it is obvious that the very fine-grained benchmarks fib and nqueens suffer a performance degradation of about 30%; these benchmarks consist of billions of nearly empty tasks. The same holds to a lesser degree for `integrate`, which is a program that does some arithmetic work without accessing memory. The other benchmarks are not significantly affected by this change; in particular, the challenging uts program (unbalanced tree search) shows no significant difference in performance or speedup.

Notice also that in the case of fib, the performance difference is equal to the difference of using another compiler. The reason the fibonacci benchmark is so fast with current Lace is a performance optimization that gcc does better than clang. A caveat here is that I compiled with `-O2` and not `-O3`; maybe I should rerun all benchmarks with `-O3`.

The question is now whether or not to go ahead with this new design, at the cost of performance for certain niche benchmark programs.