Improve multithreaded performance #173

Merged: 7 commits merged into main from threads-perf on Jan 12, 2024
Conversation

sosthene-nitrokey
Contributor

Fix #157


codecov bot commented Jan 10, 2024

Codecov Report

Attention: 7 lines in your changes are missing coverage. Please review.

Comparison is base (457fc9a) 90.73% compared to head (d2e15f5) 90.81%.

Files                            Patch %    Lines
pkcs11/src/backend/key.rs        85.00%     3 Missing ⚠️
pkcs11/src/backend/session.rs    70.00%     3 Missing ⚠️
pkcs11/src/api/mod.rs            66.66%     1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #173      +/-   ##
==========================================
+ Coverage   90.73%   90.81%   +0.07%     
==========================================
  Files          31       31              
  Lines        6436     6410      -26     
==========================================
- Hits         5840     5821      -19     
+ Misses        596      589       -7     


@sosthene-nitrokey
Contributor Author

The connections are only recreated if the calls have enough latency for the connections to go back to the idle pool.

Ideally this should not be the case. There should probably be as many concurrent connections as there are open sessions. I will check whether this is possible.

@sosthene-nitrokey
Contributor Author

It looks like actually observing this behaviour requires some delay in the operations, so it does not really show up against a localhost machine. I worked around that in testing by putting a proxy that introduces an arbitrary delay in front of the testing docker image.

Bumping the number of idle connections does indeed seem to solve the issue of many unnecessary connections being re-opened.
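
For illustration, a minimal sketch of the kind of change discussed here, assuming ureq 2.x's AgentBuilder; the pool_size value and function name are illustrative, not the exact code in this PR:

```rust
// Build a shared agent whose idle pool is large enough that connections
// returned by concurrent threads stay alive instead of being closed and
// reopened on the next call.
fn build_agent(pool_size: usize) -> ureq::Agent {
    ureq::AgentBuilder::new()
        .max_idle_connections(pool_size)
        .max_idle_connections_per_host(pool_size)
        .build()
}
```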

@sosthene-nitrokey
Contributor Author

We could make the maximum number of connections configurable. Since throughput is mostly sensitive to network latency, an application can get better performance with a higher number of open connections. I believe that could scale even when there are more open connections than CPUs. A high default could make sense. What is an acceptable number of open connections for the NetHSM?

@daringer

In #157 I suggested 100 and nobody complained, so I would go with 100 as a first shot.

@ansiwen
Collaborator

ansiwen commented Jan 11, 2024

According to the PKCS#11 spec, a single session is not allowed to be used concurrently by multiple threads, IIRC. So basically each session could have one connection, and the pool would grow dynamically. Not sure how feasible that is. In the case of nginx, I believe every fork would only open one session and not use threads, and would therefore only need one connection (because every fork has its own), right?

@sosthene-nitrokey
Contributor Author

We could create the agent in the session instead of the instance. That would however degrade performance if the application creates and closes sessions regularly. I don't know if it ever makes sense to have more than 100 connections to the NetHSM. Sadly, there is no way in ureq to change the pooling configuration dynamically once the agent is built.
Regarding nginx, are you sure it uses forks and not threads for most operations? If it uses forks, it shouldn't be affected by the original issue anyway.

@ansiwen
Collaborator

ansiwen commented Jan 11, 2024

Yes, pretty sure, that's why we had the issues with tokio. Nginx creates worker-forks, not threads. @nponsard can probably confirm that.

@ansiwen
Collaborator

ansiwen commented Jan 11, 2024

Reloads the configuration and forks a new set of worker processes.

https://www.nginx.com/blog/inside-nginx-how-we-designed-for-performance-scale/#update-upgrade

@ansiwen
Collaborator

ansiwen commented Jan 12, 2024

@sosthene-nitrokey ok, can you briefly summarize the changes in this PR? I lost track.

@ansiwen
Collaborator

ansiwen commented Jan 12, 2024

My current understanding is that for optimal latency we would need one connection per thread. But in no case should it be necessary to close connections, because several threads can also use one connection sequentially, so that is the first problem that should be solved. Because HTTP/1 doesn't support concurrent requests on a single connection, sharing one connection would not be optimal latency-wise, so we would need as many parallel connections as there are threads. Since ureq doesn't support growing the pool dynamically, we need to expose the pool size as a configuration setting.

Does that make sense?

@sosthene-nitrokey
Contributor Author

The changes are maybe not as atomic as they should be, so this PR contains multiple improvements:

  • Some simplifications of the statics
  • Increase the size of the HTTP connection pool. This is the most significant change.
  • Improve the multithreading of the reloading of the DB cache in fetch_all_keys (see the sketch below).
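
A rough sketch of the third point, using hypothetical helper names (fetch_key, KeyInfo); the real code talks to the NetHSM REST API, this only illustrates the scoped-thread pattern:

```rust
use std::sync::Mutex;
use std::thread;

#[derive(Debug, Clone)]
struct KeyInfo {
    mechanisms: Vec<String>,
}

// Stand-in for one key-metadata request made through the shared HTTP agent.
fn fetch_key(id: &str) -> Result<KeyInfo, String> {
    Ok(KeyInfo { mechanisms: vec![format!("mechanisms-of-{id}")] })
}

// Reload the key cache by querying every key in parallel, so each request can
// use its own pooled connection instead of serializing on a single one.
// A real implementation would cap the number of threads it spawns.
fn fetch_all_keys(key_ids: &[String]) -> Vec<(String, KeyInfo)> {
    let cache = Mutex::new(Vec::with_capacity(key_ids.len()));
    thread::scope(|s| {
        for id in key_ids {
            let cache = &cache;
            s.spawn(move || {
                if let Ok(info) = fetch_key(id) {
                    cache.lock().unwrap().push((id.clone(), info));
                }
            });
        }
    });
    cache.into_inner().unwrap()
}
```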

@sosthene-nitrokey
Contributor Author

My current understanding is that for optimal latency we would need one connection per thread. But in no case should it be necessary to close connections, because several threads can also use one connection sequentially, so that is the first problem that should be solved. Because HTTP/1 doesn't support concurrent requests on a single connection, sharing one connection would not be optimal latency-wise, so we would need as many parallel connections as there are threads. Since ureq doesn't support growing the pool dynamically, we need to expose the pool size as a configuration setting.

Does that make sense?

Yes, we could make the pool size a configuration setting. We could also make the maximum number of idle connections infinite.
Since everything is synchronous, this would lead to having as many connections as the maximum number of threads that ever do a request at the same time. There could be downsides to that: as #156 shows, if there is a large number of idle connections that are never closed, they can very well end up being "stale", and if a majority of the connections in the pool are stale, performance can degrade because each stale connection may be tried and take time to close.

@sosthene-nitrokey
Contributor Author

I made it configurable, with a default of the number of CPUs.
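
A sketch of that default, assuming a small helper along these lines (not the exact code in the PR):

```rust
use std::thread::available_parallelism;

// Use the configured pool size when present, otherwise fall back to the
// number of CPUs visible to the process.
fn pool_size(configured: Option<usize>) -> usize {
    configured.unwrap_or_else(|| available_parallelism().map(|n| n.get()).unwrap_or(1))
}
```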

@daringer

okok, I will merge this - as it will likely solve the initial issue. Please feel free to continue the discussion in an issue.

@daringer daringer merged commit 50c8223 into main Jan 12, 2024
4 of 5 checks passed
@daringer daringer deleted the threads-perf branch January 12, 2024 13:51
@ansiwen
Collaborator

ansiwen commented Jan 13, 2024

@sosthene-nitrokey So, do you have an explanation of what mechanism was closing the connections in the first place, as reported in the original issue?

@sosthene-nitrokey
Contributor Author

The ureq pooling is. When all connections in the pool are in use by multiple threads, a new connection is opened and then closed afterwards, so that the max pooled connections parameter is respected.
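
A hedged sketch of that behaviour, assuming ureq 2.x; the endpoint and pool size are illustrative:

```rust
use std::thread;

fn main() {
    // With an idle pool of one, two concurrent requests force a second TCP
    // connection to be opened; when both finish, only one connection can go
    // back to the pool and the other is closed.
    let agent = ureq::AgentBuilder::new()
        .max_idle_connections(1)
        .max_idle_connections_per_host(1)
        .build();

    thread::scope(|s| {
        for _ in 0..2 {
            let agent = agent.clone();
            s.spawn(move || {
                let _ = agent.get("https://nethsm.local/api/v1/health/ready").call();
            });
        }
    });
}
```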


Successfully merging this pull request may close these issues.

optimizations needed in multithreaded environment (far above 1 open/close per second otherwise)