Improve multithreaded performance #173

Merged: 7 commits merged into main from threads-perf on Jan 12, 2024
Conversation

sosthene-nitrokey
Contributor

Fix #157


codecov bot commented Jan 10, 2024

Codecov Report

Attention: 7 lines in your changes are missing coverage. Please review.

Comparison is base (457fc9a) 90.73% compared to head (d2e15f5) 90.81%.

Files                            Patch %    Lines
pkcs11/src/backend/key.rs        85.00%     3 Missing ⚠️
pkcs11/src/backend/session.rs    70.00%     3 Missing ⚠️
pkcs11/src/api/mod.rs            66.66%     1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #173      +/-   ##
==========================================
+ Coverage   90.73%   90.81%   +0.07%     
==========================================
  Files          31       31              
  Lines        6436     6410      -26     
==========================================
- Hits         5840     5821      -19     
+ Misses        596      589       -7     


@sosthene-nitrokey
Contributor Author

The connections are only recreated if the calls have enough latency for the connections to go back to the idle pool.

Ideally this should not be the case. There should probably be as many concurrent connections as there are open sessions. I will check whether this is possible.

@sosthene-nitrokey
Contributor Author

It looks like actually observing this behaviour requires some delay in the operations, so it does not really show up against a localhost machine. I worked around that in testing by putting a proxy that introduces an arbitrary delay in front of the testing docker image.

Bumping the number of idle connections does indeed seem to solve the issue of many unnecessary connections being re-opened.
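
For illustration, a minimal sketch of the kind of change discussed here, assuming ureq 2.x's AgentBuilder; the pool_size value and function name are illustrative, not the exact code in this PR:

```rust
// Build a shared agent whose idle pool is large enough that connections
// returned by concurrent threads stay alive instead of being closed and
// reopened on the next call.
fn build_agent(pool_size: usize) -> ureq::Agent {
    ureq::AgentBuilder::new()
        .max_idle_connections(pool_size)
        .max_idle_connections_per_host(pool_size)
        .build()
}
```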

@sosthene-nitrokey
Contributor Author

We could make the maximum number of connections configurable. Since throughput is mostly sensitive to network latency, an application can get better performance with a higher number of open connections. I believe that could scale even when there are more open connections than CPUs. A high default could make sense. What is an acceptable number of open connections for the NetHSM?

@daringer

In #157 I suggested 100 and nobody complained, so I would go with 100 as a first shot.

@ansiwen
Collaborator

ansiwen commented Jan 11, 2024

According to the PKCS#11 spec, a single session is not allowed to be used concurrently by multiple threads, IIRC. So basically each session could have one connection, and the pool would grow dynamically. Not sure how feasible that is. In the case of nginx, I believe every fork would only open one session and not use threads, and would therefore only need one connection (because every fork has its own), right?

@sosthene-nitrokey
Contributor Author

We could create the agent in the session instead of the instance. That would however degrade performance if the application creates and closes sessions regularly. I don't know if it ever makes sense to have more than 100 connections to the NetHSM. Sadly, there is no way in ureq to change the pooling configuration dynamically once the agent is built.
Regarding nginx, are you sure it uses forks and not threads for most operations? If it uses forks, it shouldn't be affected by the original issue anyway.

@ansiwen
Collaborator

ansiwen commented Jan 11, 2024

Yes, pretty sure, that's why we had the issues with tokio. Nginx creates worker-forks, not threads. @nponsard can probably confirm that.

@ansiwen
Collaborator

ansiwen commented Jan 11, 2024

Reloads the configuration and forks a new set of worker processes.

https://www.nginx.com/blog/inside-nginx-how-we-designed-for-performance-scale/#update-upgrade

@ansiwen
Collaborator

ansiwen commented Jan 12, 2024

@sosthene-nitrokey ok, can you briefly summarize the changes in this PR? I lost track.

@ansiwen
Collaborator

ansiwen commented Jan 12, 2024

My current understanding is that for optimal latency we would need one connection per thread. But in no case should it be necessary to close connections, because several threads can also use one connection sequentially, so that is the first problem that should be solved. Because HTTP/1 doesn't support concurrent requests on a single connection, sharing one connection would not be optimal latency-wise, so we would need as many parallel connections as there are threads. Since ureq doesn't support growing the pool dynamically, we need to expose the pool size as a configuration setting.

Does that make sense?

@sosthene-nitrokey
Contributor Author

The changes are maybe not as atomic as they should be, so this PR contains multiple improvements:

  • Some simplifications of the statics
  • Increase the size of the HTTP connection pool. This is the most significant change.
  • Improve the multithreading of the reloading of the DB cache in fetch_all_keys (see the sketch below).
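
A rough sketch of the third point, using hypothetical helper names (fetch_key, KeyInfo); the real code talks to the NetHSM REST API, this only illustrates the scoped-thread pattern:

```rust
use std::sync::Mutex;
use std::thread;

#[derive(Debug, Clone)]
struct KeyInfo {
    mechanisms: Vec<String>,
}

// Stand-in for one key-metadata request made through the shared HTTP agent.
fn fetch_key(id: &str) -> Result<KeyInfo, String> {
    Ok(KeyInfo { mechanisms: vec![format!("mechanisms-of-{id}")] })
}

// Reload the key cache by querying every key in parallel, so each request can
// use its own pooled connection instead of serializing on a single one.
// A real implementation would cap the number of threads it spawns.
fn fetch_all_keys(key_ids: &[String]) -> Vec<(String, KeyInfo)> {
    let cache = Mutex::new(Vec::with_capacity(key_ids.len()));
    thread::scope(|s| {
        for id in key_ids {
            let cache = &cache;
            s.spawn(move || {
                if let Ok(info) = fetch_key(id) {
                    cache.lock().unwrap().push((id.clone(), info));
                }
            });
        }
    });
    cache.into_inner().unwrap()
}
```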

@sosthene-nitrokey
Contributor Author

My current understanding is that for optimal latency we would need one connection per thread. But in no case should it be necessary to close connections, because several threads can also use one connection sequentially, so that is the first problem that should be solved. Because HTTP/1 doesn't support concurrent requests on a single connection, sharing one connection would not be optimal latency-wise, so we would need as many parallel connections as there are threads. Since ureq doesn't support growing the pool dynamically, we need to expose the pool size as a configuration setting.

Does that make sense?

Yes, we could make the pool size a configuration setting. We could also make the maximum number of idle connections infinite.
Since everything is synchronous, this would lead to having as many connections as the maximum number of threads that ever do a request at the same time. There could be downsides to that: as #156 shows, if there is a large number of idle connections that are never closed, they can very well end up being "stale", and if a majority of the connections in the pool are stale, performance can degrade because each stale connection may be tried and take time to close.

@sosthene-nitrokey
Contributor Author

I made it configurable, with a default of the number of CPUs.
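
A sketch of that default, assuming a small helper along these lines (not the exact code in the PR):

```rust
use std::thread::available_parallelism;

// Use the configured pool size when present, otherwise fall back to the
// number of CPUs visible to the process.
fn pool_size(configured: Option<usize>) -> usize {
    configured.unwrap_or_else(|| available_parallelism().map(|n| n.get()).unwrap_or(1))
}
```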

@daringer

okok, I will merge this - as it will likely solve the initial issue. Please feel free to continue the discussion in an issue.

@daringer daringer merged commit 50c8223 into main Jan 12, 2024
4 of 5 checks passed
@daringer daringer deleted the threads-perf branch January 12, 2024 13:51
@ansiwen
Collaborator

ansiwen commented Jan 13, 2024

@sosthene-nitrokey So, do you have an explanation of what mechanism was closing the connections in the first place, as reported in the original issue?

@sosthene-nitrokey
Contributor Author

The ureq pooling is. When all connections in the pool are in use by multiple threads, a new connection is opened and then closed afterwards, so that the max pooled connections parameter is respected.
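
A hedged sketch of that behaviour, assuming ureq 2.x; the endpoint and pool size are illustrative:

```rust
use std::thread;

fn main() {
    // With an idle pool of one, two concurrent requests force a second TCP
    // connection to be opened; when both finish, only one connection can go
    // back to the pool and the other is closed.
    let agent = ureq::AgentBuilder::new()
        .max_idle_connections(1)
        .max_idle_connections_per_host(1)
        .build();

    thread::scope(|s| {
        for _ in 0..2 {
            let agent = agent.clone();
            s.spawn(move || {
                let _ = agent.get("https://nethsm.local/api/v1/health/ready").call();
            });
        }
    });
}
```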


Successfully merging this pull request may close these issues.

optimizations needed in multithreaded environment (far above 1 open/close per second otherwise)