-
Notifications
You must be signed in to change notification settings - Fork 100
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
image-rs: use index for layers store path #902
Conversation
e4a03f0
to
383fccf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
awesome bug report, thanks. Can you describe in more detail what you are doing in your fix in the PR description?
A possible issue with the approach taken is that pull client cannot run in parallel since the layer index is managed in memory. Although, this already seems to be an issue since the meta store is only saved at the end of pull_image. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fix. Btw I did not find any place that shows mount
does not support data longer than 4096 bytes although the behavior shows this.
Another thing I worry about is synchronization issues. Now ImageClient
can respond to multiple pull_image
requests asynchronously. Each pull_image
request will create a PullClient
. Assuming multiple pull_image requests are initiated at the same time, it will be possible to obtain multiple identical or close layers_index
, and overwriting might occur.
The previous method of using hash values to name layers can avoid this risk assuming that the hashes do not collide. This is possible if an auto-incrementing number is used.
If number is used, do we need to consider using PullClient
as part of ImageClient
and perform operations such as async_pull_layers
and pull_manifest
exclusively and atomically?
5b30e95
to
2f85db2
Compare
Would moving the counter to |
0259c8c
to
06eac92
Compare
I'm thinking would it be possible to use the footprint instead of the full hash value as the layer store name? For example, changing from |
I have considered it before, but considering that the probability of collision will increase exponentially after truncating the hash value, it may cause unexpected bugs in extreme cases. If we use the self-increasing index as proposed, layer collisions can be absolutely avoid. |
is there a way to write this code in a way that we can unit-test the construction of the layer storage paths? this would be good also for specifying what we are doing. |
1e9be67
to
2e2de58
Compare
f693522
to
97eda8f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. @mkulke PTAL
cded759
to
4b1a11c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks @squarti . While I'm concerned about the ci failure with ubuntu-24.04-arm
.
CI Green. @mkulke are we good to merge this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
at the moment we're just logging the errors instead of bailing out. since this is a sensitive area of the code, I would prefer if we would propagate the errors and fail on anything that we encounter and do not expect, i.e. get_layers_index()
should return a Result<>
since it's a fallible operation.
This PR creates a layer store abstraction which uses an incrementing index as a unique store path for image layers instead of layer digests. This reduces the layer path length significantly allowing more layers to be mounted given the 4096 page size limit of mount command. Fixes: confidential-containers#901 Signed-off-by: Silenio Quarti <silenio_quarti@ca.ibm.com>
4b1a11c
to
16ae685
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for accommodating, a few nits.
it's not great that we swallow the errors at the upper layers now, but apparently that is also what we do with the meta store. It's still an improvement because we don't get something undefined on malformed images, but a predictable and unusuable 0-ed out layerstore. if we unwrap we still need to log the errors though (see suggestion), otherwise it won't be fun to debug.
Co-authored-by: Magnus Kulke <mkulke@gmail.com> Signed-off-by: squarti <silenio_quarti@ca.ibm.com>
Co-authored-by: Magnus Kulke <mkulke@gmail.com> Signed-off-by: squarti <silenio_quarti@ca.ibm.com>
Signed-off-by: Silenio Quarti <silenio_quarti@ca.ibm.com>
20808d2
to
091d191
Compare
This PR associates an index for a given image layer and uses the index to build the layer store path. This reduces the path length significantly. The next available layer index is computed by reading the existent directories in the data directory. This is done during pull client initialization.
Fixes: #901