Implement mirror_file, mirror_part & finalize_file (#6862) #7043
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## develop #7043 +/- ##
===========================================
- Coverage 85.47% 85.27% -0.20%
===========================================
Files 150 151 +1
Lines 21678 21846 +168
===========================================
+ Hits 18529 18629 +100
- Misses 3149 3217 +68
Notes on test coverage: since most files are > 10 MB, it's common for no files at all to be mirrored during the IT, so we effectively have no IT coverage for the |
Note on scale/cost/efficiency: on my personal deployment, running |
Subject: [PATCH] REVIEW 1
---
Index: src/azul/indexer/mirror_service.py
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/src/azul/indexer/mirror_service.py b/src/azul/indexer/mirror_service.py
--- a/src/azul/indexer/mirror_service.py (revision 0e764360a78161c7bb8d54e7e8cde365f2052d34)
+++ b/src/azul/indexer/mirror_service.py (date 1746498183684)
@@ -52,8 +52,8 @@
"""
A part of a mirrored file
"""
- #: The part number, starting at 0 for the first part. Note that the S3 API
- #: numbers parts starting at 1.
+ #: The part number, starting at 0 for the first part, unlike S3 API part
+ #: numbers, which start at 1.
#:
index: int
@@ -64,7 +64,7 @@
#:
size: int
- #: Various quotas related to parts and part sizes
+ #: Various S3 quotas related to parts and part sizes
#: https://docs.aws.amazon.com/AmazonS3/latest/userguide/qfacts.html
#:
min_size: ClassVar[int] = 5 * 1024 ** 2
@@ -102,14 +102,13 @@
last part.
"""
assert file.size is not None, R('File size unknown', file)
- stop = self.offset + self.size
- if stop == file.size:
+ next_offset = self.offset + self.size
+ if next_offset == file.size:
return None
- elif 0 < stop < file.size:
- return attr.evolve(self,
- index=self.index + 1,
- offset=stop,
- size=min(self.size, file.size - stop))
+ elif 0 < next_offset < file.size:
+ next_index = self.index + 1
+ next_size = min(self.size, file.size - next_offset)
+ return attr.evolve(self, index=next_index, offset=next_offset, size=next_size)
else:
assert False, R('Part range exceeds file size', self, file)
Index: src/azul/azulclient.py
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/src/azul/azulclient.py b/src/azul/azulclient.py
--- a/src/azul/azulclient.py (revision 0e764360a78161c7bb8d54e7e8cde365f2052d34)
+++ b/src/azul/azulclient.py (date 1746509333358)
@@ -759,7 +759,7 @@
etags: Sequence[str]
):
file = self.load_file(catalog, file_json)
- assert etags
+ assert len(etags) > 0
self.mirror_service.finish_mirroring_file(file, upload_id, etags)
log.info('Successfully mirrored file %r via multi-part upload', file.uuid)
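The part arithmetic in the mirror_service.py hunk above can be exercised in isolation. The following is only a minimal sketch under stated assumptions: it uses a stdlib dataclass instead of attrs, takes the file size as a plain integer instead of a file object, and raises ValueError where the patch asserts. The names Part, parts and MIN_PART_SIZE are hypothetical stand-ins, not the identifiers used in the PR.

```python
from dataclasses import dataclass, replace
from typing import Optional

# S3 requires every part except the last one to be at least 5 MiB
MIN_PART_SIZE = 5 * 1024 ** 2


@dataclass(frozen=True)
class Part:
    # Hypothetical stand-in for the part class touched by the patch
    index: int  # 0-based, unlike S3 part numbers, which start at 1
    offset: int
    size: int

    def next(self, file_size: int) -> Optional['Part']:
        """Return the part following this one, or None for the last part."""
        next_offset = self.offset + self.size
        if next_offset == file_size:
            return None
        elif 0 < next_offset < file_size:
            next_index = self.index + 1
            next_size = min(self.size, file_size - next_offset)
            return replace(self, index=next_index, offset=next_offset, size=next_size)
        else:
            raise ValueError('Part range exceeds file size', self, file_size)


def parts(file_size: int, part_size: int = MIN_PART_SIZE) -> list[Part]:
    """Enumerate all parts of a file of the given size."""
    part: Optional[Part] = Part(index=0, offset=0, size=min(part_size, file_size))
    result = []
    while part is not None:
        result.append(part)
        part = part.next(file_size)
    return result


if __name__ == '__main__':
    for p in parts(12 * 1024 ** 2):
        # The S3 part number is the 0-based index plus one
        print(p.index + 1, p.offset, p.size)
```

Under these assumptions, a 12 MiB file splits into two 5 MiB parts and a final 2 MiB part, and only the last part is smaller than the minimum part size.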
self.client.mirror_file_part(message['catalog'],
                             message['file'],
                             message['part'],
                             message['upload_id'],
                             message['etags'])
This is getting unwieldy. It is essentially the deserialization counterpart to mirror_part_message in AzulClient. So the two things that should be really close to each other, since they are so tightly coupled, are actually far apart. So that's brittle. Then again, we should stop writing this type of code and use the serialization framework in azul.attrs instead. The bodies of the SQS messages really want to be represented by a class.
The use of positional arguments is also brittle. There are even two neighboring arguments of the same type so there is real danger of accidentally swapping them. On top of that, I am really surprised that this even gets past the type checker. Each value in the message dict is of type AnyJSON but mypy silently narrows them to the more specific types of the arguments declared in the mirror_file_part signature. Something must be wrong with mypy or how mypy analyzes the code in azulclient.
This is all somewhat worrying and we need to improve this as quickly as possible. I will take a stab at it as soon as we have a functional version.
Security design review
Connected issues: #6862
Checklist
Author
develop
issues/<GitHub handle of author>/<issue#>-<slug>
1 when the issue title describes a problem, the corresponding PR title is Fix: followed by the issue title
Author (partiality)
p tag to titles of partial commits
partial or completely resolves all connected issues
partial label
Author (chains)
base or this PR is not chained to another PR
chained or is not chained to another PR
Author (reindex, API changes)
r tag to commit title or the changes introduced by this PR will not require reindexing of any deployment
reindex:dev or the changes introduced by it will not require reindexing of dev
reindex:anvildev or the changes introduced by it will not require reindexing of anvildev
reindex:anvilprod or the changes introduced by it will not require reindexing of anvilprod
reindex:prod or the changes introduced by it will not require reindexing of prod
reindex:partial and its description documents the specific reindexing procedure for dev, anvildev, anvilprod and prod or requires a full reindex or carries none of the labels reindex:dev, reindex:anvildev, reindex:anvilprod and reindex:prod
API or this PR does not modify a REST API
a (A) tag to commit title for backwards (in)compatible changes or this PR does not modify a REST API
app.py or this PR does not modify a REST API
Author (upgrading deployments)
make docker_images.json and committed the resulting changes or this PR does not modify azul_docker_images, or any other variables referenced in the definition of that variable
u tag to commit title or this PR does not require upgrading deployments
upgrade or does not require upgrading deployments
deploy:shared or does not modify docker_images.json, and does not require deploying the shared component for any other reason
deploy:gitlab or does not require deploying the gitlab component
deploy:runner or does not require deploying the runner image
Author (hotfixes)
F tag to main commit title or this PR does not include permanent fix for a temporary hotfix
anvilprod and prod) have temporary hotfixes for any of the issues connected to this PR
Author (before every review)
develop, squashed old fixups
make requirements_update or this PR does not modify requirements*.txt, common.mk, Makefile and Dockerfile
R tag to commit title or this PR does not modify requirements*.txt
reqs or does not modify requirements*.txt
make integration_test passes in personal deployment or this PR does not modify functionality that could affect the IT outcome
Peer reviewer (after approval)
System administrator (after approval)
demo or no demo
no demo
no sandbox
N reviews label is accurate
Operator (before pushing merge the commit)
reindex:… labels and r commit title tag
no demo
develop
_select dev.shared && CI_COMMIT_REF_NAME=develop make -C terraform/shared apply_keep_unused or this PR is not labeled deploy:shared
_select dev.gitlab && CI_COMMIT_REF_NAME=develop make -C terraform/gitlab apply or this PR is not labeled deploy:gitlab
_select anvildev.shared && CI_COMMIT_REF_NAME=develop make -C terraform/shared apply_keep_unused or this PR is not labeled deploy:shared
_select anvildev.gitlab && CI_COMMIT_REF_NAME=develop make -C terraform/gitlab apply or this PR is not labeled deploy:gitlab
deploy:gitlab
deploy:gitlab
System administrator
dev.gitlab are complete or this PR is not labeled deploy:gitlab
anvildev.gitlab are complete or this PR is not labeled deploy:gitlab
Operator (before pushing merge the commit)
_select dev.gitlab && make -C terraform/gitlab/runner or this PR is not labeled deploy:runner
_select anvildev.gitlab && make -C terraform/gitlab/runner or this PR is not labeled deploy:runner
sandbox label or PR is labeled no sandbox
dev or PR is labeled no sandbox
anvildev or PR is labeled no sandbox
sandbox deployment or PR is labeled no sandbox
anvilbox deployment or PR is labeled no sandbox
sandbox deployment or PR is labeled no sandbox
anvilbox deployment or PR is labeled no sandbox
sandbox or this PR does not remove catalogs or otherwise causes unreferenced indices in dev
anvilbox or this PR does not remove catalogs or otherwise causes unreferenced indices in anvildev
sandbox or this PR is not labeled reindex:dev
anvilbox or this PR is not labeled reindex:anvildev
sandbox or this PR is not labeled reindex:dev
anvilbox or this PR is not labeled reindex:anvildev
p if the PR is also labeled partial
Operator (chain shortening)
develop or this PR is not labeled base
chained label from the blocked PR or this PR is not labeled base
base
base label from this PR or this PR is not labeled base
Operator (after pushing the merge commit)
dev
anvildev
dev
dev
anvildev
anvildev
_select dev.shared && make -C terraform/shared apply or this PR is not labeled deploy:shared
_select anvildev.shared && make -C terraform/shared apply or this PR is not labeled deploy:shared
dev
anvildev
Operator (reindex)
dev or this PR is neither labeled reindex:partial nor reindex:dev
anvildev or this PR is neither labeled reindex:partial nor reindex:anvildev
dev or this PR is neither labeled reindex:partial nor reindex:dev
anvildev or this PR is neither labeled reindex:partial nor reindex:anvildev
dev or this PR is neither labeled reindex:partial nor reindex:dev
anvildev or this PR is neither labeled reindex:partial nor reindex:anvildev
dev or this PR does not require reindexing dev
anvildev or this PR does not require reindexing anvildev
dev or this PR does not require reindexing dev
anvildev or this PR does not require reindexing anvildev
dev or this PR does not require reindexing dev
anvildev or this PR does not require reindexing anvildev
dev or this PR does not require reindexing dev
dev or this PR does not require reindexing dev
deploy_browser job in the GitLab pipeline for this PR in dev or this PR does not require reindexing dev
anvildev or this PR does not require reindexing anvildev
deploy_browser job in the GitLab pipeline for this PR in anvildev or this PR does not require reindexing anvildev
Operator
deploy:shared, deploy:gitlab, deploy:runner, API, reindex:partial, reindex:anvilprod and reindex:prod labels to the next promotion PRs or this PR carries none of these labels
deploy:shared, deploy:gitlab, deploy:runner, API, reindex:partial, reindex:anvilprod and reindex:prod labels, from the description of this PR to that of the next promotion PRs or this PR carries none of these labels
Shorthand for review comments
L line is too long
W line wrapping is wrong
Q bad quotes
F other formatting problem