refactor: Interpret all rpaths as resource names in transactions #318
An example of a silent failure case with this change:

    from lakefs_spec import LakeFSFileSystem

    fs = LakeFSFileSystem()
    with fs.transaction("my-repo", "my-branch"):
        # Creates the file my-repo/my-branch/my-other-repo/my-other-branch/hello.txt.
        fs.put_file("hello.txt", "my-other-repo/my-other-branch/hello.txt")
Proof of concept for a single repo-and-branch-scoped transaction. This ties a transaction to a single repo and branch by taking all file and directory names by resource only instead of a full URI. Naturally, this has the subtle side effect that given full URIs are silently understood as nested paths, and uploaded to the transaction branch without loud errors or warnings. A section was added to the transaction docs that details this behavior, but it might be safer to check the input path against existing repos and branches.
cb1b367 to a01fdbf
This way it can return paths as they are when we are not in a transaction, or when we are in a transaction, but repo and ref have already been prepended.
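The behavior described in this commit message can be sketched roughly as follows. This is only an illustration, not the code from the PR: the function name `make_uri` is taken from the PR description, but its signature, the `repo` and `ref` parameters, and the prefix check are assumptions made for the example.

```python
from typing import Optional


def make_uri(path: str, repo: Optional[str] = None, ref: Optional[str] = None) -> str:
    """Hypothetical helper: qualify ``path`` with ``repo/ref`` inside a transaction."""
    # Not in a transaction (no repo/ref scope): return the path as it is.
    if repo is None or ref is None:
        return path
    prefix = f"{repo}/{ref}/"
    # In a transaction, but repo and ref have already been prepended.
    if path.startswith(prefix):
        return path
    # In a transaction with a bare resource name: prepend repo and ref.
    return prefix + path.lstrip("/")


# A full URI for another repo is treated as a nested resource name.
print(make_uri("my-other-repo/my-other-branch/hello.txt", "my-repo", "my-branch"))
```

Under these assumptions, a resource whose name happens to start with the transaction's own `repo/ref/` prefix would be returned unchanged, so the check is a heuristic rather than a guarantee.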
Some leftover fully qualified paths in transactions, which came to light when other filesystem APIs were migrated to the new URI maker scheme.
This is now ready for review. I'm not extremely happy with how it turned out, but it does make the transaction a lot easier to use by not having to prepend repo and branch name all the time.
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #318      +/-   ##
==========================================
+ Coverage   94.91%   95.02%   +0.10%
==========================================
  Files           5        5
  Lines         413      422       +9
  Branches       92       94       +2
==========================================
+ Hits          392      401       +9
  Misses         15       15
  Partials        6        6
Built on #317.
cc @AdrianoKF, after the UPathManager excursion we had lately. Opinions welcome - I'm leaning towards at least validating the input path against existing repos to make sure that the user doesn't accidentally do the wrong thing.
But in general, I think scoping a transaction to a single repo and branch pair is the only sensible option, since uploading resources to other repos and branches from within a transaction results in silent uncommitted changes, which is also not ideal.
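The validation idea mentioned above could look something like the following minimal sketch. It is not part of the PR: the function name `check_rpath` and the `known_repos` argument are hypothetical, and how the list of existing repositories is obtained from the lakeFS server is deliberately left out.

```python
import warnings
from typing import Iterable


def check_rpath(rpath: str, known_repos: Iterable[str]) -> str:
    """Hypothetical check: warn if ``rpath`` looks like a full URI into an existing repo."""
    # The first path segment is what would be mistaken for a repository name.
    first_segment = rpath.lstrip("/").split("/", 1)[0]
    if first_segment in set(known_repos):
        warnings.warn(
            f"{rpath!r} starts with the existing repository name {first_segment!r}; "
            "inside a transaction it is treated as a nested resource name.",
            UserWarning,
            stacklevel=2,
        )
    # The path itself is returned unchanged; the check only makes the case loud.
    return rpath


# Assumed repository names, for illustration only.
repos = ["my-repo", "my-other-repo"]
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_rpath("my-other-repo/my-other-branch/hello.txt", repos)
    check_rpath("data/hello.txt", repos)
print(len(caught))
```

A warning rather than an exception keeps legitimate nested paths that merely share a repository's name usable. Whether that trade-off is right is an open design question here.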
Outstanding test failures are because I haven't ported all methods to the `make_uri(rpath)` pattern yet - I'm not sure this is the most elegant way to do things, so if you have other ideas, please let me know.