Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add hooks condition for "repo is locked" and "unlocking repo" #669

Open
thiswillbeyourgithub opened this issue Feb 15, 2025 · 6 comments
Open
Labels
enhancement New feature or request

Comments

@thiswillbeyourgithub
Copy link

Is your feature request related to a problem? Please describe.
I noticed the same issue as in #244. I am hesitant to enable auto unlock and think it would be nice to have a notification hook for when auto unlock is used.

Describe the solution you'd like
Notification hook for when auto unlock is used.

Great work on backrest. It's pretty nice. So far my only issue is #244 but unfortunately it is quite serious.

@thiswillbeyourgithub thiswillbeyourgithub added the enhancement New feature or request label Feb 15, 2025
@garethgeorge
Copy link
Owner

Backrest unfortunately doesn't actually know if you repo was locked or not (even when it runs the unlock command) and would need to run extra commands to check.

@thiswillbeyourgithub
Copy link
Author

Relevant point regarding checking wether the repo is locked but I don't see why that affects "knowing when autounlock was used".

Or maybe autounlock just means passing an argument to restic that autounlocks? In that case traying without and on error with autounlock would be a workaround no?

@garethgeorge
Copy link
Owner

So autounlocking today is a bit of a fire-and-forget operation. Backrest is simply running restic unlock before any operation, the problem is that restic unlock doesn't support json output and backrest isn't attempting to parse any of the human readable text output.

So backrest can't actually know what was done. That said, this is typically a pretty safe operation as restic will only remove locks older than 30 minutes (last I checked) and any live operations are also expected to continually refresh the timestamps on their lock files. In modern versions of restic it's very unlikely that unlock will remove a lock file that should have been kept.

At a higher level I'm curious to hear a bit more about the problem you're trying to solve. You have multiple clients backing up to a single repo and are trying to better understand how often there are collisions?

@thiswillbeyourgithub
Copy link
Author

thiswillbeyourgithub commented Feb 26, 2025

Note: for some reason I had missed the auto unlock tooltip and didn't know the auto unlock only occured on prune and check operations.

Backrest is simply running restic unlock before any operation, the problem is that restic unlock doesn't support json output and backrest isn't attempting to parse any of the human readable text output.

Thanks for the clarification!

A workaround then could be:

  • try the command without unlocking first
  • check if we get the "repo is locked" string in the answer, if we do :
    • if we have autounlock checked then do unlock + retry the command
    • if we don't raise the exception as usual.

I have no idea how big of a change that requires, and absolutely no idea if that is really increasing safety or purely for my emotions.

That said, this is typically a pretty safe operation as restic will only remove locks older than 30 minutes (last I checked) and any live operations are also expected to continually refresh the timestamps on their lock files. In modern versions of restic it's very unlikely that unlock will remove a lock file that should have been kept.

Thank you again for clarifying! Very appreciated. I think the source was that forum thread.

At a higher level I'm curious to hear a bit more about the problem you're trying to solve. You have multiple clients backing up to a single repo and are trying to better understand how often there are collisions?

Well i've been using restic for a few years now and it's been great. Stumbling upon backrest was awesome because:

  1. the automation is great and allows me to use more frequent backups
  2. the gui is good enough for me to install it on my non technical relatives
  3. the notification system is great because I can use that to know for my relatives if they're safe
  4. The automation for check and prune is great for low maintenance and motivated me for my relatives in the first place.

Now, because they are non technical i don't want them to have to click on unlock if the repo ends up locked. Which happens quite a lot for frequent backups on a computer that is regularly shut down. Timezones can also have an importance as highlighted on the forum.

So, at my level (i'm not a backup engineer) it seems reasonnable to fully trust restic to be mature enough and bug free enough to mostly rely on it. But backrest seems to be a 1 person team and as far as I understand the idea to auto unlock the repo was yours and was not a design choice of the restic team (afaik there is no --always-unlock but the --no-lock seems crazy dangerous) so I actually don't know how much more fragile my backups end up.

I sincerely hope you don't take this the wrong way and want you to know I am extremely appreciative of your work on backrest, it's really nice!

Because after understanding how useful backrest was I took a look at the repo and thought a bit about this before being sold although by then I was already a happy user and installed it around me :).

You said:

That said, this is typically a pretty safe operation as restic will only remove locks older than 30 minutes

The word "typically" here is a bit bothering. Because for example i had many stability issues recently and encountered tons of boot problems on my linux server. That's exactly when I want my backup system to be reliable and I don't know if interrupting 10 times a backup is a real problem or not but I know it's not "typical". Reliability for backups has to be preserved even on degraded conditions as much as possible afaic.

What do you think of, as an alternative, just an option to retry every 1 minute for up to X minutes if the repo is locked before erroring out if still locked. That seems somewhat safer no? Also the occasion to send a notification "action was delayed by X minutes".

Even better it could take the form of "autounlock if lock is older than X" with X to 0 to disable the feature. This way we get the safety of locks and only remove old locks while remaining prudent about recent locks. Of course, if the time of the lock changes at all during the wait we should error out.

Another thing I just learned: restic list locks can be used to know in advance if a repo is locked. --json does not seem to work. But checking inside the repo's lock folder does not seems too complicated either? That would make it possible to geg the age of the lock.

So yeah bottom line is that locks are here for a reason and I don't know how much trust I should put in you vs on the restic team that made design choices against that decision. Enabling auto unlock seems like using --no-lock and is less safe. Disabling auto unlock means that if the user reboots at the wrong time we have no backups until the repo is manually unlocked. Unless I'm mistaken.

Also something I'm not clear about: does backrest do concurrent operations? Say it's supposed to run a backup every 1h but a prune is due this afternoon and takes more than 1h. Will it know to skip/delay the backups while the prune is happening ? Will it try as scheduled, fail because of the lock, and send me a notification? Will the autounlock means we can have both at the same time?

Let me reiterate that this is the only issue I have with backrest, it seems great on all front but I'm just very wary of this one feature. I really hope you will not in any way feel anything but constructive ctricism :).

PS: i think a tooltip on this check box to explain more might be a good idea!

PS2: shouldn't the auto unlock checkbox be moved up to be just after the prune policy section? After all it's only about forget and prune operations.

@garethgeorge
Copy link
Owner

garethgeorge commented Feb 26, 2025

So to elaborate a bit more -- typically is perhaps not a strong enough word. It is extremely unlikely that you will see corruption when using auto unlock. The only scenario I'm aware of that could actually damage the repo is:

  • two restic processes are attempting to run prune
  • the already running process has stopped refreshing its lock file e.g. become suspended OR there's a bug in the storage implementation preventing the lock timestamps from being updated
  • auto unlock removes its lock

Because restic refreshes the timestamps on lock files in normal operation it is vanishingly unlikely that this will happen with normal storage providers, but could be an issue with a bad storage implementation i.e. some of the rclone providers have bugs with how timestamps are tracked iiuc.

For most users, auto unlock is a safe and good choice if not sharing a repo across multiple machines since it should never be the case that there is a concurrent operation there (backrest serializes all of the commands it runs). If sharing a repo, my strong recommendation is to only configure a prune policy on one machine if sharing a repo across multiple hosts.

I chose the autounlock behavior based on this comment from fd0 https://forum.restic.net/t/risks-consequences-of-prune-when-backup-in-progress-or-vice-versa/751/5

@thiswillbeyourgithub
Copy link
Author

Thank you very much for taking the time.

But so IIUC you assume that if the user uses backrest it will not use restic itself right? Because I can imagine using my own shell to call restic to investigate some situations or do a proper recovery and then if I have not remembered to disable backrest I could run into issues right?

What do you think about the sleep suggestion?

What do you think of, as an alternative, just an option to retry every 1 minute for up to X minutes if the repo is locked before erroring out if still locked. That seems somewhat safer no? Also the occasion to send a notification "action was delayed by X minutes".
Even better it could take the form of "autounlock if lock is older than X" with X to 0 to disable the feature. This way we get the safety of locks and only remove old locks while remaining prudent about recent locks. Of course, if the time of the lock changes at all during the wait we should error out.

It still seems safer to me while not being that complicated. Between timezones, rclone backends (no matter how well implemented, the remote server can still do changes on its own and rclone will have issues until addressed), the users wanting to use restic all by themselves for once, I'm a bit worried there are situations we have not thought about that could snowball into bad backup days. Hence my "better be safe" approach.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants