-
Notifications
You must be signed in to change notification settings - Fork 210
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: document retries working #484
base: s3-integration-feature-branch
Are you sure you want to change the base?
feat: document retries working #484
Conversation
cd47238
to
611820c
Compare
@@ -247,6 +248,13 @@ def shutdown_handler(signum: int, _frame: Any): | |||
show_default=True, | |||
help="Exit immediately when an error occurs.", | |||
) | |||
@click.option( | |||
"--doc-retries", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to make this option agnostic of the loader? Like --loading-retries
or --load-retries
or similar?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good!
@@ -52,3 +52,11 @@ def load(self, row: dict[str, str]) -> LoadedDocument: | |||
file_path=row[self.column_name], | |||
file_type=guess_filetype(content, row[self.column_name]), | |||
) | |||
|
|||
|
|||
class DocumentLoadingError(Exception): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WDYT about having a LoadingError
base exception? In that way we can always raise an exception no matters the loader type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, I'll switch to that.
WHEN MATCHED | ||
AND target.retries >= %s THEN | ||
DELETE | ||
WHEN MATCHED THEN |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about adding an extra check so the query won't ever remove a row if there is a backoff in place?
WHEN MATCHED THEN | |
WHEN MATCHED | |
AND retry_after <= now() THEN |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just wrote a test for this case, it's already working, as this is handled in the fetch_work_query func:
WITH selected_rows AS (
SELECT {pk_fields}
FROM {queue_table}
WHERE retry_after is null or retry_after < now()
LIMIT %s
FOR UPDATE SKIP LOCKED
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just pasted the initial part of the query not to make it not to pollute the conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still believe it is part of this query to protect itself against an unexpected behaviour
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can include that, but it's a code that will be never reached/tested, so not sure it's really worth it.
c96b0e8
to
b2f01b4
Compare
b6b8aeb
to
a0a492f
Compare
implementation of the document failure retries. not only retry but also include info about those in the errors table. make sure we don't retry unless retry_after is in the past.
a0a492f
to
f85436d
Compare
Implementation of the document failure retries, both for s3 problems, and LLM provider.