-
Notifications
You must be signed in to change notification settings - Fork 96
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CSV uploads for ClickHouse Cloud #236
Merged
Merged
Changes from 1 commit
Commits
Show all changes
11 commits
Select commit
Hold shift + click to select a range
2b26552
CSV uploads
calherries 7ad64a0
~ move function
calherries 7fb8206
Set `wait_end_of_query=1` for clickhouse cloud when creating the table
calherries 6b8935b
Only support uploads for CH Cloud DBs
calherries d1c025d
Fix reflection warnings
calherries 2cec411
Use order by to specify primary key
calherries a51c32e
set select_sequential_consistency in connection details
calherries 847b80f
Avoid setting select_sequential_consistency for on-premise
calherries 5554a61
Remove unused imports
calherries eb3cdc7
~ 49.11
calherries 490a558
~ fix test and add test for `select_sequential_consistency`
calherries File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I understand, select_sequential_consistency just limits which replica you query from, and doesn't affect the performance of the query on that specific replica.
This logic seems like it will break consistency for multi-replica on prem installations, and needlessly disable it on single replica ones. It seems best to just to enable it unconditionally for now?
For any multi-replica installation, enabling it for all connections seems to wipe out most of the benefits of connecting to a distributed installation, but I don't see any way around this besides reworking metabase to support separate pools for different connection flavors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're mistaking the behaviour of
select_sequential_consistency
on CH Cloud (docs here), thinking it's the same as on-premise (the link you shared). We're only enablingselect_sequential_consistency
on CH Cloud, where it's much less of an issue.There's more on SharedMergeTree in this blog post.
And I can't imagine fetching metadata from Keeper can be a bottleneck. I also got this DM from Serge on
select_sequential_consistency
:There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes,
select_sequential_consistency
is correctly enabled for the Cloud type of instance only.For on-premise clusters (in a follow-up PR), we should just set
insert_quorum
equal to the nodes count so that replica sync won't be necessary and will be effectively done during the CSV insert, and it won't affect other read-only operations.