Add plumbing command for manually creating commit #9142

Merged
nicktobey merged 8 commits into main from nicktobey/create_commit
Apr 24, 2025

@nicktobey nicktobey commented Apr 23, 2025

dolt admin createchunk is a set of undocumented commands for manually writing chunks into the chunkstore. They're a useful tool for writing tests, fixing repos that somehow get into an unexpected state, and hacking on Dolt in ways that the higher-level commands don't support.

dolt admin createchunk commit is the first such command; it creates a commit chunk and prints the new commit hash to standard output.

I picked createchunk commit to implement first because it already has utility: it can be used to flatten commit history. Dolt already allows flattening commit history via dolt rebase, but that method introduces a lot of overhead: it walks the commit history to build the rebase table, then walks it again to apply the rebase. For large histories (the kind of histories you might want to squash to save space), this is unacceptably slow.

With this command, flattening a history can be done with dolt admin createchunk commit --root "$rootValueHash" --desc "flattened history" --parents "refs/internal/create" --branch "$branchname" --force

The --branch flag causes the named branch to point to the newly created commit. This flag is mandatory when using the CLI, because the chunk journal is required to end with a root hash and chunks are only flushed to the chunk journal when there's a new root hash.
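
A minimal sketch of wrapping the flattening command above in a shell function. The function name `flatten_history` and the `DRY_RUN` variable are illustrative and not part of Dolt. The flags are the ones quoted in this description, and the root value hash is assumed to be known already.

```shell
#!/bin/sh
# Hypothetical helper: assemble the flattening command from this PR description.
# With DRY_RUN=1 it prints the command instead of running it.
flatten_history() {
    root_hash="$1"                        # root value hash for the new commit
    branch="$2"                           # branch to point at the new commit
    parent="${3:-refs/internal/create}"   # parent ref; default taken from the description

    if [ -z "$root_hash" ] || [ -z "$branch" ]; then
        echo "usage: flatten_history <root-value-hash> <branch> [parent]" >&2
        return 2
    fi

    # Build the argument list once so the dry run and the real run cannot drift apart.
    set -- dolt admin createchunk commit \
        --root "$root_hash" \
        --desc "flattened history" \
        --parents "$parent" \
        --branch "$branch" \
        --force

    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"    # shell quoting is not preserved in this printout
    else
        "$@"
    fi
}
```

For example, `DRY_RUN=1 flatten_history "$rootValueHash" "$branchname"` prints the command so it can be reviewed before it is run against a real repository.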

@nicktobey nicktobey force-pushed the nicktobey/create_commit branch from 5ddcbf7 to b18122e Compare April 23, 2025 01:02
@coffeegoddd
Contributor

@nicktobey DOLT

comparing_percentages
100.000000 to 100.000000
version result total
b18122e ok 5937457
version total_tests
b18122e 5937457
correctness_percentage
100.0

@coffeegoddd
Contributor

@coffeegoddd DOLT

comparing_percentages
100.000000 to 100.000000
version result total
25c15cf ok 5937457
version total_tests
25c15cf 5937457
correctness_percentage
100.0

… it work entirely at the filesystem level. This guarantees that an unauthenticated SQL user can't use the procedure to edit branches (whereas anyone with CLI access is assumed to already have filesystem (and thus admin) access anyway.)
@nicktobey nicktobey force-pushed the nicktobey/create_commit branch from da961f6 to eedaa78 Compare April 23, 2025 19:58
@coffeegoddd
Contributor

@nicktobey DOLT

comparing_percentages
100.000000 to 100.000000
version result total
eedaa78 ok 5937457
version total_tests
eedaa78 5937457
correctness_percentage
100.0

…admin command that accesses the storage directly.
@coffeegoddd
Contributor

@nicktobey DOLT

comparing_percentages
100.000000 to 100.000000
version result total
9ccd483 ok 5937457
version total_tests
9ccd483 5937457
correctness_percentage
100.0

Contributor

@macneale4 macneale4 left a comment

Looks like there is some dead code in there. Drop it and ship it!

// Thus, we must update the root hash before the command finishes, or else changes will not be persisted.
type CreateCommitCmd struct{}

func generateCreateCommitSQL(cliCtx cli.CliContext, apr *argparser.ArgParseResults) (query string, params []interface{}, err error) {
Contributor

@macneale4 macneale4 Apr 24, 2025

generateCreateCommitSQL is dead code now, right?

Contributor Author

Fixed.

// RequiresRepo should return false if this interface is implemented, and the command does not have the requirement
// that it be run from within a data repository directory
func (cmd CreateCommitCmd) RequiresRepo() bool {
	return false
Contributor

Seems like it does.

Contributor Author

Fixed

@coffeegoddd
Contributor

@nicktobey DOLT

comparing_percentages
100.000000 to 100.000000
version result total
bab5647 ok 5937457
version total_tests
bab5647 5937457
correctness_percentage
100.0

@nicktobey nicktobey merged commit 94a11de into main Apr 24, 2025
21 checks passed
@nicktobey nicktobey deleted the nicktobey/create_commit branch April 24, 2025 23:46
@coffeegoddd DOLT

| test_name | detail | row_cnt | sorted | mysql_time | sql_mult | cli_mult |
| --- | --- | --- | --- | --- | --- | --- |
| batching | LOAD DATA | 10000 | 1 | 0.07 | 1.29 | |
| batching | batch sql | 10000 | 1 | 0.09 | 1.33 | |
| batching | by line sql | 10000 | 1 | 0.1 | 1.2 | |
| blob | 1 blob | 200000 | 1 | 0.93 | 3.72 | 4.57 |
| blob | 2 blobs | 200000 | 1 | 0.9 | 4.23 | 4.6 |
| blob | no blob | 200000 | 1 | 0.97 | 2.35 | 2.7 |
| col type | datetime | 200000 | 1 | 0.85 | 2.41 | 2.76 |
| col type | varchar | 200000 | 1 | 0.81 | 3.19 | 3.37 |
| config width | 2 cols | 200000 | 1 | 0.88 | 2.31 | 2.67 |
| config width | 32 cols | 200000 | 1 | 1.94 | 1.96 | 2.7 |
| config width | 8 cols | 200000 | 1 | 1 | 2.37 | 2.65 |
| pk type | float | 200000 | 1 | 0.89 | 2.34 | 2.69 |
| pk type | int | 200000 | 1 | 0.84 | 2.45 | 2.77 |
| pk type | varchar | 200000 | 1 | 1.5 | 1.75 | 1.93 |
| row count | 1.6mm | 1600000 | 1 | 5.89 | 2.94 | 3.01 |
| row count | 400k | 400000 | 1 | 1.57 | 2.69 | 2.88 |
| row count | 800k | 800000 | 1 | 3.01 | 2.85 | 2.91 |
| secondary index | four index | 200000 | 1 | 3.7 | 1.4 | 1.34 |
| secondary index | no secondary | 200000 | 1 | 0.96 | 2.36 | 2.76 |
| secondary index | one index | 200000 | 1 | 1.17 | 2.44 | 2.51 |
| secondary index | two index | 200000 | 1 | 2.06 | 1.76 | 1.85 |
| sorting | shuffled 1mm | 1000000 | 0 | 5.7 | 2.76 | 2.81 |
| sorting | sorted 1mm | 1000000 | 1 | 5.87 | 2.68 | 2.73 |

@coffeegoddd DOLT

| name | detail | mean_mult |
| --- | --- | --- |
| dolt_blame_basic | system table | 1.2 |
| dolt_blame_commit_filter | system table | 2.88 |
| dolt_commit_ancestors_commit_filter | system table | 0.61 |
| dolt_commits_commit_filter | system table | 1.1 |
| dolt_diff_log_join_from_commit | system table | 2.8 |
| dolt_diff_log_join_to_commit | system table | 2.72 |
| dolt_diff_table_from_commit_filter | system table | 1.19 |
| dolt_diff_table_to_commit_filter | system table | 1.21 |
| dolt_diffs_commit_filter | system table | 1.06 |
| dolt_history_commit_filter | system table | 1.44 |
| dolt_log_commit_filter | system table | 1.1 |

@coffeegoddd DOLT

| name | add_cnt | delete_cnt | update_cnt | latency |
| --- | --- | --- | --- | --- |
| adds_only | 60000 | 0 | 0 | 1.12 |
| adds_updates_deletes | 60000 | 60000 | 60000 | 4.56 |
| deletes_only | 0 | 60000 | 0 | 2.48 |
| updates_only | 0 | 0 | 60000 | 3.04 |
