Skip to content

Refactor dumps code #3278

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
May 22, 2025
Merged

Refactor dumps code #3278

merged 7 commits into from
May 22, 2025

Conversation

amCap1712
Copy link
Member

Refactor dumps code for better readability and reducing redundancy. Individual commits explain the changes in more detail.

amCap1712 added 6 commits May 15, 2025 18:29
Only difference is format text vs csv.
Refactor the common dump setup code of setting up zstd compression, archive creation,
temporary directory management and metadata writing into two context managers: `zstd_dump`
and `uncompressed_dump`.
Replaced table dicts with a `DumpTable` model and a `DumpTablesCollection` class.
This helps reduce redundancy in postgres + timescale dumps and mapping dumps.
Consolidate duplicate logic for PostgreSQL and TimescaleDB dumps into a single
`dump_database` function that manages the creation of public and private dumps.
@amCap1712 amCap1712 requested a review from mayhem May 15, 2025 13:03
Copy link
Member

@mayhem mayhem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good improvement -- I like how all the details of the dump are specified in one location now. Perhaps we can do the playlist improvement at a future point in time.


archive_name = 'musicbrainz-canonical-dump-{time}'.format(
time=dump_time.strftime('%Y%m%d-%H%M%S')
MAPPING_TABLES = [
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ohhh, very nice!


return escaped_table_name, joined_fields
PRIVATE_TABLES_TIMESCALE = DumpTablesCollection(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If could wish for one thing more that this table dumper could do, is to separate out public playlists from private playlists and put them into the dumps accordingly.

@amCap1712 amCap1712 merged commit 9b7758c into master May 22, 2025
2 checks passed
@amCap1712 amCap1712 deleted the refactor-dumps-2 branch May 22, 2025 05:42
Copy link

sentry-io bot commented May 23, 2025

Suspect Issues

This pull request was deployed and Sentry observed the following issues:

  • ‼️ TypeError: Composed elements must be Composable, got 'release_group_mbid' instead listenbrainz.dumps.sample in get_popularity_data View Issue
  • ‼️ SyntaxError: syntax error at end of input listenbrainz.dumps.sample in get_popularity_data View Issue
  • ‼️ TypeError: unsupported operand type(s) for -: 'odict_iterator' and 'list' listenbrainz.dumps.sample in dump_sample_data View Issue

Did you find this useful? React with a 👍 or 👎

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants