Make DB backup use Zstandard compression #2248

Merged 1 commit on Dec 17, 2024

@mikerkelly (Contributor) commented Dec 16, 2024

Partially fixes #2151.

Zstandard is a fast, modern, lossless data compression algorithm. For these backup files, it gives a marginally better compression ratio than gzip, with much faster compression and, in particular, decompression. We want the backup process to be quick because it is a CPU-intensive activity that could affect site performance.

Experimental comparison of different compression utilities with their default settings:

Compression test results for 'latest-db.sqlite3':

Method | Compression Time | Original Size | Compressed Size | Compression Ratio
-------|------------------|---------------|-----------------|------------------
gzip   | 272.43 s         | 4.1G          | 1.1G            | 73%
XZ     | 3151.21 s        | 4.1G          | 606M            | 86%
7Z     | 985.42 s         | 4.1G          | 616M            | 86%
BZIP2  | 462.39 s         | 4.1G          | 840M            | 80%
ZSTD   | 30.14 s          | 4.1G          | 1.1G            | 75%

Zstandard is clearly the best balance between speed and compression ratio. Any attempt to increase the compression ratio further, including passing flags to zstd, slowed the process significantly. If local disk usage for backups becomes a concern, we should consider additional or external storage, incremental backups, or a separate archival process.

This was run on dokku3 outside business hours. Throwaway script attached:
script.txt
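
For readers without access to the attachment, here is a minimal sketch of how such a comparison could be structured. It is not the attached script: the names `Result`, `benchmark`, and `STDLIB_METHODS` are hypothetical, it compresses in memory (so it is only suitable for small samples, not a 4.1G database file), and it covers only the compressors in Python's standard library, whose default levels may differ from those of the command-line tools. A Zstandard compressor could be plugged in as another callable if a binding is available. The ratio is computed as space saved, `1 - compressed / original`, which is consistent with the percentages in the table above.

```python
import bz2
import gzip
import lzma
import time
from typing import Callable, Dict, List, NamedTuple


class Result(NamedTuple):
    """One row of the comparison table (hypothetical structure)."""

    method: str
    seconds: float
    original_size: int
    compressed_size: int

    @property
    def ratio(self) -> float:
        """Space saved as a fraction of the original size (0.75 means 75%)."""
        return 1 - self.compressed_size / self.original_size


def benchmark(
    data: bytes, methods: Dict[str, Callable[[bytes], bytes]]
) -> List[Result]:
    """Time each compressor on the same in-memory data and record sizes."""
    results = []
    for name, compress in methods.items():
        start = time.perf_counter()
        compressed = compress(data)
        elapsed = time.perf_counter() - start
        results.append(Result(name, elapsed, len(data), len(compressed)))
    return results


# Standard-library compressors at their library default settings.
# Assumption: a zstd callable could be added here if a binding is installed.
STDLIB_METHODS: Dict[str, Callable[[bytes], bytes]] = {
    "gzip": gzip.compress,
    "bzip2": bz2.compress,
    "xz": lzma.compress,
}


if __name__ == "__main__":
    # Synthetic, highly repetitive sample; real database files behave differently.
    sample = b"INSERT INTO example VALUES (1, 'repetitive row');\n" * 20_000
    for r in benchmark(sample, STDLIB_METHODS):
        print(
            f"{r.method:<6} | {r.seconds:8.3f} s | {r.original_size} B "
            f"| {r.compressed_size} B | {r.ratio:.0%}"
        )
```

Passing the compressors in as a dictionary of callables keeps the timing loop independent of any particular library, so the same loop can be reused when comparing additional methods or non-default settings.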

mikerkelly self-assigned this Dec 16, 2024
mikerkelly merged commit 75dee86 into main Dec 17, 2024
6 checks passed
mikerkelly deleted the mikerkelly/backup-db-zstd branch December 17, 2024 08:20
mikerkelly added a commit that referenced this pull request Feb 5, 2025
These lines gradually removed old-format backups as they aged out past 30
days. Now that none are left, the lines do no useful work and are no longer
required.

We changed the format of backups from `.json` to `.sqlite3` in #2214
(cdbacb6) on Dec 9, 2024.

We changed the compression format from `gzip` to `Zstandard` in #2248
(7984176) on Dec 16, 2024.
Successfully merging this pull request may close these issues.

Use sqlite cli to backup core database