
Add missing lock to Constraint-aware append #7515


Merged

Conversation

@erimatnor (Member) commented Dec 4, 2024

During parallel scans of Constraint-aware append, it seems like runtime chunk exclusion happens in every parallel worker. However, parallel workers don't grab a lock on a relation before calling relation_excluded_by_constraints(), which makes them hit an assertion. An AccessShareLock or higher must be held on the relation.

Ideally, runtime chunk exclusion should happen only once in the parallel worker "leader", but that requires a bigger refactor left for the future.
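
Below is a minimal sketch of the fix described above, assuming a helper called from the ConstraintAwareAppend execution path; the function name and call site are illustrative assumptions based on the review excerpt further down, not the exact patch. The idea is to look up the scan relation in the range table and take an AccessShareLock before running runtime chunk exclusion, so parallel workers that start without the lock no longer trip the assertion in relation_excluded_by_constraints().

```c
#include "postgres.h"

#include "nodes/execnodes.h"
#include "parser/parsetree.h"
#include "storage/lmgr.h"
#include "storage/lockdefs.h"

/*
 * Illustrative helper (hypothetical name): make sure the scan relation is
 * locked before doing runtime chunk exclusion in this process.
 */
static void
ensure_scan_relation_locked(EState *estate, Index scanrelid)
{
	/* Look up the range-table entry for the relation this scan node reads */
	RangeTblEntry *rte = rt_fetch(scanrelid, estate->es_range_table);

	/*
	 * An AccessShareLock (or stronger) must be held before calling
	 * relation_excluded_by_constraints(). If this backend already holds the
	 * lock, taking it again only bumps the local lock count, so the call is
	 * safe in both the leader and the parallel workers.
	 */
	LockRelationOid(rte->relid, AccessShareLock);
}
```

Per the review discussion below, the merged change takes this lock before constify_restrictinfos() as well, since that function is normally called during planning and may itself expect the relation to be locked.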

@erimatnor erimatnor added the bug label Dec 4, 2024
@erimatnor (Member Author) commented Dec 4, 2024

Unfortunately, the assertion failure is not easy to reproduce in a test because it doesn't happen all the time. It typically requires a clean-slate session. I might spend some more time writing a test for it later.


codecov bot commented Dec 4, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 82.31%. Comparing base (1a08294) to head (76bbbca).
Report is 13 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7515      +/-   ##
==========================================
+ Coverage   82.25%   82.31%   +0.05%     
==========================================
  Files         254      254              
  Lines       47642    47615      -27     
  Branches    12005    12001       -4     
==========================================
+ Hits        39186    39192       +6     
- Misses       3650     3667      +17     
+ Partials     4806     4756      -50     


@erimatnor (Member Author) commented:

Here's a script that reproduces the assertion crash:

select setseed(0.2);

create table readings(time timestamptz, device int, temp float);

select create_hypertable('readings', 'time', create_default_indexes => false);

insert into readings
select t, ceil(random()*10), random()*40
from generate_series('2022-06-01'::timestamptz, '2022-06-20 00:01:00', '1s') t;

alter table readings set (
      timescaledb.compress,
      timescaledb.compress_orderby = 'time',
      timescaledb.compress_segmentby = 'device'
);

--insert into readings values ('2022-06-01', 1, 1.0), ('2022-06-02', 2, 2.0), ('2022-08-02', 3, 3.0);

create index on readings (time);

select format('%I.%I', chunk_schema, chunk_name)::regclass as chunk
  from timescaledb_information.chunks
 where format('%I.%I', hypertable_schema, hypertable_name)::regclass = 'readings'::regclass
 limit 1 \gset

set timescaledb.enable_chunk_append=off;

select compress_chunk(ch) from show_chunks('readings') ch;
select decompress_chunk('_timescaledb_internal._hyper_1_3_chunk');

--set timescaledb.enable_chunk_append=on;

explain (analyze)
select time, avg(temp), device  from readings
where time  > now() - interval '2 years 5 months 20 days' group by time, device order by time;

@erimatnor (Member Author) commented Dec 10, 2024

@akuzm After some more investigation, it seems this is related to parallel query plans, where the spawned workers don't have the relation locked. I don't know whether parallel workers are expected to have the relations referenced in plan nodes locked at the beginning of plan execution.

@akuzm (Member) commented Dec 10, 2024

> @akuzm After some more investigation, it seems this is related to parallel query plans, where the spawned workers don't have the relation locked. I don't know whether parallel workers are expected to have the relations referenced in plan nodes locked at the beginning of plan execution.

Not sure; e.g., the parallel seq scans seemingly don't lock tables in the workers, although I only looked through the code and didn't check. In ChunkAppend, we ultimately switched to performing chunk exclusion in the parallel leader. Maybe it makes sense to do the same here as well. #5857

@erimatnor (Member Author) commented:

> @akuzm After some more investigation, it seems this is related to parallel query plans, where the spawned workers don't have the relation locked. I don't know whether parallel workers are expected to have the relations referenced in plan nodes locked at the beginning of plan execution.

> Not sure; e.g., the parallel seq scans seemingly don't lock tables in the workers, although I only looked through the code and didn't check. In ChunkAppend, we ultimately switched to performing chunk exclusion in the parallel leader. Maybe it makes sense to do the same here as well. #5857

If we can do chunk exclusion only once, that is better. But TBH, I am not sure it is worth the effort given how seldom this scan node is used, so for now I think just taking the lock in the worker is enough. We can always implement leader-side exclusion later.

@erimatnor erimatnor requested a review from akuzm December 20, 2024 08:09
@erimatnor (Member Author) commented:

@akuzm Are we good to proceed with this fix? It seems the lack of locking only occurs in parallel queries, so it seems correct to grab the lock if it is not already taken. It also seems like a very low-risk fix.

@akuzm (Member) commented Dec 23, 2024

> @akuzm Are we good to proceed with this fix? It seems the lack of locking only occurs in parallel queries, so it seems correct to grab the lock if it is not already taken. It also seems like a very low-risk fix.

Sure. Let's move the lock to before constify_restrictinfos; that's a complicated function that is usually called during planning, and it might have some obscure path that requires a lock as well.

@erimatnor erimatnor force-pushed the fix-constraint-aware-append-missing-lock branch from 01b22dd to bb9b0f2 on February 10, 2025 12:39
@erimatnor (Member Author) commented:

> @akuzm Are we good to proceed with this fix? It seems the lack of locking only occurs in parallel queries, so it seems correct to grab the lock if it is not already taken. It also seems like a very low-risk fix.

> Sure. Let's move the lock to before constify_restrictinfos; that's a complicated function that is usually called during planning, and it might have some obscure path that requires a lock as well.

Moved the lock to before constify_restrictinfos.

@erimatnor erimatnor force-pushed the fix-constraint-aware-append-missing-lock branch 2 times, most recently from 7015f2d to e047e43 on February 10, 2025 13:02
@erimatnor erimatnor force-pushed the fix-constraint-aware-append-missing-lock branch from f390b0c to a61d59f on February 21, 2025 08:09
@erimatnor erimatnor requested a review from mkindahl March 13, 2025 09:41
@philkra philkra added the icebox label (PR that is stashed in the icebox) Mar 20, 2025
@erimatnor erimatnor force-pushed the fix-constraint-aware-append-missing-lock branch from a61d59f to 889c695 on April 1, 2025 07:43
@mkindahl (Member) left a comment

LGTM

* where the parallel worker hasn't locked it yet.
*/
rte = rt_fetch(scanrelid, estate->es_range_table);
LockRelationOid(rte->relid, AccessShareLock);
Member commented on the diff:

If this lock is already taken in the same session, it will increase the lock count, so there should be no error as a result of taking the lock twice (or more).

The lock will also automatically be released at the end of the transaction, or when the process shuts down, so there is no risk of a dangling lock.
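
As a small illustration of this, here is a hedged sketch (the function name is made up, and it assumes it runs inside a transaction with a valid relation OID): acquiring the same relation lock twice in one backend only increments the backend-local lock count, and any references still held are released automatically at transaction end.

```c
#include "postgres.h"

#include "storage/lmgr.h"
#include "storage/lockdefs.h"

/* Illustration only: double acquisition of a relation lock is harmless. */
static void
lock_count_demo(Oid relid)
{
	LockRelationOid(relid, AccessShareLock);	/* first acquisition */
	LockRelationOid(relid, AccessShareLock);	/* second acquisition: local count is now 2 */

	/*
	 * One reference can be dropped early with UnlockRelationOid(), but this
	 * is optional: anything still held is released automatically at the end
	 * of the transaction (or when the backend exits).
	 */
	UnlockRelationOid(relid, AccessShareLock);
}
```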

During parallel scans of Constraint-aware append, it seems like
runtime chunk exclusion happens in every parallel worker. However,
parallel workers don't grab a lock on a relation before calling
relation_excluded_by_constraints(), which makes them hit an
assertion. An AccessShareLock or higher must be held on the relation.

Ideally, runtime chunk exclusion should happen only once in the
parallel worker "leader", but that requires a bigger refactor left for
the future.
@erimatnor erimatnor force-pushed the fix-constraint-aware-append-missing-lock branch from 889c695 to 80650ca on April 1, 2025 09:20
@erimatnor erimatnor enabled auto-merge (rebase) April 1, 2025 09:20
@erimatnor erimatnor merged commit 2a539e2 into timescale:main May 9, 2025
40 of 41 checks passed
philkra pushed a commit that referenced this pull request May 12, 2025
Backport of #7515: Add missing lock to Constraint-aware append (#8105)

This is an automated backport of #7515: Add missing lock to
Constraint-aware append.
This PR will be merged automatically after all the relevant CI checks
pass. If this fix should not be backported, or will be backported
manually, just close this PR. You can use the backport branch to add
your changes, it won't be modified automatically anymore.

For more details, please see the
[documentation](https://github.com/timescale/eng-database/wiki/Releasing-TimescaleDB#automated-cherry-picking-of-bug-fixes)

## Original description
### Add missing lock to Constraint-aware append
During parallel scans of Constraint-aware append, it seems like runtime
chunk exclusion happens in every parallel worker. However, parallel
workers don't grab a lock on a relation before calling
relation_excluded_by_constraints(), which makes them hit an assertion.
An AccessShareLock or higher must be held on the relation.

Ideally, runtime chunk exclusion should happen only once in the parallel
worker "leader", but that requires a bigger refactor left for the
future.

Co-authored-by: Erik Nordström <erik@timescale.com>
@timescale-automation (Member) commented:

Automated backport to 2.20.x not done: cherry-pick failed.

Git status

HEAD detached at origin/2.20.x
You are currently cherry-picking commit 2a539e2b7.
  (all conflicts fixed: run "git cherry-pick --continue")
  (use "git cherry-pick --skip" to skip this patch)
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

nothing to commit, working tree clean

Job log

@timescale-automation timescale-automation added the auto-backport-not-done label (Automated backport of this PR has failed non-retriably, e.g. conflicts) May 12, 2025
@philkra philkra mentioned this pull request May 13, 2025
philkra added a commit that referenced this pull request May 15, 2025
## 2.20.0 (2025-05-15)

This release contains performance improvements and bug fixes since the
2.19.3 release. We recommend that you upgrade at the next available
opportunity.

**Highlighted features in TimescaleDB v2.20.0**
* The columnstore now leverages *bloom filters* to deliver up to 6x
faster point queries on columns with high cardinality values, such as
UUIDs.
* Major *improvements to the columnstore's backfill process* enable
`UPSERTS` with strict constraints to execute up to 10x faster.
* *SkipScan is now supported in the columnstore*, including for DISTINCT
queries. This enhancement leads to dramatic query performance
improvements of 2000x to 2500x, especially for selective queries.
* SIMD vectorization for the bool data type is now enabled by default.
This change results in a 30–45% increase in performance for analytical
queries with bool clauses on the columnstore.
* *Continuous aggregates* now include experimental support for *window
functions and non-immutable functions*, extending the analytics use
cases they can solve.
* Several quality-of-life improvements have been introduced: job names
for continuous aggregates are now more descriptive, you can assign
custom names to them, and it is now possible to add unique constraints
along with `ADD COLUMN` operations in the columnstore.
* Improved management and optimization of chunks with the ability to
split large uncompressed chunks at a specified point in time using the
`split_chunk` function. This new function complements the existing
`merge_chunk` function that can be used to merge two small chunks into
one larger chunk.
* Enhancements to the default behavior of the columnstore now provide
better *automatic assessments* of `segment by` and `order by` columns,
reducing the need for manual configuration and simplifying initial
setup.

**PostgreSQL 14 support removal announcement**

Following the deprecation announcement for PostgreSQL 14 in TimescaleDB
v2.19.0, PostgreSQL 14 is no longer supported in TimescaleDB v2.20.0.
The currently supported PostgreSQL major versions are 15, 16, and 17.

**Features**
* [#7638](#7638) Bloom
filter sparse indexes for compressed columns. Can be disabled with the
GUC `timescaledb.enable_sparse_index_bloom`
* [#7756](#7756) Add
warning for poor compression ratio
* [#7762](#7762) Speed up
the queries that use minmax sparse indexes on compressed tables by
changing the index TOAST storage type to `MAIN`. This applies to newly
compressed chunks
* [#7785](#7785) Do
`DELETE` instead of `TRUNCATE` when locks aren't acquired
* [#7852](#7852) Allow
creating foreign key constraints on compressed tables
* [#7854](#7854) Remove
support for PG14
* [#7864](#7864) Allow
adding CHECK constraints to compressed chunks
* [#7868](#7868) Allow
adding columns with `CHECK` constraints to compressed chunks
* [#7874](#7874) Support
for SkipScan for distinct aggregates over the same column
* [#7877](#7877) Remove
blocker for unique constraints with `ADD COLUMN`
* [#7878](#7878) Don't
block non-immutable functions in continuous aggregates
* [#7880](#7880) Add
experimental support for window functions in continuous aggregates
* [#7899](#7899) Vectorized
decompression and filtering for boolean columns
* [#7915](#7915) New option
`refresh_newest_first` to continuous aggregate refresh policy API
* [#7917](#7917) Remove
`_timescaledb_functions.create_chunk_table` function
* [#7929](#7929) Add
`CREATE TABLE ... WITH` API for creating hypertables
* [#7946](#7946) Add
support for splitting a chunk
* [#7958](#7958) Allow
custom names for jobs
* [#7972](#7972) Add
vectorized filtering for constraint checking while backfilling into
compressed chunks
* [#7976](#7976) Include
continuous aggregate name in jobs informational view
* [#7977](#7977) Replace
references to compression with columnstore
* [#7981](#7981) Add
columnstore as alias for `enable_columnstore` in `ALTER TABLE`
* [#7983](#7983) Support
for SkipScan over compressed data
* [#7991](#7991) Improves
default `segmentby` options
* [#7992](#7992) Add API
into hypertable invalidation log
* [#8000](#8000) Add
primary dimension info to information schema
* [#8005](#8005) Support
`ALTER TABLE SET (timescaledb.chunk_time_interval='1 day')`
* [#8012](#8012) Add event
triggers support on chunk creation
* [#8014](#8014) Enable
bool compression by default by setting
`timescaledb.enable_bool_compression=true`. Note: for downgrading to
`2.18` or earlier version, use [this downgrade
script](https://github.com/timescale/timescaledb-extras/blob/master/utils/2.19.0-downgrade_new_compression_algorithms.sql)
* [#8018](#8018) Add
spin-lock during recompression on unique constraints
* [#8026](#8026) Allow
`WHERE` conditions that use nonvolatile functions to be pushed down to
the compressed scan level. For example, conditions like `time > now()`,
where `time` is a columnstore `orderby` column, will evaluate `now()`
and use the sparse index on `time` to filter out the entire compressed
batches that cannot contain matching rows.
* [#8027](#8027) Add
materialization invalidations API
* [#8047](#8047) Support
SkipScan for `SELECT DISTINCT` with multiple distincts when all but one
distinct is pinned
* [#8115](#8115) Add batch
size limiting during compression

**Bugfixes**
* [#7862](#7862) Release
cache pin when checking for `NOT NULL`
* [#7909](#7909) Update
compression stats when merging chunks
* [#7928](#7928) Don't
create a hypertable for implicitly published tables
* [#7982](#7982) Fix crash
in batch sort merge over eligible expressions
* [#8008](#8008) Fix
compression policy error message that shows number of successes
* [#8031](#8031) Fix
reporting of deleted tuples for direct batch delete
* [#8033](#8033) Skip
default `segmentby` if `orderby` is explicitly set
* [#8061](#8061) Ensure
settings for a compressed relation are found
* [#7515](#7515) Add
missing lock to Constraint-aware append
* [#8067](#8067) Make sure
hypercore TAM parent is vacuumed
* [#8074](#8074) Fix memory
leak in row compressor flush
* [#8099](#8099) Block
chunk merging on multi-dimensional hypertables
* [#8106](#8106) Fix
segfault when adding unique compression indexes to compressed chunks
* [#8127](#8127) Read
bit-packed version of booleans

**GUCs**
* `timescaledb.enable_sparse_index_bloom`: Enable creation of the bloom1
sparse index on compressed chunks; Default: `ON`
* `timescaledb.compress_truncate_behaviour`: Defines how truncate
behaves at the end of compression; Default: `truncate_only`
* `timescaledb.enable_compression_ratio_warnings`: Enable warnings for
poor compression ratio; Default: `ON`
* `timescaledb.enable_event_triggers`: Enable event triggers for chunk
creation; Default: `OFF`
* `timescaledb.enable_cagg_window_functions`: Enable window functions in
continuous aggregates; Default: `OFF`

**Thanks**
* @arajkumar for reporting that implicitly published tables were still
able to create hypertables
* @thotokraa for reporting an issue with unique expression indexes on
compressed chunks

---------

Signed-off-by: Philip Krauss <35487337+philkra@users.noreply.github.com>
Signed-off-by: Ramon Guiu <ramon@timescale.com>
Co-authored-by: Anastasiia Tovpeko <114177030+atovpeko@users.noreply.github.com>
Co-authored-by: Ramon Guiu <ramon@timescale.com>