Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement encode/decode for ID columns #730

Open
wants to merge 4 commits into
base: development
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 4 additions & 4 deletions doc/pg_partman.md
Original file line number Diff line number Diff line change
Expand Up @@ -207,12 +207,12 @@ RETURNS boolean
* An ACCESS EXCLUSIVE lock is taken on the parent table during the running of this function. No data is moved when running this function, so lock should be brief
* A default partition and template table are created by default unless otherwise configured
* `p_parent_table` - the existing parent table. MUST be schema qualified, even if in public schema
* `p_control` - the column that the partitioning will be based on. Must be a time, integer, text or uuid based column. When control is of type text/uuid, p_time_encoder and p_time_decoder must be set.
* `p_control` - the column that the partitioning will be based on. Must be a time, integer, text or UUID based column. When control is of type text/UUID, p_time_encoder and p_time_decoder must be set.
* `p_interval` - the time or integer range interval for each partition. No matter the partitioning type, value must be given as text.
+ *\<interval\>* - Any valid value for the interval data type. Do not type cast the parameter value, just leave as text.
+ *\<integer\>* - For ID based partitions, the integer value range of the ID that should be set per partition. Enter this as an integer in text format ('100' not 100). If the interval is >=2, then the `p_type` must be `range`. If the interval equals 1, then the `p_type` must be `list`. Also note that while numeric values are supported for id-based partitioning, the interval must still be a whole number integer.
* `p_type` - the type of partitioning to be done. Currently only **range** and **list** are supported. See `p_interval` parameter for special conditions concerning type.
* `p_epoch` - tells `pg_partman` that the control column is an integer type, but actually represents and epoch time value. Valid values for this option are: 'seconds', 'milliseconds', 'microseconds', 'nanoseconds', and 'none'. The default is 'none'. All table names will be time-based. In addition to a normal index on the control column, be sure you create a functional, time-based index on the control column (to_timestamp(controlcolumn)) as well so this works efficiently.
* `p_epoch` - tells `pg_partman` that the control column is an integer type, but actually represents an epoch time value or integer containing an encoded timestamp. Valid values for this option are: 'seconds', 'milliseconds', 'microseconds', 'nanoseconds', 'func', and 'none'. The default is 'none'. All table names will be time-based. For 'func', encode/decode functions between the integer type used and `timestamptz` are required. In addition to a normal index on the control column, be sure you create a functional, time-based index on the control column (to_timestamp(controlcolumn)) as well so this works efficiently.
* `p_premake` - is how many additional partitions to always stay ahead of the current partition. Default value is 4. This will keep at minimum 5 partitions made, including the current one. For example, if today was Sept 6th, and `premake` was set to 4 for a daily partition, then partitions would be made for the 6th as well as the 7th, 8th, 9th and 10th. Note some intervals may occasionally cause an extra partition to be premade or one to be missed due to leap years, differing month lengths, etc. This usually won't hurt anything and should self-correct (see **About** section concerning timezones and non-UTC). If partitioning ever falls behind the `premake` value, normal running of `run_maintenance()` and data insertion should automatically catch things up.
* `p_start_partition` - allows the first partition of a set to be specified instead of it being automatically determined. Must be a valid timestamp (for time-based) or positive integer (for id-based) value. Be aware, though, the actual parameter data type is text. For time-based partitioning, all partitions starting with the given timestamp up to CURRENT_TIMESTAMP (plus `premake`) will be created. For id-based partitioning, only the partition starting at the given value (plus `premake`) will be made. Note that for subpartitioning, this only applies during initial setup and not during ongoing maintenance.
* `p_default_table` - boolean flag to determine whether a default table is created. Defaults to true.
Expand All @@ -222,8 +222,8 @@ RETURNS boolean
* `p_jobmon` - allow `pg_partman` to use the `pg_jobmon` extension to monitor that partitioning is working correctly. Defaults to TRUE.
* `p_date_trunc_interval` - By default, pg_partman's time-based partitioning will truncate the child table starting values to line up at the beginning of typical boundaries (midnight for daily, day 1 for monthly, Jan 1 for yearly, etc). If a partitioning interval that does not fall on those boundaries is desired, this option may be required to ensure the child table has the expected boundaries (especially if you also set `p_start_partition`). The valid values allowed for this parameter are the interval values accepted by PostgreSQL's built-in `date_trunc()` function (day, week, month, etc). For example, if you set a 9-week interval, by default pg_partman would truncate the tables by month (since the interval is greater than one month but less than 1 year) and unexpectedly start on the first of the month in some cases. Set this parameter value to `week`, so that the child table start values are properly truncated on a weekly basis to line up with the 9-week interval. If you are using a custom time interval, please experiment with this option to get the expected set of child tables you desire or use a more typical partitioning interval to simplify partition management.
* `p_control_not_null` - By default, this value is true and the control column must be set to NOT NULL. Setting this to false allows the control column to be NULL. Allowing this is not advised without very careful review and an explicit use-case defined as it can cause excessive data in the DEFAULT child partition.
* `p_time_encoder` - name of function that encodes a timestamp into a string representing your partition bounds. Setting this implicitly enables time based partitioning and is mandatory for text/uuid control column types. This enables partitioning tables using time based identifiers like uuidv7, ulid, snowflake ids and others. The function must handle NULL input safely. See test-time-daily.sql and test-uuid-daily for usage examples.
* `p_time_decoder` - name of function that decodes a text/uuid control value into a timestamp. Setting this implicitly enables time based partitioning and is mandatory for text/uuid control column types. This enables partitioning tables using time based identifiers like uuidv7, ulid, snowflake ids and others. The function must handle NULL input safely. See test-time-daily.sql and test-uuid-daily for usage examples.
* `p_time_encoder` - name of function that encodes a `timestamp` into a string or integer representing your partition bounds. Setting this implicitly enables time based partitioning and is mandatory for text/UUID control column types, or integer control column with `p_epoch` = 'func'. This enables partitioning tables using time based identifiers like UUIDv7, ULID, snowflake ids and others. The function must handle NULL input safely. See test-time-daily.sql and test-UUID-daily for usage examples.
* `p_time_decoder` - name of function that decodes a text/UUID control value into a `timestamptz`. Setting this implicitly enables time based partitioning and is mandatory for text/UUID control column types, or integer control column with `p_epoch` = 'func'. This enables partitioning tables using time based identifiers like UUIDv7, ULID, snowflake ids and others. The function must handle NULL input safely. See test-time-daily.sql and test-uuid-daily for usage examples.


<a id="create_sub_parent"></a>
Expand Down
102 changes: 102 additions & 0 deletions doc/pg_partman_howto.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@ Example Guide On Setting Up Native Partitioning
- [Simple Time Based: 1 Partition Per Day](#simple-time-based-1-partition-per-day)
- [Simple Time Based with UUIDv7 type: 1 Partition Per Day](#simple-time-based-with-uuidv7-type-1-partition-per-day)
- [Simple Time Based with Text Type: 1 Partition Per Day](#simple-time-based-with-text-type-1-partition-per-day)
- [Simple Time Based with Snowflake IDs: 1 Partition Per Hour](#simple-time-based-with-snowflake-ids-1-partition-per-hour)
- [Simple Serial ID: 1 Partition Per 10 ID Values](#simple-serial-id-1-partition-Per-10-id-values)
- [Partitioning an Existing Table](#partitioning-an-existing-table)
* [Offline Partitioning](#offline-partitioning)
Expand Down Expand Up @@ -285,6 +286,107 @@ Indexes:
"time_taptest_table_p20240815_pkey" PRIMARY KEY, btree (col3)
Access method: heap
```

### Simple Time Based with Snowflake IDs: 1 Partition Per Hour
This example demonstrates how to use an integer control column that contains integers that encode a timestamp together with other data.

```sql
CREATE SCHEMA IF NOT EXISTS partman_test;

CREATE TABLE partman_test.time_taptest_table(
col1 BIGINT NOT NULL PRIMARY KEY,
col2 text default 'stuff')
PARTITION BY RANGE (col1);
```

```sql
\d+ partman_test.time_taptest_table
Partitioned table "partman_test.time_taptest_table"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+--------+-----------+----------+---------------+----------+-------------+--------------+-------------
col1 | bigint | | not null | | plain | | |
col2 | text | | | 'stuff'::text | extended | | |
Partition key: RANGE (col1)
Indexes:
"time_taptest_table_pkey" PRIMARY KEY, btree (col1)
Number of partitions: 0
```

Snowflake IDs are used in some distributed systems to generate unique, time-ordered IDs without centralization or coordination between nodes. X, Discord, Mastodon and Instagram are known to use these identifiers, and this example will use [Discord's scheme](https://discord.com/developers/docs/reference#snowflakes). The timestamp is encoded in the top 42 bits of a 64-bit integer, and the rest is for worker data and a counter. Discord also measures time from 2015 UTC instead of the UNIX epoch of 1970 UTC, a gap of 1420070400 seconds. The BIGINT type is limited to 63 bits since it is a signed integer, but 63 bits is sufficient to hold Discord IDs until September 2084.

The following functions respectively encode and decode snowflake IDs from/to timestamps. Note that when encoding the timestamp, the worker/counter bits are zero, so the returned value is useful as a partition boundary, not as a real ID.

```sql
CREATE FUNCTION public.timestamp_to_snowflake(p_timestamp timestamptz, OUT encoded bigint)
RETURNS bigint
LANGUAGE plpgsql IMMUTABLE STRICT PARALLEL SAFE
AS $$
BEGIN
SELECT 1000*(EXTRACT(epoch FROM p_timestamp) - 1420070400)::BIGINT << 22 INTO encoded;
END
$$;

CREATE FUNCTION public.snowflake_to_timestamp(p_snowflake bigint, OUT ts timestamptz)
RETURNS TIMESTAMPTZ
LANGUAGE plpgsql IMMUTABLE STRICT PARALLEL SAFE
AS $$
BEGIN
SELECT TO_TIMESTAMP((p_snowflake >> 22)/1000 + 1420070400) INTO ts;
END
$$;
```

Now we will instruct partman to use the snowflake encoder and decoder functions with the special value 'func' for `p_epoch`.

```sql
SELECT partman.create_parent('partman_test.time_taptest_table'
, p_control := 'col1'
, p_interval := '1 hour'
, p_epoch := 'func'
, p_time_encoder := 'public.timestamp_to_snowflake'
, p_time_decoder := 'public.snowflake_to_timestamp'
);
create_parent
---------------
t
(1 row)
```

```sql
\d+ partman_test.time_taptest_table
Partitioned table "partman_test.time_taptest_table"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+--------+-----------+----------+---------------+----------+-------------+--------------+-------------
col1 | bigint | | not null | | plain | | |
col2 | text | | | 'stuff'::text | extended | | |
Partition key: RANGE (col1)
Indexes:
"time_taptest_table_pkey" PRIMARY KEY, btree (col1)
Partitions: partman_test.time_taptest_table_p20250107_030000 FOR VALUES FROM ('1326022498713600000') TO ('1326037598208000000'),
partman_test.time_taptest_table_p20250107_040000 FOR VALUES FROM ('1326037598208000000') TO ('1326052697702400000'),
partman_test.time_taptest_table_p20250107_050000 FOR VALUES FROM ('1326052697702400000') TO ('1326067797196800000'),
partman_test.time_taptest_table_p20250107_060000 FOR VALUES FROM ('1326067797196800000') TO ('1326082896691200000'),
partman_test.time_taptest_table_p20250107_070000 FOR VALUES FROM ('1326082896691200000') TO ('1326097996185600000'),
partman_test.time_taptest_table_p20250107_080000 FOR VALUES FROM ('1326097996185600000') TO ('1326113095680000000'),
partman_test.time_taptest_table_p20250107_090000 FOR VALUES FROM ('1326113095680000000') TO ('1326128195174400000'),
partman_test.time_taptest_table_p20250107_100000 FOR VALUES FROM ('1326128195174400000') TO ('1326143294668800000'),
partman_test.time_taptest_table_p20250107_110000 FOR VALUES FROM ('1326143294668800000') TO ('1326158394163200000'),
partman_test.time_taptest_table_default DEFAULT
```
```sql
\d+ partman_test.time_taptest_table_p20250107_030000
Table "partman_test.time_taptest_table_p20250107_030000"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+--------+-----------+----------+---------------+----------+-------------+--------------+-------------
col1 | bigint | | not null | | plain | | |
col2 | text | | | 'stuff'::text | extended | | |
Partition of: partman_test.time_taptest_table FOR VALUES FROM ('1326022498713600000') TO ('1326037598208000000')
Partition constraint: ((col1 IS NOT NULL) AND (col1 >= '1326022498713600000'::bigint) AND (col1 < '1326037598208000000'::bigint))
Indexes:
"time_taptest_table_p20250107_030000_pkey" PRIMARY KEY, btree (col1)
Access method: heap
```

### Simple Serial ID: 1 Partition Per 10 ID Values
For this use-case, the template table is not created manually before calling `create_parent()`. So it shows that if a primary/unique key is added later, it does not apply to the currently existing child tables. That will have to be done manually.

Expand Down
81 changes: 81 additions & 0 deletions sql/functions/check_time_encoder_decoder.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,81 @@
CREATE FUNCTION @extschema@.check_time_encoder_decoder(
p_time_encoder TEXT
, p_time_decoder TEXT
, p_control_type TEXT
, p_start_timestamp TIMESTAMPTZ
)
RETURNS VOID
LANGUAGE plpgsql STABLE
AS $$
DECLARE

v_control_type_oid REGTYPE;
v_i_check_rec RECORD;
v_i_fname_parsed TEXT[];
v_i_found_proc_row pg_catalog.pg_proc%ROWTYPE;
v_i_null_returned_null BOOLEAN;
v_roundtrip_result TIMESTAMPTZ;

BEGIN
/*
* Performs sanity checks on provided time encoder and decoder functions.
*/

SELECT p_control_type::regtype INTO STRICT v_control_type_oid;

FOR v_i_check_rec IN
WITH loops(fname, ftypin, ftypout) AS (VALUES
(p_time_encoder, 'TIMESTAMPTZ'::regtype, v_control_type_oid),
(p_time_decoder, v_control_type_oid, 'TIMESTAMPTZ'::regtype)
)
SELECT * FROM loops
LOOP
-- Make sure the function name is valid and schema qualified
v_i_fname_parsed := parse_ident(v_i_check_rec.fname);
IF cardinality(v_i_fname_parsed) <> 2 THEN
RAISE EXCEPTION 'The function name % is not a valid fully qualified name.', v_i_check_rec.fname;
END IF;

-- Check the functions exist with exact parameter/return types
BEGIN
SELECT * INTO STRICT v_i_found_proc_row
FROM pg_catalog.pg_proc p
JOIN pg_catalog.pg_namespace n ON p.pronamespace = n.oid
WHERE n.nspname = v_i_fname_parsed[1] AND proname = v_i_fname_parsed[2]
AND pronargs >= 1 and proargtypes[0] = v_i_check_rec.ftypin
AND prorettype = v_i_check_rec.ftypout;
EXCEPTION
WHEN NO_DATA_FOUND THEN
RAISE EXCEPTION 'No function named % matching argument % and returning %.', v_i_check_rec.fname, v_i_check_rec.ftypin, ftypout;
WHEN TOO_MANY_ROWS THEN
RAISE EXCEPTION 'Function %(%) -> % is ambiguous, this should not happen.', v_i_check_rec.fname, v_i_check_rec.ftypin, ftypout;
END;

-- Check that the function is declared IMMUTABLE
IF v_i_found_proc_row.provolatile <> 'i' THEN
RAISE EXCEPTION 'Function % must be declared IMMUTABLE (got %).', v_i_check_rec.fname, CASE v_i_found_proc_row.provolatile WHEN 's' THEN 'STABLE' WHEN 'v' THEN 'VOLATILE' END;
END IF;

-- Check that the functions return NULL when passed a NULL
EXECUTE FORMAT('SELECT %s(NULL) IS NULL', v_i_check_rec.fname) INTO v_i_null_returned_null;

IF NOT v_i_null_returned_null THEN
RAISE EXCEPTION 'Function % does not return NULL when called on NULL.', v_i_check_rec.fname;
END IF;

-- Show performance warning for default of PARALLEL UNSAFE
IF v_i_found_proc_row.proparallel <> 's' THEN
RAISE NOTICE 'Function % is not declared parallel safe, this may affect performance if you use it for predicates. See the documentation for CREATE FUNCTION.', v_i_check_rec.fname;
END IF;

END LOOP;

-- test roundtrip
EXECUTE FORMAT('SELECT %s(%s(%L))', p_time_decoder, p_time_encoder, p_start_timestamp) INTO v_roundtrip_result;

IF p_start_timestamp <> v_roundtrip_result THEN
RAISE EXCEPTION 'Encoding and then decoding the start of the first partition range (%) got a different value (%). Make sure the encoding is correct and aligns with the partition interval.', p_start_timestamp, v_roundtrip_result;
END IF;

END
$$;
8 changes: 7 additions & 1 deletion sql/functions/create_parent.sql
Original file line number Diff line number Diff line change
Expand Up @@ -182,7 +182,9 @@ IF v_control_type NOT IN ('time', 'id', 'text', 'uuid') THEN
RAISE EXCEPTION 'Only date/time, text/uuid or integer types are allowed for the control column.';
ELSIF v_control_type IN ('text', 'uuid') AND (p_time_encoder IS NULL OR p_time_decoder IS NULL) THEN
RAISE EXCEPTION 'p_time_encoder and p_time_decoder needs to be set for text/uuid type control column.';
ELSIF v_control_type NOT IN ('text', 'uuid') AND (p_time_encoder IS NOT NULL OR p_time_decoder IS NOT NULL) THEN
ELSIF v_control_type = 'id' AND p_epoch = 'func' AND (p_time_encoder IS NULL OR p_time_decoder IS NULL) THEN
RAISE EXCEPTION 'p_time_encoder and p_time_decoder functions need to be set for p_epoch=func to work.';
ELSIF v_control_type NOT IN ('text', 'uuid', 'id') AND (p_time_encoder IS NOT NULL OR p_time_decoder IS NOT NULL) THEN
RAISE EXCEPTION 'p_time_encoder and p_time_decoder can only be used with text/uuid type control column.';
END IF;

Expand Down Expand Up @@ -346,6 +348,10 @@ IF v_control_type IN ('time', 'text', 'uuid') OR (v_control_type = 'id' AND p_ep
INTO v_base_timestamp, v_datetime_string
FROM @extschema@.calculate_time_partition_info(v_time_interval, v_start_time, p_date_trunc_interval);

IF p_epoch = 'func' THEN
PERFORM @extschema@.check_time_encoder_decoder(p_time_encoder, p_time_decoder, v_control_exact_type, v_base_timestamp);
END IF;

RAISE DEBUG 'create_parent(): parent_table: %, v_base_timestamp: %', p_parent_table, v_base_timestamp;

v_partition_time_array := array_append(v_partition_time_array, v_base_timestamp);
Expand Down
Loading