Implement encode/decode for ID columns + tests, docs

This PR lets users set `p_epoch` to the special value 'func' so that custom functions encode and decode time-ordered integers that are not plain seconds, ms, us or ns since the UNIX epoch. Resolves pgpartman#729.
calebj · Jan 7, 2025 · 64940d8
1 parent 949bf56
Showing 15 changed files with 2,819 additions and 9 deletions.
diff --git a/doc/pg_partman.md b/doc/pg_partman.md
@@ -207,12 +207,12 @@ RETURNS boolean
  * An ACCESS EXCLUSIVE lock is taken on the parent table during the running of this function. No data is moved when running this function, so lock should be brief
  * A default partition and template table are created by default unless otherwise configured
  * `p_parent_table` - the existing parent table. MUST be schema qualified, even if in public schema
- * `p_control` - the column that the partitioning will be based on. Must be a time, integer, text or uuid based column. When control is of type text/uuid, p_time_encoder and p_time_decoder must be set.
+ * `p_control` - the column that the partitioning will be based on. Must be a time, integer, text or UUID based column. When control is of type text/UUID, p_time_encoder and p_time_decoder must be set.
  * `p_interval` - the time or integer range interval for each partition. No matter the partitioning type, value must be given as text.
     + *\<interval\>*      - Any valid value for the interval data type. Do not type cast the parameter value, just leave as text.
     + *\<integer\>*       - For ID based partitions, the integer value range of the ID that should be set per partition. Enter this as an integer in text format ('100' not 100). If the interval is >=2, then the `p_type` must be `range`. If the interval equals 1, then the `p_type` must be `list`. Also note that while numeric values are supported for id-based partitioning, the interval must still be a whole number integer.
  * `p_type` - the type of partitioning to be done. Currently only **range** and **list** are supported. See `p_interval` parameter for special conditions concerning type.
- * `p_epoch` - tells `pg_partman` that the control column is an integer type, but actually represents and epoch time value. Valid values for this option are: 'seconds', 'milliseconds', 'microseconds', 'nanoseconds', and 'none'. The default is 'none'. All table names will be time-based. In addition to a normal index on the control column, be sure you create a functional, time-based index on the control column (to_timestamp(controlcolumn)) as well so this works efficiently.
+ * `p_epoch` - tells `pg_partman` that the control column is an integer type, but actually represents an epoch time value or an integer containing an encoded timestamp. Valid values for this option are: 'seconds', 'milliseconds', 'microseconds', 'nanoseconds', 'func', and 'none'. The default is 'none'. All table names will be time-based. For 'func', functions that encode/decode between the integer type used and `timestamptz` are required and must be supplied via `p_time_encoder` and `p_time_decoder`. In addition to a normal index on the control column, be sure you create a functional, time-based index on the control column (to_timestamp(controlcolumn)) as well so this works efficiently.
  * `p_premake` - is how many additional partitions to always stay ahead of the current partition. Default value is 4. This will keep at minimum 5 partitions made, including the current one. For example, if today was Sept 6th, and `premake` was set to 4 for a daily partition, then partitions would be made for the 6th as well as the 7th, 8th, 9th and 10th. Note some intervals may occasionally cause an extra partition to be premade or one to be missed due to leap years, differing month lengths, etc. This usually won't hurt anything and should self-correct (see **About** section concerning timezones and non-UTC). If partitioning ever falls behind the `premake` value, normal running of `run_maintenance()` and data insertion should automatically catch things up.
  * `p_start_partition` - allows the first partition of a set to be specified instead of it being automatically determined. Must be a valid timestamp (for time-based) or positive integer (for id-based) value. Be aware, though, the actual parameter data type is text. For time-based partitioning, all partitions starting with the given timestamp up to CURRENT_TIMESTAMP (plus `premake`) will be created. For id-based partitioning, only the partition starting at the given value (plus `premake`) will be made. Note that for subpartitioning, this only applies during initial setup and not during ongoing maintenance.
  * `p_default_table` - boolean flag to determine whether a default table is created. Defaults to true.
@@ -222,8 +222,8 @@ RETURNS boolean
  * `p_jobmon` - allow `pg_partman` to use the `pg_jobmon` extension to monitor that partitioning is working correctly. Defaults to TRUE.
  * `p_date_trunc_interval` - By default, pg_partman's time-based partitioning will truncate the child table starting values to line up at the beginning of typical boundaries (midnight for daily, day 1 for monthly, Jan 1 for yearly, etc). If a partitioning interval that does not fall on those boundaries is desired, this option may be required to ensure the child table has the expected boundaries (especially if you also set `p_start_partition`). The valid values allowed for this parameter are the interval values accepted by PostgreSQL's built-in `date_trunc()` function (day, week, month, etc). For example, if you set a 9-week interval, by default pg_partman would truncate the tables by month (since the interval is greater than one month but less than 1 year) and unexpectedly start on the first of the month in some cases. Set this parameter value to `week`, so that the child table start values are properly truncated on a weekly basis to line up with the 9-week interval. If you are using a custom time interval, please experiment with this option to get the expected set of child tables you desire or use a more typical partitioning interval to simplify partition management.
  * `p_control_not_null` - By default, this value is true and the control column must be set to NOT NULL. Setting this to false allows the control column to be NULL. Allowing this is not advised without very careful review and an explicit use-case defined as it can cause excessive data in the DEFAULT child partition.
- * `p_time_encoder` - name of function that encodes a timestamp into a string representing your partition bounds. Setting this implicitly enables time based partitioning and is mandatory for text/uuid control column types. This enables partitioning tables using time based identifiers like uuidv7, ulid, snowflake ids and others. The function must handle NULL input safely. See test-time-daily.sql and test-uuid-daily for usage examples.
- * `p_time_decoder` - name of function that decodes a text/uuid control value into a timestamp. Setting this implicitly enables time based partitioning and is mandatory for text/uuid control column types. This enables partitioning tables using time based identifiers like uuidv7, ulid, snowflake ids and others. The function must handle NULL input safely. See test-time-daily.sql and test-uuid-daily for usage examples.
+ * `p_time_encoder` - name of function that encodes a `timestamptz` into a string or integer representing your partition bounds. Setting this implicitly enables time based partitioning and is mandatory for text/UUID control column types, or for an integer control column with `p_epoch` = 'func'. This enables partitioning tables using time based identifiers like UUIDv7, ULID, snowflake ids and others. The function must handle NULL input safely. See test-time-daily.sql and test-uuid-daily for usage examples.
+ * `p_time_decoder` - name of function that decodes a text/UUID or integer control value into a `timestamptz`. Setting this implicitly enables time based partitioning and is mandatory for text/UUID control column types, or for an integer control column with `p_epoch` = 'func'. This enables partitioning tables using time based identifiers like UUIDv7, ULID, snowflake ids and others. The function must handle NULL input safely. See test-time-daily.sql and test-uuid-daily for usage examples.
 
 
 <a id="create_sub_parent"></a>

diff --git a/doc/pg_partman_howto.md b/doc/pg_partman_howto.md
@@ -4,6 +4,7 @@ Example Guide On Setting Up Native Partitioning
 - [Simple Time Based: 1 Partition Per Day](#simple-time-based-1-partition-per-day)
 - [Simple Time Based with UUIDv7 type: 1 Partition Per Day](#simple-time-based-with-uuidv7-type-1-partition-per-day)
 - [Simple Time Based with Text Type: 1 Partition Per Day](#simple-time-based-with-text-type-1-partition-per-day)
+- [Simple Time Based with Snowflake IDs: 1 Partition Per Hour](#simple-time-based-with-snowflake-ids-1-partition-per-hour)
 - [Simple Serial ID: 1 Partition Per 10 ID Values](#simple-serial-id-1-partition-Per-10-id-values)
 - [Partitioning an Existing Table](#partitioning-an-existing-table)
   * [Offline Partitioning](#offline-partitioning)
@@ -285,6 +286,107 @@ Indexes:
     "time_taptest_table_p20240815_pkey" PRIMARY KEY, btree (col3)
 Access method: heap
 ```
+
+### Simple Time Based with Snowflake IDs: 1 Partition Per Hour
+This example demonstrates how to use an integer control column whose values encode a timestamp together with other data.
+
+```sql
+CREATE SCHEMA IF NOT EXISTS partman_test;
+
+CREATE TABLE partman_test.time_taptest_table(
+    col1 BIGINT NOT NULL PRIMARY KEY,
+    col2 text default 'stuff')
+PARTITION BY RANGE (col1);
+```
+
+```sql
+\d+ partman_test.time_taptest_table
+                             Partitioned table "partman_test.time_taptest_table"
+ Column |  Type  | Collation | Nullable |    Default    | Storage  | Compression | Stats target | Description
+--------+--------+-----------+----------+---------------+----------+-------------+--------------+-------------
+ col1   | bigint |           | not null |               | plain    |             |              |
+ col2   | text   |           |          | 'stuff'::text | extended |             |              |
+Partition key: RANGE (col1)
+Indexes:
+    "time_taptest_table_pkey" PRIMARY KEY, btree (col1)
+Number of partitions: 0
+```
+
+Snowflake IDs are used in some distributed systems to generate unique, time-ordered IDs without centralization or coordination between nodes. X, Discord, Mastodon and Instagram are known to use these identifiers, and this example will use [Discord's scheme](https://discord.com/developers/docs/reference#snowflakes). The timestamp is encoded in the top 42 bits of a 64-bit integer, and the rest is for worker data and a counter. Discord also measures time from 2015 UTC instead of the UNIX epoch of 1970 UTC, a gap of 1420070400 seconds. The BIGINT type is limited to 63 bits since it is a signed integer, but 63 bits is sufficient to hold Discord IDs until September 2084.
+
+The following functions respectively encode and decode snowflake IDs from/to timestamps. Note that when encoding the timestamp, the worker/counter bits are zero, so the returned value is useful as a partition boundary, not as a real ID.
+
+```sql
+CREATE FUNCTION public.timestamp_to_snowflake(p_timestamp timestamptz, OUT encoded bigint)
+    RETURNS bigint
+    LANGUAGE plpgsql IMMUTABLE STRICT PARALLEL SAFE
+    AS $$
+BEGIN
+    SELECT 1000*(EXTRACT(epoch FROM p_timestamp) - 1420070400)::BIGINT << 22 INTO encoded;
+END
+$$;
+
+CREATE FUNCTION public.snowflake_to_timestamp(p_snowflake bigint, OUT ts timestamptz)
+    RETURNS TIMESTAMPTZ
+    LANGUAGE plpgsql IMMUTABLE STRICT PARALLEL SAFE
+    AS $$
+BEGIN
+    SELECT TO_TIMESTAMP((p_snowflake >> 22)/1000 + 1420070400) INTO ts;
+END
+$$;
+```
+
+Now we will instruct partman to use the snowflake encoder and decoder functions with the special value 'func' for `p_epoch`.
+
+```sql
+SELECT partman.create_parent('partman_test.time_taptest_table'
+    , p_control      := 'col1'
+    , p_interval     := '1 hour'
+    , p_epoch        := 'func'
+    , p_time_encoder := 'public.timestamp_to_snowflake'
+    , p_time_decoder := 'public.snowflake_to_timestamp'
+);
+ create_parent
+---------------
+ t
+(1 row)
+```
+
+```sql
+\d+ partman_test.time_taptest_table
+                             Partitioned table "partman_test.time_taptest_table"
+ Column |  Type  | Collation | Nullable |    Default    | Storage  | Compression | Stats target | Description
+--------+--------+-----------+----------+---------------+----------+-------------+--------------+-------------
+ col1   | bigint |           | not null |               | plain    |             |              |
+ col2   | text   |           |          | 'stuff'::text | extended |             |              |
+Partition key: RANGE (col1)
+Indexes:
+    "time_taptest_table_pkey" PRIMARY KEY, btree (col1)
+Partitions: partman_test.time_taptest_table_p20250107_030000 FOR VALUES FROM ('1326022498713600000') TO ('1326037598208000000'),
+            partman_test.time_taptest_table_p20250107_040000 FOR VALUES FROM ('1326037598208000000') TO ('1326052697702400000'),
+            partman_test.time_taptest_table_p20250107_050000 FOR VALUES FROM ('1326052697702400000') TO ('1326067797196800000'),
+            partman_test.time_taptest_table_p20250107_060000 FOR VALUES FROM ('1326067797196800000') TO ('1326082896691200000'),
+            partman_test.time_taptest_table_p20250107_070000 FOR VALUES FROM ('1326082896691200000') TO ('1326097996185600000'),
+            partman_test.time_taptest_table_p20250107_080000 FOR VALUES FROM ('1326097996185600000') TO ('1326113095680000000'),
+            partman_test.time_taptest_table_p20250107_090000 FOR VALUES FROM ('1326113095680000000') TO ('1326128195174400000'),
+            partman_test.time_taptest_table_p20250107_100000 FOR VALUES FROM ('1326128195174400000') TO ('1326143294668800000'),
+            partman_test.time_taptest_table_p20250107_110000 FOR VALUES FROM ('1326143294668800000') TO ('1326158394163200000'),
+            partman_test.time_taptest_table_default DEFAULT
+```
+```sql
+\d+ partman_test.time_taptest_table_p20250107_030000
+                           Table "partman_test.time_taptest_table_p20250107_030000"
+ Column |  Type  | Collation | Nullable |    Default    | Storage  | Compression | Stats target | Description
+--------+--------+-----------+----------+---------------+----------+-------------+--------------+-------------
+ col1   | bigint |           | not null |               | plain    |             |              |
+ col2   | text   |           |          | 'stuff'::text | extended |             |              |
+Partition of: partman_test.time_taptest_table FOR VALUES FROM ('1326022498713600000') TO ('1326037598208000000')
+Partition constraint: ((col1 IS NOT NULL) AND (col1 >= '1326022498713600000'::bigint) AND (col1 < '1326037598208000000'::bigint))
+Indexes:
+    "time_taptest_table_p20250107_030000_pkey" PRIMARY KEY, btree (col1)
+Access method: heap
+```
+
 ### Simple Serial ID: 1 Partition Per 10 ID Values
 For this use-case, the template table is not created manually before calling `create_parent()`. So it shows that if a primary/unique key is added later, it does not apply to the currently existing child tables. That will have to be done manually.
 

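The partition bounds shown in the howto output can be sanity-checked by hand. The following is a minimal sketch in plain SQL, assuming only the Discord layout described in the howto (millisecond timestamp shifted left by 22 bits, epoch offset of 1420070400 seconds). It does not call `public.timestamp_to_snowflake` or `public.snowflake_to_timestamp`, and the column aliases are illustrative.

```sql
-- Decode the lower bound of time_taptest_table_p20250107_030000 by hand:
-- drop the 22 worker/counter bits, convert ms to s, add the Discord epoch offset.
SELECT to_timestamp((1326022498713600000 >> 22) / 1000 + 1420070400)
       AT TIME ZONE 'UTC' AS lower_bound_utc;   -- 2025-01-07 03:00:00

-- Width of one partition in milliseconds (one hour is 3600000 ms).
SELECT (1326037598208000000 - 1326022498713600000) >> 22 AS width_ms;   -- 3600000

-- Re-encode the same hour boundary; the low 22 bits stay zero.
SELECT ((EXTRACT(epoch FROM timestamptz '2025-01-07 03:00:00+00') - 1420070400)::bigint * 1000) << 22
       AS lower_bound_id;   -- 1326022498713600000

-- Largest timestamp a positive bigint snowflake can carry (falls in September 2084).
SELECT to_timestamp((9223372036854775807 >> 22) / 1000 + 1420070400)
       AT TIME ZONE 'UTC' AS max_snowflake_utc;
```

Because consecutive bounds differ by exactly one hour of milliseconds shifted by 22 bits, every real ID generated within that hour sorts between them regardless of its worker and counter bits.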
diff --git a/sql/functions/create_parent.sql b/sql/functions/create_parent.sql
@@ -182,7 +182,9 @@ IF v_control_type NOT IN ('time', 'id', 'text', 'uuid') THEN
     RAISE EXCEPTION 'Only date/time, text/uuid or integer types are allowed for the control column.';
 ELSIF v_control_type IN ('text', 'uuid') AND (p_time_encoder IS NULL OR p_time_decoder IS NULL) THEN
     RAISE EXCEPTION 'p_time_encoder and p_time_decoder needs to be set for text/uuid type control column.';
-ELSIF v_control_type NOT IN ('text', 'uuid') AND (p_time_encoder IS NOT NULL OR p_time_decoder IS NOT NULL) THEN
+ELSIF v_control_type = 'id' AND p_epoch = 'func' AND (p_time_encoder IS NULL OR p_time_decoder IS NULL) THEN
+    RAISE EXCEPTION 'p_time_encoder and p_time_decoder functions need to be set for p_epoch=func to work.';
+ELSIF v_control_type NOT IN ('text', 'uuid', 'id') AND (p_time_encoder IS NOT NULL OR p_time_decoder IS NOT NULL) THEN
     RAISE EXCEPTION 'p_time_encoder and p_time_decoder can only be used with text/uuid type control column.';
 END IF;
 

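To make the branch order of the check in `create_parent.sql` easier to follow, here is a rough standalone sketch that mirrors the four conditions as a `CASE` expression. The function name `pg_temp.check_time_codec_args`, its parameters, and the shortened messages are hypothetical and exist only for illustration; the real check runs inside `create_parent()` and raises exceptions instead of returning text.

```sql
-- Hypothetical helper mirroring the IF/ELSIF chain; not part of pg_partman.
CREATE FUNCTION pg_temp.check_time_codec_args(
    p_control_type text, p_epoch text, p_time_encoder text, p_time_decoder text
) RETURNS text
LANGUAGE sql IMMUTABLE
AS $$
SELECT CASE
    WHEN p_control_type NOT IN ('time', 'id', 'text', 'uuid')
        THEN 'unsupported control type'
    WHEN p_control_type IN ('text', 'uuid')
         AND (p_time_encoder IS NULL OR p_time_decoder IS NULL)
        THEN 'encoder and decoder required for text/uuid'
    WHEN p_control_type = 'id' AND p_epoch = 'func'
         AND (p_time_encoder IS NULL OR p_time_decoder IS NULL)
        THEN 'encoder and decoder required for p_epoch=func'
    WHEN p_control_type NOT IN ('text', 'uuid', 'id')
         AND (p_time_encoder IS NOT NULL OR p_time_decoder IS NOT NULL)
        THEN 'encoder/decoder not allowed for this control type'
    ELSE 'ok'
END
$$;

-- Integer column with p_epoch = 'func' but no functions: rejected.
SELECT pg_temp.check_time_codec_args('id', 'func', NULL, NULL);
-- Integer column with both functions supplied: accepted.
SELECT pg_temp.check_time_codec_args('id', 'func',
    'public.timestamp_to_snowflake', 'public.snowflake_to_timestamp');
-- Native time column with functions supplied: rejected by the last branch.
SELECT pg_temp.check_time_codec_args('time', 'none', 'f', 'g');
-- Integer column with p_epoch = 'seconds' and functions supplied:
-- none of these four branches rejects it.
SELECT pg_temp.check_time_codec_args('id', 'seconds', 'f', 'g');
```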
diff --git a/sql/functions/create_partition_time.sql b/sql/functions/create_partition_time.sql
@@ -14,6 +14,7 @@ ex_hint                         text;
 ex_message                      text;
 v_control                       text;
 v_control_type                  text;
+v_time_decoder                  text;
 v_time_encoder                  text;
 v_datetime_string               text;
 v_epoch                         text;
@@ -45,6 +46,8 @@ v_sub_timestamp_max             timestamptz;
 v_sub_timestamp_min             timestamptz;
 v_template_table                text;
 v_time                          timestamptz;
+v_partition_id_start            bigint;
+v_partition_id_end              bigint;
 v_partition_text_start          text;
 v_partition_text_end            text;
 
@@ -54,6 +57,7 @@ BEGIN
  */
 
 SELECT control
+    , time_decoder
     , time_encoder
     , partition_interval::interval -- this shared field also used in partition_id as bigint
     , epoch
@@ -62,6 +66,7 @@ SELECT control
     , template_table
     , inherit_privileges
 INTO v_control
+    , v_time_decoder
     , v_time_encoder
     , v_partition_interval
     , v_epoch
@@ -123,6 +128,7 @@ v_partition_expression := CASE
     WHEN v_epoch = 'milliseconds' THEN format('to_timestamp((%I/1000)::float)', v_control)
     WHEN v_epoch = 'microseconds' THEN format('to_timestamp((%I/1000000)::float)', v_control)
     WHEN v_epoch = 'nanoseconds' THEN format('to_timestamp((%I/1000000000)::float)', v_control)
+    WHEN v_epoch = 'func' THEN format('%s(%I)', v_time_decoder, v_control)
     ELSE format('%I', v_control)
 END;
 RAISE DEBUG 'create_partition_time: v_partition_expression: %', v_partition_expression;
@@ -237,7 +243,17 @@ FOREACH v_time IN ARRAY p_partition_times LOOP
                 , v_partition_text_start
                 , v_partition_text_end);
         END IF;
+    ELSIF v_epoch = 'func' THEN
+        EXECUTE format('SELECT %s(%L)', v_time_encoder, v_partition_timestamp_start) INTO v_partition_id_start;
+        EXECUTE format('SELECT %s(%L)', v_time_encoder, v_partition_timestamp_end) INTO v_partition_id_end;
 
+        EXECUTE format('ALTER TABLE %I.%I ATTACH PARTITION %I.%I FOR VALUES FROM (%L) TO (%L)'
+            , v_parent_schema
+            , v_parent_tablename
+            , v_parent_schema
+            , v_partition_name
+            , v_partition_id_start
+            , v_partition_id_end);
     ELSE
         -- Must attach with integer based values for built-in constraint and epoch
         IF v_epoch = 'seconds' THEN

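The `format()` calls added to `create_partition_time.sql` can be tried in isolation to see what text they produce. This is a small sketch using the function and column names from the howto example as stand-in values; the column aliases are illustrative.

```sql
-- %s splices the decoder name in verbatim; %I quotes the column as an identifier.
SELECT format('%s(%I)', 'public.snowflake_to_timestamp', 'col1') AS partition_expression;
-- public.snowflake_to_timestamp(col1)

-- %L renders the timestamp as a quoted literal for the encoder call.
-- The exact text depends on the session's TimeZone and DateStyle settings.
SELECT format('SELECT %s(%L)', 'public.timestamp_to_snowflake',
              timestamptz '2025-01-07 03:00:00+00') AS boundary_query;

-- %L also renders the bigint bounds as quoted literals in the ATTACH clause.
SELECT format('FOR VALUES FROM (%L) TO (%L)',
              1326022498713600000, 1326037598208000000) AS bounds_clause;
-- FOR VALUES FROM ('1326022498713600000') TO ('1326037598208000000')
```

Since the function names go through `%s` rather than `%I`, they are inserted without quoting, so the values stored in `time_encoder` and `time_decoder` need to be valid, schema-qualified names as written.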
diff --git a/sql/functions/partition_data_time.sql b/sql/functions/partition_data_time.sql
@@ -38,6 +38,7 @@ v_source_schemaname         text;
 v_source_tablename          text;
 v_rowcount                  bigint;
 v_start_control             timestamptz;
+v_time_decoder              text;
 v_total_rows                bigint := 0;
 
 BEGIN
@@ -49,10 +50,12 @@ SELECT partition_interval::interval
     , control
     , datetime_string
     , epoch
+    , time_decoder
 INTO v_partition_interval
     , v_control
     , v_datetime_string
     , v_epoch
+    , v_time_decoder
 FROM @extschema@.part_config
 WHERE parent_table = p_parent_table;
 IF NOT FOUND THEN
@@ -133,6 +136,7 @@ v_partition_expression := CASE
     WHEN v_epoch = 'milliseconds' THEN format('to_timestamp((%I/1000)::float)', v_control)
     WHEN v_epoch = 'microseconds' THEN format('to_timestamp((%I/1000000)::float)', v_control)
     WHEN v_epoch = 'nanoseconds' THEN format('to_timestamp((%I/1000000000)::float)', v_control)
+    WHEN v_epoch = 'func' THEN format('%s(%I)', v_time_decoder, v_control)
     ELSE format('%I', v_control)
 END;