Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/230 get partitioning checkpoints #265

Merged
merged 28 commits into from
Sep 26, 2024
Merged
Show file tree
Hide file tree
Changes from 20 commits
Commits
Show all changes
28 commits
Select commit Hold shift + click to select a range
6b2df24
tmp commit
salamonpavel Sep 5, 2024
4664180
tmp commit
salamonpavel Sep 5, 2024
099ce5a
sql change
salamonpavel Sep 6, 2024
2bd1548
tests
salamonpavel Sep 6, 2024
e4cf419
tests
salamonpavel Sep 6, 2024
3e69c7f
Merge branch 'refs/heads/master' into feature/230-getPartitioningChec…
salamonpavel Sep 6, 2024
d59d7da
conflicts
salamonpavel Sep 6, 2024
eb01623
fex
salamonpavel Sep 6, 2024
1b28b7c
fex
salamonpavel Sep 6, 2024
4a5e925
Merge branch 'master' into feature/230-getPartitioningCheckpoints
salamonpavel Sep 10, 2024
67cab0b
Update database/src/main/postgres/runs/V1.8.3__get_partitioning_check…
salamonpavel Sep 12, 2024
dfc8ad9
comments addressed
salamonpavel Sep 12, 2024
1823dfc
comments addressed
salamonpavel Sep 16, 2024
d3d98c8
comments addressed
salamonpavel Sep 16, 2024
9716301
comments addressed
salamonpavel Sep 16, 2024
0f15e1c
comments addressed
salamonpavel Sep 16, 2024
2bf339e
comments addressed
salamonpavel Sep 16, 2024
dd3b7e2
fix
salamonpavel Sep 16, 2024
71c1c33
Merge branch 'master' into feature/230-getPartitioningCheckpoints
salamonpavel Sep 17, 2024
611ea75
conflicts with master resolved
salamonpavel Sep 17, 2024
7811f37
comments addressed
salamonpavel Sep 18, 2024
c33a136
tests fixed
salamonpavel Sep 18, 2024
4e0f6ba
tests deterministic
salamonpavel Sep 18, 2024
e187745
jacoco report action 1.7.1
salamonpavel Sep 18, 2024
92a2099
test
salamonpavel Sep 18, 2024
1e14e10
remove serde in paginated result
salamonpavel Sep 18, 2024
4b2a9db
Merge branch 'master' into feature/230-getPartitioningCheckpoints
salamonpavel Sep 19, 2024
8811b4e
no status when no results
salamonpavel Sep 23, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -13,10 +13,10 @@
* limitations under the License.
*/

-- Function: runs.get_partitioning_checkpoints(JSONB, INT, TEXT)
CREATE OR REPLACE FUNCTION runs.get_partitioning_checkpoints(
IN i_partitioning JSONB,
IN i_limit INT DEFAULT 5,
IN i_partitioning_id BIGINT,
IN i_checkpoints_limit INT DEFAULT 5,
IN i_offset BIGINT DEFAULT 0,
IN i_checkpoint_name TEXT DEFAULT NULL,
OUT status INTEGER,
OUT status_text TEXT,
Expand All @@ -28,86 +28,113 @@ CREATE OR REPLACE FUNCTION runs.get_partitioning_checkpoints(
OUT measured_columns TEXT[],
OUT measurement_value JSONB,
OUT checkpoint_start_time TIMESTAMP WITH TIME ZONE,
OUT checkpoint_end_time TIMESTAMP WITH TIME ZONE
OUT checkpoint_end_time TIMESTAMP WITH TIME ZONE,
OUT has_more BOOLEAN
)
RETURNS SETOF record AS
$$
-------------------------------------------------------------------------------
RETURNS SETOF record AS
-------------------------------------------------------------------------------
--
-- Function: runs.get_partitioning_checkpoints(JSONB, INT, TEXT)
-- Retrieves all checkpoints (measures and their measurement details) related to a
-- Function: runs.get_partitioning_checkpoints(4)
-- Retrieves checkpoints (measures and their measurement details) related to a
-- given partitioning (and checkpoint name, if specified).
--
-- Parameters:
-- i_partitioning - partitioning of requested checkpoints
-- i_limit - (optional) maximum number of checkpoint's measurements to return
-- if 0 specified, all data will be returned, i.e. no limit will be applied
-- i_checkpoints_limit - (optional) maximum number of checkpoints to return
-- i_offset - (optional) offset of the first checkpoint to return
-- i_checkpoint_name - (optional) name of the checkpoint

-- Note: i_checkpoints_limit and i_offset are used for pagination purposes;
-- checkpoints are ordered by process_start_time in descending order
-- and then by id_checkpoint in ascending order
--
-- Returns:
-- i_checkpoint_name - (optional) if specified, returns data related to particular checkpoint's name
-- status - Status code
-- status_text - Status message
-- id_checkpoint - ID of the checkpoint
-- checkpoint_name - Name of the checkpoint
-- author - Author of the checkpoint
-- measuredByAtumAgent - Flag indicating whether the checkpoint was measured by ATUM agent
-- measured_by_atum_agent - Flag indicating whether the checkpoint was measured by ATUM agent
-- measure_name - Name of the measure
-- measure_columns - Columns of the measure
-- measurement_value - Value of the measurement
-- checkpoint_start_time - Time of the checkpoint
-- checkpoint_end_time - End time of the checkpoint computation
-- has_more - Flag indicating whether there are more checkpoints available
--
-- Status codes:
-- 11 - OK
-- 41 - Partitioning not found
-- 42 - No checkpoint data found
--
-------------------------------------------------------------------------------
$$
DECLARE
_fk_partitioning BIGINT;
_has_more BOOLEAN;
BEGIN
_fk_partitioning = runs._get_id_partitioning(i_partitioning);

IF _fk_partitioning IS NULL THEN
PERFORM 1 FROM runs.partitionings WHERE id_partitioning = i_partitioning_id;
IF NOT FOUND THEN
status := 41;
status_text := 'Partitioning not found';
RETURN NEXT;
RETURN;
END IF;

IF i_checkpoints_limit IS NOT NULL THEN
SELECT count(*) > i_checkpoints_limit
Copy link
Contributor

@benedeki benedeki Sep 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternate solution, no need to change the current one, just for inspiration:

        PERFORM 1
        FROM runs.checkpoints C
        WHERE C.fk_partitioning = i_partitioning_id
          AND (i_checkpoint_name IS NULL OR C.checkpoint_name = i_checkpoint_name)
        LIMIT 1 
        OFFSET coalesce(i_offset) + i_checkpoints_limit;

        _has_more := found;

FROM runs.checkpoints C
WHERE C.fk_partitioning = i_partitioning_id
AND (i_checkpoint_name IS NULL OR C.checkpoint_name = i_checkpoint_name)
LIMIT i_checkpoints_limit + 1 OFFSET i_offset
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Complicated
First I thought this was a mistake to have ORDER BY omitted from the query.
Then I thought: wait we don't care about the order of we just want to see if there are more records.
Then I thought: but having the ORDER BY will cause to have the same data pages cached in memory for the query bellow.
And then I thought: but the ordering might be more expensive then the gain the cache gives.
So I don't know 🤣

Suggested change
LIMIT i_checkpoints_limit + 1 OFFSET i_offset
ORDER BY C.process_start_time DESC, C.id_checkpoint
LIMIT i_checkpoints_limit + 1 OFFSET i_offset

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's keep the code as is then for now :)

INTO _has_more;
ELSE
_has_more := false;
END IF;

RETURN QUERY
WITH limited_checkpoints AS (
SELECT C.id_checkpoint,
C.checkpoint_name,
C.created_by,
C.measured_by_atum_agent,
C.process_start_time,
C.process_end_time
FROM runs.checkpoints C
WHERE C.fk_partitioning = i_partitioning_id
AND (i_checkpoint_name IS NULL OR C.checkpoint_name = i_checkpoint_name)
ORDER BY C.process_start_time DESC, C.id_checkpoint
LIMIT i_checkpoints_limit OFFSET i_offset
)
SELECT
11 AS status,
'Ok' AS status_text,
C.id_checkpoint,
C.checkpoint_name,
C.created_by AS author,
C.measured_by_atum_agent,
LC.id_checkpoint,
LC.checkpoint_name,
LC.created_by AS author,
LC.measured_by_atum_agent,
md.measure_name,
md.measured_columns,
M.measurement_value,
C.process_start_time AS checkpoint_start_time,
C.process_end_time AS checkpoint_end_time
LC.process_start_time AS checkpoint_start_time,
LC.process_end_time AS checkpoint_end_time,
_has_more AS has_more
FROM
runs.checkpoints C
JOIN
runs.measurements M ON C.id_checkpoint = M.fk_checkpoint
JOIN
limited_checkpoints LC
JOIN
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personal quirk, no need to change here:
I do prefer INNER JOIN to JOIN to make it obvious. So if in future you would consider using that 👼

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, will add the 'INNER' and do so going forward always

runs.measurements M ON LC.id_checkpoint = M.fk_checkpoint
JOIN
runs.measure_definitions MD ON M.fk_measure_definition = MD.id_measure_definition
WHERE
C.fk_partitioning = _fk_partitioning
AND
(i_checkpoint_name IS NULL OR C.checkpoint_name = i_checkpoint_name)
ORDER BY
C.process_start_time,
C.id_checkpoint
LIMIT nullif(i_limit, 0);
LC.process_start_time, LC.id_checkpoint;
Copy link
Contributor

@benedeki benedeki Sep 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Major:
We want the measurements of one checkpoint go after each other followed by the next checkpoint etc.
While the current way how the measurements come in (all at once) ensure this gives the correct order, this is not guaranteed by the data itself, just by actual processes right now.

Suggested change
LC.process_start_time, LC.id_checkpoint;
LC.id_checkpoint, LC.process_start_time;

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok switched the order


IF NOT FOUND THEN
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a problem?
I think it's valid situation, if no checkpoint for an existing partitioning exist so far.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, so far in queries returning detail records of a master one, of there were no details one, we ust returned the empty set. Would prefer we stay consistent, and if for some reason this is considered better, changer there too.
Tag @TebaleloS @lsulak

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're right. How about status 12 with text 'OK with no checkpoints found'?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the OK solution might cause even more problems, as the data returned are NULL. So fa-db would need to be able to handle that and recognize that the particular record is OK but return empty sequence.
In the other cases we just returned no records at all.

status := 42;
status_text := 'No checkpoint data found';
RETURN NEXT;
END IF;
END;
$$

LANGUAGE plpgsql VOLATILE SECURITY DEFINER;

ALTER FUNCTION runs.get_partitioning_checkpoints(JSONB, INT, TEXT) OWNER TO atum_owner;

GRANT EXECUTE ON FUNCTION runs.get_partitioning_checkpoints(JSONB, INT, TEXT) TO atum_owner;

ALTER FUNCTION runs.get_partitioning_checkpoints(BIGINT, INT, BIGINT, TEXT) OWNER TO atum_owner;
GRANT EXECUTE ON FUNCTION runs.get_partitioning_checkpoints(BIGINT, INT, BIGINT, TEXT) TO atum_owner;
Loading
Loading