Skip to content

YQ supported override nodes placement #19787

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

GrigoriyPA
Copy link
Collaborator

Changelog entry

Supported override nodes placement in pragma OverridePlanner

Changelog category

  • Not for changelog (changelog entry is not required)

Description for reviewers

@GrigoriyPA GrigoriyPA requested a review from a team as a code owner June 18, 2025 08:15
@GrigoriyPA GrigoriyPA requested a review from Copilot June 18, 2025 08:15
@GrigoriyPA GrigoriyPA requested review from gridnevvvit and Hor911 June 18, 2025 08:16
Copilot

This comment was marked as outdated.

Copy link

github-actions bot commented Jun 18, 2025

2025-06-18 08:17:03 UTC Pre-commit check linux-x86_64-relwithdebinfo for 5a0a465 has started.
2025-06-18 08:17:07 UTC Artifacts will be uploaded here
2025-06-18 08:18:09 UTC Check cancelled

Copy link

github-actions bot commented Jun 18, 2025

2025-06-18 08:17:04 UTC Pre-commit check linux-x86_64-release-asan for 5a0a465 has started.
2025-06-18 08:17:08 UTC Artifacts will be uploaded here
2025-06-18 08:18:09 UTC Check cancelled

Copy link

🟢 2025-06-18 08:17:10 UTC The validation of the Pull Request description is successful.

Copy link

github-actions bot commented Jun 18, 2025

2025-06-18 08:21:10 UTC Pre-commit check linux-x86_64-relwithdebinfo for f5291c5 has started.
2025-06-18 08:21:22 UTC Artifacts will be uploaded here
2025-06-18 08:25:04 UTC Check cancelled

Copy link

github-actions bot commented Jun 18, 2025

2025-06-18 08:21:13 UTC Pre-commit check linux-x86_64-release-asan for f5291c5 has started.
2025-06-18 08:21:24 UTC Artifacts will be uploaded here
2025-06-18 08:25:04 UTC Check cancelled

@GrigoriyPA GrigoriyPA requested a review from Copilot June 18, 2025 08:24
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds support for specifying explicit node placement for query execution stages via the OverridePlanner pragma.

  • Introduces a new TasksNodes field in the physical stage protobuf to carry node hints.
  • Implements JSON parsing and validation in ApplyOverridePlannerSettings and integrates it into query compilation.
  • Updates the executer to consume TasksNodes, routing tasks to specified nodes or falling back to the existing planner.

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
ydb/core/protos/kqp_physical.proto Added TasksNodes repeated field to TKqpPhyStage.
ydb/core/kqp/query_compiler/kqp_query_compiler.cpp Created ApplyOverridePlannerSettings and wired it into TKqpQueryCompiler.
ydb/core/kqp/executer_actor/kqp_scan_executer.cpp Adjusted BuildComputeTasks call to include an explicit (empty) node-hint vector.
ydb/core/kqp/executer_actor/kqp_executer_impl.h Added GetStageNodesHint helper and updated task-building logic to use hints.
ydb/core/kqp/executer_actor/kqp_data_executer.cpp Flagged ResourceSnapshotRequired for overridden stages and passed hints to compute tasks.
Comments suppressed due to low confidence (2)

ydb/core/kqp/executer_actor/kqp_executer_impl.h:1026

  • [nitpick] The word "Planer" in GetStageNodesHint and related variables is misspelled; consider renaming to GetStageNodesPlanner or GetStageNodesPlan and renaming nodesPlaner to nodesPlanner for clarity.
    std::function<ui32(ui32 taskIdx)> GetStageNodesHint(const NKqpProto::TKqpPhyStage& stage, const TVector<NKikimrKqp::TKqpNodeResources>& resourceSnapshot) {

ydb/core/kqp/query_compiler/kqp_query_compiler.cpp:496

  • Consider adding unit tests for ApplyOverridePlannerSettings, covering valid and invalid JSON inputs (including node lists) to ensure overrides are parsed and errors are reported correctly.
TIssues ApplyOverridePlannerSettings(const TString& overridePlannerJson, NKqpProto::TKqpPhyQuery& queryProto) {

Copy link

github-actions bot commented Jun 18, 2025

2025-06-18 08:26:23 UTC Pre-commit check linux-x86_64-release-asan for b7038bd has started.
2025-06-18 08:26:27 UTC Artifacts will be uploaded here
2025-06-18 08:30:04 UTC ya make is running...
2025-06-18 08:31:20 UTC Check cancelled

Copy link

github-actions bot commented Jun 18, 2025

2025-06-18 08:28:39 UTC Pre-commit check linux-x86_64-relwithdebinfo for b7038bd has started.
2025-06-18 08:28:50 UTC Artifacts will be uploaded here
2025-06-18 08:31:18 UTC Check cancelled

Copy link

github-actions bot commented Jun 18, 2025

2025-06-18 08:32:36 UTC Pre-commit check linux-x86_64-release-asan for f736a71 has started.
2025-06-18 08:32:41 UTC Artifacts will be uploaded here
2025-06-18 08:36:16 UTC ya make is running...
🟡 2025-06-18 10:55:29 UTC Some tests failed, follow the links below. This fail is not in blocking policy yet Going to retry failed tests...

Test history | Ya make output | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
15873 15608 0 116 123 26

2025-06-18 10:56:51 UTC ya make is running... (failed tests rerun, try 2)
🟡 2025-06-18 11:35:11 UTC Some tests failed, follow the links below. This fail is not in blocking policy yet Going to retry failed tests...

Test history | Ya make output | Test bloat | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
2039 (only retried tests) 1879 0 62 78 20

2025-06-18 11:35:30 UTC ya make is running... (failed tests rerun, try 3)
🟡 2025-06-18 12:10:06 UTC Some tests failed, follow the links below. This fail is not in blocking policy yet

Test history | Ya make output | Test bloat | Test bloat | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
1459 (only retried tests) 1265 0 57 120 17

🟢 2025-06-18 12:10:22 UTC Build successful.
🟡 2025-06-18 12:10:56 UTC ydbd size 3.9 GiB changed* by +178.9 KiB, which is >= 100.0 KiB vs main: Warning

ydbd size dash main: 3335cfb merge: f736a71 diff diff %
ydbd size 4 173 702 408 Bytes 4 173 885 640 Bytes +178.9 KiB +0.004%
ydbd stripped size 1 447 299 032 Bytes 1 447 354 456 Bytes +54.1 KiB +0.004%

*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation

Copy link

github-actions bot commented Jun 18, 2025

2025-06-18 08:33:22 UTC Pre-commit check linux-x86_64-relwithdebinfo for f736a71 has started.
2025-06-18 08:33:26 UTC Artifacts will be uploaded here
2025-06-18 08:37:01 UTC ya make is running...
🟡 2025-06-18 10:30:45 UTC Some tests failed, follow the links below. Going to retry failed tests...

Test history | Ya make output | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
38135 35438 0 4 2662 31

2025-06-18 10:33:58 UTC ya make is running... (failed tests rerun, try 2)
🟡 2025-06-18 10:45:53 UTC Some tests failed, follow the links below. Going to retry failed tests...

Test history | Ya make output | Test bloat | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
434 (only retried tests) 407 0 1 0 26

2025-06-18 10:46:06 UTC ya make is running... (failed tests rerun, try 3)
🔴 2025-06-18 10:57:16 UTC Some tests failed, follow the links below.

Test history | Ya make output | Test bloat | Test bloat | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
287 (only retried tests) 263 0 1 0 23

🟢 2025-06-18 10:57:24 UTC Build successful.
🟡 2025-06-18 10:57:45 UTC ydbd size 2.2 GiB changed* by +108.4 KiB, which is >= 100.0 KiB vs main: Warning

ydbd size dash main: 3335cfb merge: f736a71 diff diff %
ydbd size 2 373 000 616 Bytes 2 373 111 624 Bytes +108.4 KiB +0.005%
ydbd stripped size 497 531 720 Bytes 497 550 408 Bytes +18.2 KiB +0.004%

*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation

for (ui32 i = 0; i < partitionsCount; ++i) {
auto& task = TasksGraph.AddTask(stageInfo);
task.Meta.Type = TTaskMeta::TTaskType::Compute;
if (!nodesPlanner) {
task.Meta.Type = TTaskMeta::TTaskType::Compute;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

что тут происходит, зачем это?

@@ -1023,6 +1023,27 @@ class TKqpExecuterBase : public TActor<TDerived> {
}
}

std::function<ui32(ui32 taskIdx)> GetStageNodesHint(const NKqpProto::TKqpPhyStage& stage, const TVector<NKikimrKqp::TKqpNodeResources>& resourceSnapshot) {
Copy link
Member

@gridnevvvit gridnevvvit Jun 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

зачем аж std::function ? можно проще?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

я предлагаю подумать и отказаться от этой идеи как таковой, а решать задачу в какой то другой парадигме. например если задача резделения нагрузки то запрещать использовать какие-то ноды.

назначать сотни тасок по нодам не имеет примерно никакого юзкейса

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants