YQ-3984 Group by hop docs to ydb #15017
❌ Documentation build: revision build failed. Build logs. Errors (2):
- /ru/yql/reference/syntax/group_by.md: 283: MD031/blanks-around-fences Fenced code blocks should be surrounded by blank lines [Context: "```yql"]
- /ru/yql/reference/syntax/group_by.md: 285: MD031/blanks-around-fences Fenced code blocks should be surrounded by blank lines [Context: "```"]
✅ Documentation build: revision built successfully.
Group the table by the values of the specified columns or expressions and the time window.

If `GROUP BY` is present in the query, then when selecting columns (between `SELECT ... FROM`) you can **only** use the following constructs:

1. Columns by which grouping is performed (they are included in the `GROUP BY` argument).
2. Aggregate functions (see the next section). Columns by which **no** grouping is made can only be included as arguments for an aggregate function.
3. Functions that output the start and end time of the current window (`HOP_START` and `HOP_END`).
4. Arbitrary calculations combining items 1-3.
If I understood what's going on here correctly, now the group by without hops is hidden for RTMR, while this section for hops is visible in all cases? If so, this intro needs to be in the beginning before the regular group by, not here somewhere in the middle of the article.
> If I understood what's going on here correctly, now the group by without hops is hidden for RTMR, while this section for hops is visible in all cases?

Yes.

> If so, this intro needs to be in the beginning before the regular group by, not here somewhere in the middle of the article.

I'm not sure this intro is relevant or correct at all, so I kept it only for RTMR.
> I'm not sure this intro is relevant or correct at all, so I kept it only for RTMR.

This needs to be sorted out and fixed; the intro is arguably the most important part of the article.
Moved the intro to the common part.
The implemented version of the time window is called the **hopping window**. This is a window that moves forward in discrete intervals (the `hop` parameter). The total duration of the window is set by the `interval` parameter. To determine the time of each input event, the `time_extractor` parameter is used. This expression depends only on the input values of the stream's columns and must have the `Timestamp` type. It indicates where exactly to get the time value from input events.

In each stream defined by the values of all the grouping columns, the window moves forward independently of other streams. Advancement of the window is totally dependent on the latest event of the stream. Since records in streams get somewhat mixed in time, the `delay` parameter has been added so you can delay the closing of the window by a specified period. Events arriving before the current window are ignored.
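
The hopping window described in this excerpt can be sketched in Python. This is a minimal illustration, not YDB's implementation: the function name `hopping_windows`, the alignment of window starts to multiples of `hop` counted from the Unix epoch, and the half-open window bounds are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def hopping_windows(event_time, hop, interval):
    """Return (start, end) pairs of every hopping window containing event_time.

    Assumptions (not taken from the YDB source): window starts are aligned to
    multiples of `hop` counted from the Unix epoch, and each window is the
    half-open range [start, start + interval).
    """
    if hop <= timedelta(0) or interval <= timedelta(0):
        raise ValueError("hop and interval must be positive")
    if interval % hop != timedelta(0):
        raise ValueError("interval must be a multiple of hop")
    # Start of the latest window that begins at or before the event.
    latest_start = event_time - (event_time - EPOCH) % hop
    windows = []
    # An event falls into interval / hop overlapping windows.
    for k in range(interval // hop):
        start = latest_start - k * hop
        windows.append((start, start + interval))
    return sorted(windows)
```

Under these assumptions, with `hop` of 10 seconds and `interval` of 30 seconds, an event at 00:00:25 falls into three overlapping windows starting at 00:00:00, 00:00:10, and 00:00:20.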
This talks about streams and it's not clear how it maps to running this over regular tables.
Fixed; removed most mentions of streams.
Co-authored-by: Ivan Blinkov <ivan@blinkov.ru>
❌ Documentation build: revision build failed. Build logs. Errors (8), each at line 2 with MD031/blanks-around-fences (Fenced code blocks should be surrounded by blank lines):
- /en/devops/ansible/initial-deployment.md
- /en/postgresql/statements/delete_from.md
- /en/postgresql/statements/drop_table.md
- /en/postgresql/statements/insert_into.md
- /en/yql/reference/syntax/expressions.md
- /en/yql/reference/types/primitive.md
- /ru/yql/reference/syntax/expressions.md
- /ru/yql/reference/types/primitive.md

Warnings (675). Log was truncated (660 records).
Since records in streams get somewhat mixed in time, the `delay` parameter has been added so you can delay the closing of the window by a specified period. Events arriving before the current window are ignored.
{% endif %}

The `interval` and `delay` parameters should be multiples of the `hop` parameter. Non-multiple intervals are rounded down.
`interval` is a positive number divisible by `hop`, and `delay` is a non-negative number divisible by `hop`. Non-multiple intervals are not allowed in the current implementation.
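
The constraint stated in this comment can be expressed as a small check. The sketch below is only an illustration of the rule in Python; the function name `check_hop_params` and the use of `ValueError` are assumptions for the example and say nothing about how YDB itself reports the error.

```python
from datetime import timedelta

ZERO = timedelta(0)


def check_hop_params(hop, interval, delay=ZERO):
    """Raise ValueError unless the parameters satisfy the rule quoted above.

    Hypothetical helper for illustration: hop and interval must be positive,
    delay must be non-negative, and interval and delay must be multiples of hop.
    """
    if hop <= ZERO:
        raise ValueError("hop must be positive")
    if interval <= ZERO or interval % hop != ZERO:
        raise ValueError("interval must be a positive multiple of hop")
    if delay < ZERO or delay % hop != ZERO:
        raise ValueError("delay must be a non-negative multiple of hop")
```

For example, `hop` of 10 seconds with `interval` of 30 seconds passes the check, while `interval` of 25 seconds is rejected instead of being rounded down.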
Added.
🔄 New commits pushed — @anton-bobkov please take a look. |
```yql
HOP(time_extractor, hop, interval, delay)
```

The implemented version of the time window is called the **hopping window**. This is a window that moves forward in discrete intervals (the `hop` parameter). The total duration of the window is set by the `interval` parameter. To determine the time of each input event, the `time_extractor` parameter is used. This expression depends only on the input column values and must have the `Timestamp` type. It indicates where exactly to get the time value from input events.
It's not quite clear which input events are meant if we are talking about regular tables in YDB.
The `hop`, `interval`, and `delay` parameters are specified as a string expression compliant with [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601). This is the format used to construct the built-in `Interval` type [from a string](../builtins/basic.md#data-type-literals).

The `HOP_START` and `HOP_END` functions take no parameters and return a `Timestamp` value corresponding to the start and end of the current window.
Add a bit more information here:
When selecting columns (between `SELECT ... FROM`), you can use the `HOP_START` and `HOP_END` functions (without parameters), which return a `Timestamp` value corresponding to the start and end of the current window.
The `interval` and `hop` parameters must be positive.

{% if select_command != "SELECT STREAM" %}
The `delay` parameter is not used in the current implementation because the data within a single partition is already sorted.
It would be better to clarify how exactly it is not used: is it ignored, or should it not be specified? The examples do include `delay`...
Group the `SELECT` results by the values of the specified columns or expressions. `GROUP BY` is often combined with [aggregate functions](../builtins/aggregation.md) (`COUNT`, `MAX`, `MIN`, `SUM`, `AVG`) to perform calculations in each group.
You can group by the result of an arbitrary expression computed from the source columns. In this case, to access the result of this expression, we recommend assigning a name to it using `AS`. See the second example.
Add a link to the example.
## GROUP BY ... HOP

Group the table by the values of the specified columns or expressions and the time window.
Can we add information about what a time window is, or a link to that information?
Co-authored-by: anton-bobkov <anton-bobkov@ydb.tech>
Heads-up: it's been 10 business-days since a reviewer comment. @kardymonds, any updates? @anton-bobkov, please check the status with the author. |
I'm OK with it, but the text is not very easy to read.
Also, I'm very surprised that the `GROUP BY` construct appears in the ToC as a sub-article of "Syntax" rather than of `SELECT`. Is that intentional?
Related PR #18283
Changelog entry
...
Changelog category
Description for reviewers
...