package org.apache.flink.table.annotation;

import org.apache.flink.annotation.PublicEvolving;
+ import org.apache.flink.table.functions.ProcessTableFunction;
import org.apache.flink.table.types.inference.StaticArgumentTrait;

import java.util.Arrays;
@@ -43,31 +44,38 @@ public enum ArgumentTrait {
/**
* An argument that accepts a table "as row" (i.e. with row semantics). This trait only applies
- * to {@code ProcessTableFunction} (PTF).
+ * to {@link ProcessTableFunction} (PTF).
*
- * <p>For scalability, input tables are distributed into virtual processors. Each virtual
- * processor executes a PTF instance and has access only to a share of the entire table. The
- * argument declaration decides about the size of the share and co-location of data.
+ * <p>For scalability, input tables are distributed across so-called "virtual processors". A
+ * virtual processor, as defined by the SQL standard, executes a PTF instance and has access
+ * only to a portion of the entire table. The argument declaration decides about the size of the
+ * portion and co-location of data. Conceptually, tables can be processed either "as row" (i.e.
+ * with row semantics) or "as set" (i.e. with set semantics).
*
* <p>A table with row semantics assumes that there is no correlation between rows and each row
- * can be processed independently. The framework is free in how to distribute rows among virtual
- * processors and each virtual processor has access only to the currently processed row.
+ * can be processed independently. The framework is free in how to distribute rows across
+ * virtual processors and each virtual processor has access only to the currently processed row.
*/
TABLE_AS_ROW(StaticArgumentTrait.TABLE_AS_ROW),

/**
* An argument that accepts a table "as set" (i.e. with set semantics). This trait only applies
- * to {@code ProcessTableFunction} (PTF).
+ * to {@link ProcessTableFunction} (PTF).
*
- * <p>For scalability, input tables are distributed into virtual processors. Each virtual
- * processor executes a PTF instance and has access only to a share of the entire table. The
- * argument declaration decides about the size of the share and co-location of data.
+ * <p>For scalability, input tables are distributed across so-called "virtual processors". A
+ * virtual processor, as defined by the SQL standard, executes a PTF instance and has access
+ * only to a portion of the entire table. The argument declaration decides about the size of the
+ * portion and co-location of data. Conceptually, tables can be processed either "as row" (i.e.
+ * with row semantics) or "as set" (i.e. with set semantics).
*
* <p>A table with set semantics assumes that there is a correlation between rows. When calling
* the function, the PARTITION BY clause defines the columns for correlation. The framework
* ensures that all rows belonging to same set are co-located. A PTF instance is able to access
- * all rows belonging to the same set. In other words: The virtual processor is scoped under a
- * key context.
+ * all rows belonging to the same set. In other words: The virtual processor is scoped by a key
+ * context.
+ *
+ * <p>It is also possible not to provide a key ({@link #OPTIONAL_PARTITION_BY}), in which case
+ * only one virtual processor handles the entire table, thereby losing scalability benefits.
*/
TABLE_AS_SET(StaticArgumentTrait.TABLE_AS_SET),
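Note on the Javadoc above: the difference between row semantics, set semantics with PARTITION BY, and set semantics without a key can be illustrated with a small plain-Java simulation. This is only a conceptual sketch, not Flink code and not how the framework is implemented. The class `VirtualProcessorSketch`, the `Row` type, and all method names are hypothetical and exist only for this illustration; summing values per key stands in for whatever a real PTF would compute.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class VirtualProcessorSketch {

    /** Hypothetical input row: a partition key column and a value column. */
    static final class Row {
        final String key;
        final int value;

        Row(String key, int value) {
            this.key = key;
            this.value = value;
        }
    }

    /** Row semantics: each row is handled on its own, with no access to other rows. */
    static List<String> processAsRow(List<Row> table) {
        List<String> out = new ArrayList<>();
        for (Row row : table) {
            // Only the currently processed row is visible here.
            out.add(row.key + ":" + row.value);
        }
        return out;
    }

    /** Set semantics with a key: rows sharing a key are co-located and aggregated together. */
    static Map<String, Integer> processAsSet(List<Row> table) {
        // Each map entry plays the role of one virtual processor scoped by a key.
        Map<String, Integer> sumPerKey = new LinkedHashMap<>();
        for (Row row : table) {
            sumPerKey.merge(row.key, row.value, Integer::sum);
        }
        return sumPerKey;
    }

    /** Set semantics without a key: a single virtual processor sees the entire table. */
    static int processAsSetWithoutKey(List<Row> table) {
        int sum = 0;
        for (Row row : table) {
            sum += row.value;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Row> table = List.of(new Row("a", 1), new Row("b", 2), new Row("a", 3));
        System.out.println(processAsRow(table));           // [a:1, b:2, a:3]
        System.out.println(processAsSet(table));           // {a=4, b=2}
        System.out.println(processAsSetWithoutKey(table)); // 6
    }
}
```

In the keyed variant the work could in principle be spread over as many processors as there are keys, whereas the keyless variant funnels every row through one place, which is the scalability loss the new Javadoc paragraph mentions.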