
Commit eb7eee7

committed Jan 30, 2025
Details on by modulation
1 parent 61d5b07 commit eb7eee7


2 files changed, +205 -10 lines changed

 

book/Section-Beyond-Basic-Queries.adoc

+43 -1
@@ -2683,6 +2683,49 @@ g.V().has('airport','country','IE').
26832683
[2,2,2,5,1,1,1]
26842684
----
26852685

2686+
In the "<<bymodulators>>" section we learned about the concept of an unproductive
2687+
'by'. As a reminder, an unproductive 'by' is one that does not produce a result for
2688+
the step that it is tied to. When 'store' or 'aggregate' encounters an unproductive
2689+
'by', it has a filtering effect on the current traverser. For example, if in the
2690+
prior example we had made a mistake and mistyped '"runways"' as '"rnways"', we would
2691+
have no results:
2692+
2693+
[source,groovy]
2694+
----
2695+
g.V().has('airport','country','IE').
2696+
aggregate('ireland').by('rnways').cap('ireland')
2697+
2698+
[]
2699+
----
2700+
2701+
This behavior is meant as a convenience when using 'store' or 'aggregate' with
2702+
heterogeneous or incomplete elements where a value may not exist, thereby avoiding an
2703+
error. It may be tempting to exploit this behavior as a direct means of filtering, as
2704+
demonstrated in the following example:
2705+
2706+
[source,groovy]
2707+
----
2708+
g.V().has('airport','country','IE').
2709+
aggregate('ireland').by(values('runways').is(gt(3))).
2710+
cap('ireland')
2711+
2712+
[5]
2713+
----
2714+
2715+
While the previous example is syntactically correct, hiding the filter inside the
2716+
'by' makes the query harder to read. It would be more idiomatic to write an explicit
2717+
filter instead:
2718+
2719+
[source,groovy]
2720+
----
2721+
g.V().has('airport','country','IE').
2722+
has('runways', gt(3)).
2723+
aggregate('ireland').by('runways').
2724+
cap('ireland')
2725+
2726+
[5]
2727+
----
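The filtering effect can be pictured with a toy model in plain Groovy that does not use TinkerPop at all. It assumes a modulator can be treated as a closure that returns a list of results, with an empty list standing for an unproductive 'by', and it only models what ends up in the side-effect collection. The helpers 'byKey' and 'aggregateBy' and the airport maps are hypothetical.

```groovy
// Toy model only (plain Groovy, not TinkerPop). A modulator is modeled as a
// closure returning a list of results; an empty list means "unproductive".
Closure byKey(String key) {
    return { Map v -> v.containsKey(key) ? [v[key]] : [] }
}

// Collect the first result of each modulator call. An unproductive modulator
// returns an empty list, so that traverser contributes nothing.
List aggregateBy(List<Map> vertices, Closure mod) {
    vertices.collectMany { v -> mod.call(v).take(1) }
}

// Hypothetical airports, not taken from the air-routes data.
def airports = [[code: 'AAA', runways: 2],
                [code: 'BBB', runways: 5],
                [code: 'CCC', runways: 1]]

println(aggregateBy(airports, byKey('runways')))   // [2, 5, 1]
println(aggregateBy(airports, byKey('rnways')))    // [] (typo: every call is unproductive)

// Filtering inside the modulator versus filtering first, then aggregating.
def bigOnly = { Map v -> v.runways > 3 ? [v.runways] : [] }
println(aggregateBy(airports, bigOnly))                                      // [5]
println(aggregateBy(airports.findAll { it.runways > 3 }, byKey('runways')))  // [5]
```

The last two calls build the same collection; in the model, the explicit 'findAll' plays the role of the 'has' filter.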
2728+
26862729
If we are ever unsure what type of object has been created a call to 'getClass' can
26872730
be used to find out.
26882731

@@ -5404,7 +5447,6 @@ the results of your queries as JSON. Remember that if you do save an entire grap
54045447
JSON, unless you specify otherwise, the default format is GraphSON 3.0 with
54055448
embedded types.
54065449

5407-
54085450
[[nulls]]
54095451
Using null in Gremlin
54105452
~~~~~~~~~~~~~~~~~~~~~

book/Section-Writing-Gremlin-Queries.adoc

+162 -9
@@ -736,7 +736,7 @@ that look like the following line.
736736
[LCY,456,GVA]
737737
----
738738

739-
The 'by' modulator steps are processed in a round robin fashion. If there are not
739+
The 'by' modulator steps are processed in a round-robin fashion. If there are not
740740
enough modulators specified for the total number of elements in the path, Gremlin
741741
just loops back around to the first 'by' step and so on. So even though there were
742742
three elements in the path that we wanted to have formatted, we only needed to
@@ -748,8 +748,8 @@ explicit 'by' modulator steps. This would be required if, for example, we wanted
748748
reference the 'city' property of the third element in the path rather than its
749749
'code'.
750750

751-
TIP: The 'by' modulator steps are processed in a round robin fashion in cases where
752-
there are more results to apply them to than 'by' modulators specified.
751+
TIP: The 'by' modulator steps are processed in a round-robin fashion in cases where
752+
there are more results to apply them to than the number of 'by' modulators specified.
753753

754754
The example above is equivalent to this longer form of the same query.
755755

@@ -1141,10 +1141,9 @@ g.V().has('type','airport').limit(10).as('a','b','c').
11411141
by('code').by('region').by(out().count())
11421142
----
11431143

1144-
In the most recent releases of TinkerPop you can also use the new 'project' step and
1145-
achieve the same results that you can get from the combination of 'as' and 'select'
1146-
steps. The example below shows the previous query, rewritten to use 'project' instead
1147-
of 'as' and 'select'.
1144+
The 'project' step can achieve the same results as the combination of
1145+
'as' and 'select' steps. The example below shows the previous query, rewritten to use
1146+
'project' instead of 'as' and 'select'.
11481147

11491148
[source,groovy]
11501149
----
@@ -1197,6 +1196,117 @@ When we run the modified query, here is the output we get.
11971196
[IATA:IAD,Region:US-VA,Routes:136]
11981197
----
11991198

1199+
[[bymodulators]]
1200+
Traits of 'by' modulators
1201+
^^^^^^^^^^^^^^^^^^^^^^^^^
1202+
1203+
We've now seen enough use of the 'by' modulator to dive deeper into its general
1204+
behavior. As we learned earlier, a 'by' modulator is a form of step that influences
1205+
the behavior of the step that it is associated with. Moreover, 'by' modulators
1206+
are processed in a round-robin fashion in cases where there are more results to apply
1207+
them to than the number of 'by' modulators specified.
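The round-robin rule can be sketched as a toy model in plain Groovy, independent of TinkerPop: the element at position i in a path is formatted by the modulator at index i modulo the number of modulators. The 'applyByRoundRobin' helper and the three-element path are hypothetical and only illustrate the rule.

```groovy
// Toy model only (plain Groovy, not TinkerPop): modulator (i % n) formats element i.
List applyByRoundRobin(List<Map> path, List<Closure> modulators) {
    (0..<path.size()).collect { i ->
        modulators[i % modulators.size()].call(path[i])
    }
}

// A hypothetical three-element path.
def path = [[code: 'LCY', city: 'London'],
            [code: 'GVA', city: 'Geneva'],
            [code: 'AUS', city: 'Austin']]

def byCode = { Map v -> v.code }
def byCity = { Map v -> v.city }

// A single modulator is reused for every element.
println(applyByRoundRobin(path, [byCode]))           // [LCY, GVA, AUS]
// Two modulators for three elements: the first one is used again for the third.
println(applyByRoundRobin(path, [byCode, byCity]))   // [LCY, Geneva, AUS]
```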
1208+
1209+
In addition to those basic definitions, there are some other points worth exploring.
1210+
First, let's take a look at the list of steps that support 'by' modulation, as 'by'
1211+
cannot be used with every step:
1212+
1213+
[cols="1,1,1,1",options="header"]
1214+
|==============================================================================
1215+
|'aggregate' |'cyclicPath' |'dedup' |'group'
1216+
|'groupCount' |'math' |'order' |'path'
1217+
|'project' |'propertyMap' |'sack' |'sample'
1218+
|'select' |'simplePath' |'store' |'tree'
1219+
|'valueMap' |'where' | |
1220+
|==============================================================================
1221+
1222+
Next, let's revisit two places where 'by' is commonly used, namely with 'path' and
1223+
'project'.
1224+
1225+
[source,groovy]
1226+
----
1227+
g.V().has('airport','code','AUS').
1228+
out().out().
1229+
path().by('code').
1230+
limit(10)
1231+
1232+
[AUS,EWR,YYZ]
1233+
[AUS,EWR,YVR]
1234+
[AUS,EWR,LHR]
1235+
[AUS,EWR,CDG]
1236+
[AUS,EWR,FRA]
1237+
[AUS,EWR,NRT]
1238+
[AUS,EWR,DEL]
1239+
[AUS,EWR,DUB]
1240+
[AUS,EWR,HKG]
1241+
[AUS,EWR,PEK]
1242+
1243+
g.V().has('type','airport').limit(10).
1244+
project('a','b','c').
1245+
by('code').
1246+
by('region').
1247+
by(outE().count())
1248+
1249+
[a:ATL,b:US-GA,c:232]
1250+
[a:ANC,b:US-AK,c:39]
1251+
[a:AUS,b:US-TX,c:59]
1252+
[a:BNA,b:US-TN,c:55]
1253+
[a:BOS,b:US-MA,c:129]
1254+
[a:BWI,b:US-MD,c:89]
1255+
[a:DCA,b:US-DC,c:93]
1256+
[a:DFW,b:US-TX,c:221]
1257+
[a:FLL,b:US-FL,c:141]
1258+
[a:IAD,b:US-VA,c:136]
1259+
----
1260+
1261+
In both of the prior examples, the 'by' modulators were 'productive', which means that
1262+
when the modulated step uses the 'by', it generates a result. When the 'by'
1263+
does not emit a result, it is said to be 'unproductive', which introduces an important
1264+
aspect of Gremlin semantics. Let's modify the first example with 'path' to make it
1265+
unproductive by introducing a typo into the property key, referring to it as
1266+
'"cde"' instead of '"code"':
1267+
1268+
[source,groovy]
1269+
----
1270+
g.V().has('airport','code','AUS').
1271+
out().out().
1272+
path().by('cde').
1273+
limit(10)
1274+
1275+
1276+
----
1277+
1278+
The prior example returns no results. The unproductive 'by' on 'path' forces it to
1279+
behave a bit like a filter. Since 'path' can't use '"cde"' to access a valid property
1280+
key, it simply drops that traverser to prevent an error. Now, let's modify the second
1281+
example to make it unproductive in some cases, specifically those where the number
1282+
of outgoing edges is not greater than 100:
1283+
1284+
[source,groovy]
1285+
----
1286+
g.V().has('type','airport').limit(10).
1287+
project('a','b','c').
1288+
by('code').
1289+
by('region').
1290+
by(outE().count().is(gt(100)))
1291+
1292+
[a:ATL,b:US-GA,c:232]
1293+
[a:ANC,b:US-AK]
1294+
[a:AUS,b:US-TX]
1295+
[a:BNA,b:US-TN]
1296+
[a:BOS,b:US-MA,c:129]
1297+
[a:BWI,b:US-MD]
1298+
[a:DCA,b:US-DC]
1299+
[a:DFW,b:US-TX,c:221]
1300+
[a:FLL,b:US-FL,c:141]
1301+
[a:IAD,b:US-VA,c:136]
1302+
----
1303+
1304+
As you can see, the 'project' step does not emit a key for '"c"' when its modulator is
1305+
unproductive. Each step has its own semantics for what it does with an
1306+
unproductive 'by', but typically some form of filtering occurs, akin to what we've
1307+
seen in the prior examples. We will see more examples of 'by'
1308+
introducing these kinds of behaviors as we learn new steps in future sections.
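A rough way to picture these two behaviors is the following toy model in plain Groovy. It does not use TinkerPop at all: it assumes a modulator can be treated as a closure returning a list of results, with an empty list standing in for an unproductive 'by'. The helper names ('byKey', 'routesOver', 'projectRow', 'formatPaths') are hypothetical, and the route counts stand in for 'outE().count()'.

```groovy
// Toy model only (plain Groovy, not TinkerPop). A modulator returns a list of
// results; an empty list stands for an unproductive 'by'.
Closure byKey(String key) {
    return { Map v -> v.containsKey(key) ? [v[key]] : [] }
}
Closure routesOver(int n) {
    return { Map v -> v.routes > n ? [v.routes] : [] }
}

// 'project'-like: a key is simply left out when its modulator is unproductive.
Map projectRow(Map v, List<String> keys, List<Closure> mods) {
    def row = [:]
    keys.eachWithIndex { k, i ->
        def results = mods[i % mods.size()].call(v)
        if (results) row[k] = results[0]
    }
    row
}

// 'path'-like: a whole path is dropped if any of its modulators is unproductive.
List formatPaths(List<List<Map>> paths, List<Closure> mods) {
    paths.findResults { p ->
        def formatted = []
        for (int i = 0; i < p.size(); i++) {
            def results = mods[i % mods.size()].call(p[i])
            if (!results) return null   // findResults discards null, so the path is dropped
            formatted << results[0]
        }
        formatted
    }
}

// Values borrowed from the 'project' output shown earlier.
def atl = [code: 'ATL', region: 'US-GA', routes: 232]
def anc = [code: 'ANC', region: 'US-AK', routes: 39]
def mods = [byKey('code'), byKey('region'), routesOver(100)]

println(projectRow(atl, ['a', 'b', 'c'], mods))   // [a:ATL, b:US-GA, c:232]
println(projectRow(anc, ['a', 'b', 'c'], mods))   // [a:ANC, b:US-AK]

// A hypothetical two-element path.
println(formatPaths([[atl, anc]], [byKey('code')]))   // [[ATL, ANC]]
println(formatPaths([[atl, anc]], [byKey('cde')]))    // []
```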
1309+
12001310
[[multias]]
12011311
Using multiple 'as' steps with the same label
12021312
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -4384,6 +4494,51 @@ g.V().hasLabel('airport').limit(20).values('code').order().by(shuffle).fold()
43844494
[MCO,LGA,BWI,IAD,ATL,BOS,DCA,BNA,IAH,DFW,MIA,MSP,ANC,AUS,JFK,ORD,PBI,FLL,LAX,PHX]
43854495
----
43864496

4497+
It is important to call attention to the use of 'by' modulators with 'order', because they
4498+
introduce some special semantics when they are unproductive, a topic we learned about
4499+
in the "<<bymodulators>>" section. Let's modify an earlier example to make the 'by'
4500+
unproductive by introducing a spelling mistake to the property key we want to order
4501+
on:
4502+
4503+
[source,groovy]
4504+
----
4505+
g.V().has('code','AUS').out().
4506+
order().by('cde').
4507+
values('code','icao').fold()
4508+
4509+
[]
4510+
----
4511+
4512+
As you can see, an unproductive 'by' makes 'order' behave like a filter for those
4513+
traversers that can't make use of '"cde"' (which in this case is all of them). This
4514+
filtering behavior is often quite helpful when working with heterogeneous elements, as it
4514+
provides some flexibility where there would otherwise have been an exception.
4515+
These semantics might tempt you to use them to filter explicitly inside the
4516+
'by', as follows:
4518+
4519+
[source,groovy]
4520+
----
4521+
g.V().has('code','AUS').out().
4522+
order().by(__.as('v').values('runways').is(gt(4)).select('v').values('code')).
4523+
values('code','icao').fold()
4524+
4525+
[ATL,KATL,BOS,KBOS,DEN,KDEN,DFW,KDFW,DTW,KDTW,IAH,KIAH,MDW,KMDW,ORD,KORD,YYZ,CYYZ]
4526+
----
4527+
4528+
While the prior example works, it is not an advisable approach because it reduces the
4529+
readability of the query. It would be far more idiomatic to write the above query with
4530+
an explicit 'has' filter, as shown next:
4531+
4532+
[source,groovy]
4533+
----
4534+
g.V().has('code','AUS').
4535+
out().has('runways',gt(4)).
4536+
order().by('code').
4537+
values('code','icao').fold()
4538+
4539+
[ATL,KATL,BOS,KBOS,DEN,KDEN,DFW,KDFW,DTW,KDTW,IAH,KIAH,MDW,KMDW,ORD,KORD,YYZ,CYYZ]
4540+
----
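The effect of an unproductive 'by' on 'order' can be pictured with a toy model in plain Groovy that does not use TinkerPop. It assumes a modulator is a closure returning a list of results, where an empty list means the 'by' is unproductive; the 'byKey' and 'orderBy' helpers and the airport maps are hypothetical.

```groovy
// Toy model only (plain Groovy, not TinkerPop). A modulator returns a list of
// results; an empty list stands for an unproductive 'by'.
Closure byKey(String key) {
    return { Map v -> v.containsKey(key) ? [v[key]] : [] }
}

// 'order'-like: drop elements whose modulator is unproductive, then sort the
// rest by the modulator's first result. sort(false) leaves the input list untouched.
List orderBy(List<Map> vertices, Closure mod) {
    vertices.findAll { mod.call(it) }.sort(false) { mod.call(it)[0] }
}

// Hypothetical airports, not taken from the air-routes data.
def airports = [[code: 'CCC', runways: 5],
                [code: 'AAA', runways: 2],
                [code: 'BBB', runways: 6]]

println(orderBy(airports, byKey('cde'))*.code)    // []
println(orderBy(airports, byKey('code'))*.code)   // [AAA, BBB, CCC]

// Filtering inside the modulator versus an explicit filter before ordering.
def codeIfBig = { Map v -> v.runways > 4 ? [v.code] : [] }
println(orderBy(airports, codeIfBig)*.code)                                 // [BBB, CCC]
println(orderBy(airports.findAll { it.runways > 4 }, byKey('code'))*.code)  // [BBB, CCC]
```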
4541+
43874542
Below is an example where we combine the field we want to sort by, 'longest',
43884543
and the direction we want the sort to take, 'desc', into a single 'by' instruction.
43894544

@@ -6570,8 +6725,6 @@ g.V(3).bothE().otherV().dedup().order().by(id()).fold()
65706725
As you can see, when we use 'otherV' we do not get 'v[3]' returned as we are only
65716726
looking at the other vertices relative to where we started from, which was 'v[3]'.
65726727

6573-
6574-
65756728
[[sp]]
65766729
Shortest paths (between airports) - introducing 'repeat'
65776730
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
